extract and compare data from text files through excel vba



seema
11-28-2008, 12:20 PM
Hi All,
I have just joined, and am struggling with the following:
I need to extract and compare data from two text files. Data in these files is separated with pipes. I only need to extract relevant data, and while comparing, I need to take out the difference between two data points. The problem that I am facing is that the pipes are not separated uniformly, and I am finding it difficult to extract the relevant data.

I need to do the following steps:

1. Extract relevant data from both text files.
2. Transfer the data to another output file in text format (since the records exceed 100K).
3. Take out the difference of values between the two data points.
4. Prepare a summary.

Data in File 1.

14|FN|AAA|4|3.14|2||1||+1|1800|S|A|1-1974||NA|NA|BSSM|MANAGER

10|VIN|PPP|4|3.0|46||||162|22|AD|D|1-1970||NA|NA|AEBS|BRAN



Need "14", "AAA", "4", "3.14", "BSSM" data points from line 1 in file 1.

Need "10", "PPP", "4", "3.0", "AEBS" data points from line 2 in file 1.

Data in File 2
Would have similar records as file 1.

Extract: BSSM and AEBS data points are employee names, 3.14 and 3.0 are amounts, and 14 and 10 are unique identifiers.

BSSM and AEBS are hypothetical data and can vary, but the character length is fixed.
The unique identifier is hypothetical data and can vary, but the length is fixed.

I need the difference of the amounts in files 1 and 2 for the respective employees, matched against their unique identifiers, written to another text file.

I then want to do a subtotal on this in Excel. The text files have more than 100,000 records each.

Pls assist.

Thanks

lucas
12-01-2008, 09:33 PM
Moved to appropriate forum. This will increase your chance of getting help with your problem.

GTO
12-01-2008, 11:49 PM
...Data in these files is separated with pipes. I only need to extract relevant data and while comparing...

...The problem that I am facing is that the pipes are not separated uniformly....

Greetings seema,:)

Hey, I see that you just joined, so let me say "Howdy" and welcome. You will meet some very nice folks here, who go out of their way to be helpful.

As to your question, maybe I could help a tiny bit by getting some clarity. What does "the pipes are not separated uniformly" mean? In the examples you gave, it appears that the pipes could indeed be used to delimit into columns, and leastwise to my fading brain, this would seem to make data extraction reliable. By chance, are you just referring to how many characters/spaces may exist between given pipe sets?

Mark

seema
12-02-2008, 04:55 AM
By chance, are you just referring to how many characters/spaces may exist between given pipe sets?



Hi Mark!
Absolutely! The data is separated with pipes, and the character lengths vary for each field.

Thanks for looking into it!

Regards
Seema

GTO
12-05-2008, 12:48 AM
Hi Seema,

Are you still looking for a solution to this?

Mark

seema
12-06-2008, 01:24 PM
Hi Seema,

Are you still looking for a solution to this?

Mark

Frantically Mark ... Frantically !:banghead: Pls help out!
Thanks

Kenneth Hobs
12-06-2008, 03:11 PM
The task would have been easier had the files been saved in database format or CSV. As it is, we could do it using Split() to get the right records. However, it is unclear what the difference calculation is about.

If you can make and attach two sample txt files with say 10 rows of data, and then manually insert the parts to make the third file that you want to create, it would be easier to help.
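For example, here is a bare-bones sketch of the Split() idea; the field positions are assumed from your two sample lines, so adjust them to your real layout:

Sub DemoSplitLine()
    Dim parts() As String
    ' One sample record from File 1; Split gives a zero-based array.
    parts = Split("14|FN|AAA|4|3.14|2||1||+1|1800|S|A|1-1974||NA|NA|BSSM|MANAGER", "|")
    ' Assumed positions: 0 = identifier, 2 = code, 3 = count, 4 = amount, 17 = employee name.
    Debug.Print parts(0), parts(2), parts(3), parts(4), parts(17)
End Sub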

seema
12-07-2008, 12:59 AM
Hi Kenneth

I have attached three files.

File 1 & File 2 are original files from which I need to extract the relevant data.
File 3 is the output that needs to be produced after comparing the above files.

Please note, there could be a difference in total number of records in both files hence while matching the data in file 1 & 2, the code would have to consider all lines.

Thanks Seema

Kenneth Hobs
12-07-2008, 10:37 AM
Is there a one to one relationship for the rows in the 2 input files?

I guess you understand that the output file size will be a bit larger than the sum of the input file sizes, since you are repeating the information from each file.

seema
12-08-2008, 12:08 AM
Is there a one to one relationship for the rows in the 2 input files?

In the output file:-

F1 (CIF) – This has been extracted from the first file
F1 (RM) – This has been extracted from the first file
F2 (CIF) – This has been extracted from the second file
F2 (RM) – This has been extracted from the second file

The above should tally; only then should the difference between F1 (AMT) and F2 (AMT) be taken out. F1 and F2 follow the same chronology, but there could be a situation where new records are available in F2 which are not in F1; such records should not give me a difference value but could give “NA”.


I guess you understand that the output file size will be a bit larger than the file size of the sum of the input file sizes since you are repeating the information in each file.
The total columns in the output file would be 11, against 18 in file 1 and 18 in file 2, so I guess the file size should not double. Also, I need to import only the subtotals from the output file into the Excel file. The entire output file doesn't need to be extracted into Excel. The remaining record-wise data would be retained, in text format only, as an audit trail for the subtotal.

GTO
12-08-2008, 11:22 PM
The total columns in the output file would be 11, against 18 in file 1 and 18 in file 2, so I guess the file size should not double. Also, I need to import only the subtotals from the output file into the Excel file. The entire output file doesn't need to be extracted into Excel. (underlining added for clarity) The remaining record-wise data would be retained, in text format only, as an audit trail for the subtotal.

Greetings Seema,

As both Kenneth and I are having a bit of a time picturing exactly how you want this to work, I thought that the best thing might be to supply a snippet, and see if it is "headed in the right direction".

Now speaking only for myself, I have a hard time understanding what you mean by "...The entire output file doesn't need to be extracted into Excel..."

Admittedly only by my limited knowledge, but I do not see a reliable way of working with the values in column five of the input text files, other than to first import them (along with the entire line) into Excel. Now maybe that statement will trigger another member's input for a better way, but in the meantime, take the below and paste it into a standard module in a new/blank workbook.

Save the test workbook in a folder of your choice (I'd just create one for the test).

Create two test text files, each having 10,000 records; name them "Test1.txt" and "Test2.txt", and save these to the same folder your test workbook is in.

Run the sub Main().

Now, as you'll note, I have this currently writing out the first 6500 records in the first range and the remainder in a range off to the right. This is of course to demonstrate the basics of "getting the info" at 1/10th scale.

Now - if this gets you the basic info you're looking for, then we can see if you still want to move all the info to another text file...

Option Explicit

Sub Main()
    With Application
        .Calculation = xlCalculationManual
        .ScreenUpdating = False

        Call ImportFileData("Test1.txt", "File One")
        Call ImportFileData("Test2.txt", "File Two")

        Call FindAndCombine

        .Calculation = xlCalculationAutomatic
        .ScreenUpdating = True
    End With
End Sub

Private Sub FindAndCombine()
    Dim wksFOne As Worksheet
    Dim wksFTwo As Worksheet
    Dim rngF1_R1 As Range
    Dim rngF1_R2 As Range
    Dim rngF2_R1 As Range
    Dim rngF2_R2 As Range
    Dim rngFOne As Range
    Dim rngFTwo As Range
    Dim rOne As Range
    Dim rTwo As Range
    Dim wksFileOutput As Worksheet
    Dim lngCnt As Long

    Set wksFileOutput = Worksheets.Add(Before:=Worksheets(1), Count:=1, Type:=xlWorksheet)
    wksFileOutput.Name = "Output"
    Set wksFOne = Worksheets("File One")
    Set wksFTwo = Worksheets("File Two")

    'Build each file's lookup range: column A always, plus the overflow
    'records in column G if the import spilled past the first range.
    With wksFOne
        Set rngF1_R1 = .Range(.Cells(1, 1), .Cells(.Cells(65536, 1).End(xlUp).Row, 1))
        If Not .Cells(1, 7) = Empty Then
            Set rngF1_R2 = .Range(.Cells(1, 7), .Cells(.Cells(65536, 7).End(xlUp).Row, 7))
            Set rngFOne = Union(rngF1_R1, rngF1_R2)
        Else
            Set rngFOne = rngF1_R1
        End If
    End With

    With wksFTwo
        Set rngF2_R1 = .Range(.Cells(1, 1), .Cells(.Cells(65536, 1).End(xlUp).Row, 1))
        If Not .Cells(1, 7) = Empty Then
            Set rngF2_R2 = .Range(.Cells(1, 7), .Cells(.Cells(65536, 7).End(xlUp).Row, 7))
            Set rngFTwo = Union(rngF2_R1, rngF2_R2)
        Else
            Set rngFTwo = rngF2_R1
        End If
    End With

    lngCnt = 2
    With wksFileOutput
        With .Range("A1:AA1")
            .Value = Array("F1(CIF)", "F1(CAT)", "F1(CD)", "F1(AMT)", "F1(RM)", , _
                           "F2(CIF)", "F2(CAT)", "F2(CD)", "F2(AMT)", "F2(RM)", , "NET", , _
                           "F1(CIF)", "F1(CAT)", "F1(CD)", "F1(AMT)", "F1(RM)", , _
                           "F2(CIF)", "F2(CAT)", "F2(CD)", "F2(AMT)", "F2(RM)", , "NET")
            .Font.Bold = True
        End With
        .Range("A:AA").HorizontalAlignment = xlCenter
    End With

    'Match each File One record against File Two by the value in column A;
    'matched pairs are written side by side and cleared from the source sheets.
    For Each rOne In rngFOne
        Set rTwo = rngFTwo.Find(What:=rOne.Value, LookIn:=xlValues, _
                                LookAt:=xlWhole, SearchOrder:=xlByRows)
        If Not rTwo Is Nothing Then
            lngCnt = lngCnt + 1

            Select Case lngCnt
                Case Is < 6503    'first output block (columns A:M)
                    With wksFileOutput
                        .Range(.Cells(lngCnt, 1), .Cells(lngCnt, 5)) = _
                            Array(rOne, rOne.Offset(, 1), rOne.Offset(, 2), _
                                  rOne.Offset(, 3), rOne.Offset(, 4))

                        .Range(.Cells(lngCnt, 7), .Cells(lngCnt, 11)) = _
                            Array(rTwo, rTwo.Offset(, 1), rTwo.Offset(, 2), _
                                  rTwo.Offset(, 3), rTwo.Offset(, 4))

                        rOne.Resize(1, 5).ClearContents
                        rTwo.Resize(1, 5).ClearContents

                        .Cells(lngCnt, 13).Value = .Cells(lngCnt, 4) - .Cells(lngCnt, 10)
                    End With

                Case Else         'overflow block (columns O:AA)
                    With wksFileOutput
                        .Range(.Cells(lngCnt - 6500, 15), .Cells(lngCnt - 6500, 19)) = _
                            Array(rOne, rOne.Offset(, 1), rOne.Offset(, 2), _
                                  rOne.Offset(, 3), rOne.Offset(, 4))

                        .Range(.Cells(lngCnt - 6500, 21), .Cells(lngCnt - 6500, 25)) = _
                            Array(rTwo, rTwo.Offset(, 1), rTwo.Offset(, 2), _
                                  rTwo.Offset(, 3), rTwo.Offset(, 4))

                        rOne.Resize(1, 5).ClearContents
                        rTwo.Resize(1, 5).ClearContents

                        .Cells(lngCnt - 6500, 27).Value = _
                            .Cells(lngCnt - 6500, 18) - .Cells(lngCnt - 6500, 24)
                    End With
            End Select
        End If
    Next
End Sub

Sub ImportFileData(FName As String, WKSName As String)
    Dim strFullName As String
    Dim lngTextLineCount As Long
    Dim strStringHolder As String
    Dim wksFileContents As Worksheet

    Set wksFileContents = Worksheets.Add(Before:=Worksheets(1), _
                                         Count:=1, _
                                         Type:=xlWorksheet)
    wksFileContents.Name = WKSName

    strFullName = ThisWorkbook.Path & "\" & FName

    Open strFullName For Input As #1

    'Read line by line; records past the first 6500 (65000 in the
    'full-sized version) spill into a secondary column.
    Do While Not EOF(1)
        lngTextLineCount = lngTextLineCount + 1
        Line Input #1, strStringHolder

        Select Case lngTextLineCount
            Case Is < 6501    '65001 in the full-sized version
                wksFileContents.Cells(lngTextLineCount, 1).Value = strStringHolder
            Case Else
                wksFileContents.Cells(lngTextLineCount - 6500, 20).Value = strStringHolder
        End Select
    Loop

    Close #1

    Call ParseAndDumpFlack(wksFileContents, 1)
    'Only parse the overflow column if anything spilled into it; otherwise
    'TextToColumns raises "No data was selected to parse".
    If Not wksFileContents.Cells(1, 20) = Empty Then
        Call ParseAndDumpFlack(wksFileContents, 20)
    End If
    wksFileContents.Columns("G:S").Delete Shift:=xlToLeft
End Sub

Sub ParseAndDumpFlack(WSheet As Worksheet, ColNum As Integer)
    'Split the imported lines on the pipe delimiter, keeping only the
    'wanted fields (FieldInfo type 9 = skip that column).
    WSheet.Columns(ColNum).TextToColumns _
        Destination:=WSheet.Cells(1, ColNum), _
        DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, _
        ConsecutiveDelimiter:=False, _
        Tab:=False, _
        Semicolon:=False, _
        Comma:=False, _
        Space:=False, _
        Other:=True, OtherChar:="|", _
        FieldInfo:=Array(Array(1, 1), Array(2, 9), Array(3, 1), Array(4, 1), _
                         Array(5, 1), Array(6, 9), Array(7, 9), Array(8, 9), _
                         Array(9, 9), Array(10, 9), Array(11, 9), Array(12, 9), _
                         Array(13, 9), Array(14, 9), Array(15, 9), Array(16, 9), _
                         Array(17, 9), Array(18, 1), Array(19, 9))
End Sub

Note 1: As you had not supplied any code that you had started with, I would believe you might be starting out in VBA. If this is the case, I'll be happy to explain (comment) the code later. First I think it would be best to see if I'm in the neighborhood of what you are looking to do...

Note 2: The above snippet is definitely "rude & crude" and nowhere near complete. If you decide to run the test twice, you'll need to first delete the three sheets it creates...

Hope this helps,

Mark

Krishna Kumar
12-09-2008, 01:36 AM
It's Cross Post (http://www.mrexcel.com/forum/showthread.php?t=356962)

seema
12-09-2008, 02:57 AM
Thanks Mark and Kenneth

I got a debug error at ParseAndDumpFlack. The error was: "No data was selected to parse".

Regards
Seema

seema
12-09-2008, 03:05 AM
It's Cross Post (http://www.mrexcel.com/forum/showthread.php?t=356962)

You mean this one (http://www.mrexcel.com/forum/showthread.php?t=355904).

:yes

GTO
12-09-2008, 03:16 AM
Seema,

Please open the link supplied by Krishna at post #12.

I do see the substantially different membership dates, as well as the later thread initialization and a couple of other notable differences. That said, it is obviously the same project. Are you "Npsays" at mrexcel or what is the deal?

Mark

seema
12-09-2008, 03:18 AM
Now speaking only for myself, I have a hard time understanding what you mean by "...The entire output file doesn't need to be extracted into Excel..."

Admittedly only by my limited knowledge, but I do not see a reliable way of working with the values in column five of the input text files, other than to first import them (along with the entire line) into Excel. Now maybe that statement will trigger another member's input for a better way, but in the meantime, take the below and paste it into a standard module in a new/blank workbook.

Mark

Dear Mark,
Since it is a huge file, the detailed working could remain in the text file. However, from that text file, I need the subtotals of the difference of amounts under the respective agents in the Excel sheet.

I really appreciate your taking out time to help me resolve this.
Thanks Again.
Seema.

seema
12-11-2008, 02:09 AM
Seema,

Please open the link supplied by Krishna at post #12.

I do see the substantially different membership dates, as well as the later thread initialization and a couple of other notable differences. That said, it is obviously the same project. Are you "Npsays" at mrexcel or what is the deal?

Mark

Dear Mark,
I am not Npsays. I have responded to Krishna's link by providing my own link, under which I have registered on that site. Hope that clarifies... I am looking forward to resolving this.
Thanks

GTO
12-11-2008, 03:31 AM
Thanks Mark and Kenneth

I got a debug at "parse and dump flack" . The debug was :"No Data was selected to Parse".

Regards
Seema

Hi Seema,
Hopefully you understand: with the identical test data, it must be someone at the same company; I assume you work for a larger-type corporation?

Anyways, it's not that cross-posting in and of itself is some major sin - more that there is a reasonable expectation of providing a link to the other site's thread, as this can prevent anyone trying to assist from wasting time and effort on the same thing that another has already tried...

Well, hopefully that made sense?

Now let's get back to post #11 and your response. I ran the code I provided on both a decently fast machine w/Excel 2003 and on my 'slower than creeping death' laptop w/Excel 2000... Nary an error to report. (I would fully admit that I could have both dressed-out a steer and cooked it by the time it completed...) So, please tell me exactly which line in the ParseAndDumpFlack routine the error occurs at, as I have not been able to replicate it. (Thanks)

Finally, in reference to post #16, sorry if I was not clear, but to the best of my knowledge there's really no notably faster way. In short, let's think of the parameters. As the source files are well past Excel's row limit, we're not reading the text file in one swoop, no matter what. Given that, we're reading line by line, so I would contend that logic dictates that reading each line into an Excel row cannot be slower than reading each line individually and also stopping to parse out what is between two certain delimiters. Make sense?

Let me know as to the error, I'll be happy to provide what help my limited skills may provide.

Mark

Krishna Kumar
12-11-2008, 05:44 AM
Seema,

My apologies

seema
12-11-2008, 08:37 AM
Thanks! Mark, I completely appreciate and respect your point of view. It makes complete sense!

Krishna - absolutely no issues! Similar files are circulated in a team of people, and it is evident that someone is taking help from one of the experts there.

Dear Mark
On the Code:

It worked this time; I must have made some error executing it the last time! (Spooky!) It was reasonably fast with 10k records in each file.
Now I have an output sheet which has the differences of all matching entries in file one and file two. Transactions which are not matching, e.g. present only in one report and not in the other, are being written separately in the respective sheets (file one and file two). This is GREAT!
We are going just fine till here.
Now: what would happen to the data when there are more than 100,000 records? Would they keep getting appended in adjacent rows on the output sheet?

Regarding post #16: I have to present only the subtotal data in the Excel sheet, and not the whole file. I was thinking of storing the whole file in a text file as an audit trail, detailing how I arrived at the subtotal in the Excel sheet. A text file would be smaller in size and would consume less space on the hard disk. (We would have 100s of such files.)

At the end of it, the records which I need in the Excel sheet would be as below:
Total Records in file 1 = 10
Total Records in file 2 = 10
Subtotals:
RM      CAT    Diff AMT
XXXX    GEN    200
YYYY    GEN    300
Total          500
Thanks All of you once again!
Regards
Seema

GTO
12-11-2008, 11:04 PM
Now I have an output sheet which has the differences of all matching entries in file one and file two. Transactions which are not matching, e.g. present only in one report and not in the other, are being written separately in the respective sheets (file one and file two). This is GREAT!

Okay - the code at post #11 was just to see if I was anywhere near understanding the goals. We should then be able to get the remaining non-matching records and insert these below the matching records...


Now: what would happen to the data when there are more than 100,000 records? Would they keep getting appended in adjacent rows on the output sheet?

Not exactly. Remember, the current example is "1/10 scale" so-to-speak. That is why it creates a secondary range after the first 6500 records. In the full-sized version, my thought at least was to have it create the secondary range after the first 65k records. Thus - you could have up to 130k records before an error.
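(The arithmetic: up to 65,000 lines in the first range plus up to 65,536 rows in the secondary range allows 130,536 lines in all, hence "up to 130k".)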


Regarding post #16: I have to present only the subtotal data in the Excel sheet, and not the whole file. I was thinking of storing the whole file in a text file as an audit trail, detailing how I arrived at the subtotal in the Excel sheet. A text file would be smaller in size and would consume less space on the hard disk. (We would have 100s of such files.)

I think I was misunderstanding this part, as I took it that the sub-total reporting area would be somewhere below all the records in the destination/output text file.


At the end of it, the records which I need in the Excel sheet would be as below:
Total Records in file 1 = 10
Total Records in file 2 = 10
Subtotals:
RM      CAT    Diff AMT
XXXX    GEN    200
YYYY    GEN    300
Total          500

Okay - I think I got it now.
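Something along these lines (a rough sketch only; the sheet names and column positions here are assumptions) should eventually get the RM/CAT subtotals from the matched records:

Sub BuildSubtotals()
    Dim dic As Object, lngRow As Long, strKey As String
    Dim vKey As Variant, lngOut As Long
    Set dic = CreateObject("Scripting.Dictionary")
    With Worksheets("Output")
        'Assumed layout: RM in col A, CAT in col B, difference in col C.
        For lngRow = 2 To .Cells(.Rows.Count, 1).End(xlUp).Row
            strKey = .Cells(lngRow, 1).Value & "|" & .Cells(lngRow, 2).Value
            dic(strKey) = dic(strKey) + .Cells(lngRow, 3).Value
        Next lngRow
    End With
    lngOut = 2
    With Worksheets("Summary")
        .Range("A1:C1").Value = Array("RM", "CAT", "Diff AMT")
        For Each vKey In dic.Keys
            .Cells(lngOut, 1).Resize(1, 2).Value = Split(vKey, "|")
            .Cells(lngOut, 3).Value = dic(vKey)
            lngOut = lngOut + 1
        Next vKey
    End With
End Sub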

Mark

seema
12-13-2008, 07:50 AM
Hi
Please do let me know in case I need to clarify further.
Regards

Benzadeus
12-16-2008, 07:33 AM
Hi everybody, I'm new to Excel VBA.

I thought about this problem and wrote some code that could help you. Please check my attachment and see if it fits your needs.

seema
12-17-2008, 12:29 PM
Benzadeus - thanks, but I am required to read more than 100k records from two files which, after matching the records in each, need to provide me a subtotal.

Hi Mark - looking forward to the 'final version'. Thanks! :pray2:

Demosthine
12-17-2008, 10:35 PM
Good Evening.

I thoroughly applaud the work that has gone on so far. That has been some impressive coding.

Ken mentioned it briefly, but this entire task would be much easier (and quicker) using database technology. I've thrown together a sample, based on the information I've seen so far, of some of what can be done with ADO and Access. This is by no means a final solution, but it should provide a very solid basis for the project.

In the attached Archive, you will find four files: Schema.ini, Test1C.txt, Test2C.txt, and Composite.mdb. I will describe the first and last in greatest detail. The TestnC.txt files are just two large pipe (|) delimited files.


Schema.ini
The first key to this method is using a Schema.ini file. There is some great information on this on the MSDN Site (http://msdn.microsoft.com/en-us/library/ms709353.aspx). The file must be in the same directory as the delimited files or this method will not work.

When you first open the file, you will notice two distinct sections that are enclosed in brackets. These must be the exact name of the file you wish to deal with. Since we have two delimited (source) files, there are two sections. Include the file extension as well.

The second line defines whether the source files have a Column Header Row defined. In this example, the first line in the source file is data, so we specify False in the Schema.

The third line defines the character that will be used as the delimiting character. This must be enclosed in parentheses. In this case, we are using a pipe (|) to separate values.


[Test1C.txt]
ColNameHeader=False
Format=Delimited(|)


The remainder of this section defines the Column Name for each of the separated values. If you open the source file, you will find there are 19 values, therefore we define 19 Column Names.

To define Column Names, you simply begin each line with 'Coln='. This is followed by your Column Name. If your Column Name has a space in it, it must be enclosed in quotation marks, as seen in Col1. You will also have a secondary value on the line specifying what Data Type this Column will contain.
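For illustration only, the first few column definitions might look like the lines below (the names are assumed to match the query fields used later; the attached Schema.ini is the authority):

Col1="F1 (CIF)" Text
Col2="F1 (CAT)" Text
Col3="F1 (CD)" Text
Col4="F1 (AMT)" Double
Col5="F1 (RM)" Text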


Composite.mdb
When you open the Database, you will notice there are no Tables defined. This is intentional. The project is managed through Queries. Select the Queries tab from the main window.

You must modify several of the queries before this will work. Because everyone's directory structures are different, you will need to define the directory of your source and destination files.

Begin by clicking on Test1C and then click Design. This should open the SQL View window. If it does not, select SQL View from the View Menu. You will find the following statement:

SELECT [F1 (CIF)], [F1 (CAT)], [F1 (CD)], [F1 (AMT)], [F1 (RM)]
FROM [Text;
FMT=Delimited;
HDR=NO;
CharacterSet=437;
DATABASE=C:\...\Desktop\ADO Delimited].Test1C.txt;

Change the Database Path to the appropriate location of your Test1C file. You do not have to enclose it in quotations or anything, even if there are spaces. Close the Query and save it.

Make the same changes to the Test2C Query as well.

Next, open the OutputJoin Query in Design View. Again, make sure you are using the SQL View window. The statement should look like:

SELECT [Test1C].[F1 (CIF)], [Test1C].[F1 (CAT)], [Test1C].[F1 (CD)], [Test1C].[F1 (AMT)], [Test1C].[F1 (RM)],
[Test2C].[F2 (CIF)], [Test2C].[F2 (CAT)], [Test2C].[F2 (CD)], [Test2C].[F2 (AMT)], [Test2C].[F2 (RM)],
([Test1C].[F1 (AMT)] - [Test2C].[F2 (AMT)]) AS [NET DIFFERENCE]
INTO [Text;
FMT=Delimited;
HDR=YES;
CharacterSet=437;
DATABASE=C:\...\Desktop\ADO Delimited].[Output].txt
FROM Test1C INNER JOIN Test2C ON [Test1C].[F1 (CIF)] = [Test2C].[F2 (CIF)]
WHERE ([Test1C].[F1 (AMT)] - [Test2C].[F2 (AMT)]) <> 0
ORDER BY [Test1C].[F1 (CIF)];

This time, change the Database Path to the directory that you want your final data written to. I used the same folder in the example. Close the Query and save it.


Executing the Queries.
In order to achieve the requested output of the two lists merged, with the difference calculated, click on the OutputJoin Query and then click Open. Access will prompt you with "You are about to run a make-table query that will modify data in your table." Click Yes. It will then ask you "You are about to paste XXXX row(s) into a new table." Click Yes again.

If you open the folder you told this Query to output to, you will find a new text file called Output.txt. This is a comma-delimited file containing only the information requested. This does not contain the Summary Information, though.

To get the summary information, you will open CountTest1C, CountTest2C, SubTotals, and Totals. The names should be descriptive enough to give you an idea of their functions. This will give you the appropriate information requested.
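For reference, a grouped subtotal of the differences would look something like the sketch below; the exact SQL in the attached queries may differ:

SELECT Test1C.[F1 (RM)], Test1C.[F1 (CAT)],
       SUM(Test1C.[F1 (AMT)] - Test2C.[F2 (AMT)]) AS [Diff AMT]
FROM Test1C INNER JOIN Test2C
     ON Test1C.[F1 (CIF)] = Test2C.[F2 (CIF)]
GROUP BY Test1C.[F1 (RM)], Test1C.[F1 (CAT)];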


Finalizing the Project
Now that all of the information has been collected, you must decide how you want to save and display it. It should be very easy (especially with some of the code already provided) to open the appropriate Queries through Excel VBA and write them to the Spreadsheets. Your SQL Statements would look similar to:

SELECT * FROM [QueryName];
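Put together in Excel VBA, that might look something like this minimal late-bound sketch (the database path, query name, and target sheet are assumptions; adjust them to your setup):

Sub PullQueryToSheet()
    Dim conDB As Object, rstQuery As Object
    Dim wks As Worksheet, i As Long
    Set wks = ActiveWorkbook.Worksheets(1)
    Set conDB = CreateObject("ADODB.Connection")
    conDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
               "Data Source=C:\Downloads\ADO Delimited\Composite.mdb"
    Set rstQuery = CreateObject("ADODB.Recordset")
    rstQuery.Open "SELECT * FROM [SubTotals];", conDB
    'Field names in row 1, then dump the records below them.
    For i = 0 To rstQuery.Fields.Count - 1
        wks.Cells(1, i + 1).Value = rstQuery.Fields(i).Name
    Next i
    wks.Range("A2").CopyFromRecordset rstQuery
    rstQuery.Close
    conDB.Close
End Sub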



Notes:
Because of the preliminary design of the project, all of the Queries open both source files each time they are executed. In a final version, this would be inefficient. You can modify the Queries, similar to OutputJoin, so that they save to an actual table. This will ultimately speed processing times.


Further explanation is available if requested.
Good luck.
Scott

GTO
12-17-2008, 10:53 PM
Hi Seema,

I just finished what I believe to be a working example, but need to tidy up a bit (Dim's here and there, etc) and provide some commenting. Should be able to post tomorrow, barring an overly "active" day like today turned out to be.

In discussing this with Scott (Demosthine), as well as seeing some great info from XLD at another thread (sorry, not looking for it right now), as you can see, you'll get two examples for the price of one :thumb

Mark

@Demosthine:
Thanks buddy:beerchug:

seema
12-18-2008, 11:00 PM
Thank you for your time, Mark! Eagerly looking forward to the code.

Hi Scott
I do not have MS Access installed. This thought did occur to me, because it would have been much easier and faster through the Access route. I appreciate your suggestion, but I would "have" to look at the VBA route.

Regards
Seema

Demosthine
12-19-2008, 03:16 PM
Good Afternoon Seema and All.

Strictly speaking, MsAccess is not necessarily required in order to implement an ActiveX Data Objects solution. Using the Queries I have defined and a bit of coding in Excel, it is possible to use ADO in the same methods I described. This is one of the awesome benefits to ADO and DAO. They are software independent.

If this interests you, would you do us a favor, to help guide us better for your specific needs, and check your References in Excel? Check to see if you can reference:

Microsoft ActiveX Data Objects x.x Library
Microsoft ActiveX Data Objects Recordset x.x Library
- or -
Microsoft DAO x.x Object Library

Take care.
Scott

Kenneth Hobs
12-19-2008, 03:30 PM
You can either install Access or install ADO alone. It is worth it in the long run.
http://msdn.microsoft.com/en-us/data/aa937729.aspx

Demosthine
12-19-2008, 07:37 PM
Good Evening All.

Attached, you will find an Excel Workbook that will access the appropriate files and complete the Query. It copies the Top 100 Records and places them into Sheet Master. The code is commented well, but download the file and see if it works on your system that does not have MsAccess.

Note: Download my previous ADO Delimited example. Copy this Excel Workbook and the Access Database to the folder of your choice. Copy the Schema and TestnC files to C:\Downloads\ADO Delimited\*. The code should work at that point.

Scott

GTO
12-20-2008, 05:34 AM
Greetings Seema,

Sorry this has taken longer than expected. Been a bit busy, and if there was such an instrument as a 'wacky-meter' for work and other concerns... well it would be past the redline.

Anyways, here is what I have come up with. In short, as a workaround to the text files well exceeding what could be read and imported in one swoop, each of the two text files is first read/imported into a new sheet (one new sheet per text file), using a secondary range for records above 65,000. Thus, I believe it should work as long as neither of the two 'input' text files exceeds 130,000 lines (or actually 130,536, I believe).

Extraneous data is dumped while converting the text to columns, and we then (on a third created sheet) combine matching records, while adding in some calculation/performance monitoring for each matched record. We then tidy up non-matching records, come up with some totals for the reporting period, write all this to a new text file, and finally, destroy the temp sheets.
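For what it's worth, the "write all this to a new text file" step is conceptually just a loop like the sketch below; the sheet name and column picks here are assumptions, and the attached version differs in the details:

Sub DumpOutputToText()
    Dim lngRow As Long, intFile As Integer
    intFile = FreeFile
    Open ThisWorkbook.Path & "\Results.txt" For Output As #intFile
    With Worksheets("Output")
        For lngRow = 1 To .Cells(.Rows.Count, 1).End(xlUp).Row
            'Identifier, both amounts, and the NET difference, pipe-delimited.
            Print #intFile, .Cells(lngRow, 1).Value & "|" & _
                            .Cells(lngRow, 4).Value & "|" & _
                            .Cells(lngRow, 10).Value & "|" & _
                            .Cells(lngRow, 13).Value
        Next lngRow
    End With
    Close #intFile
End Sub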

Now quite frankly, I'm sure that you will find Scott's suggestion quicker, and I hope you try it as well. That said, I think I've written enough to ensure your basic solution is within reach, and you can tune it a bit to suit. I am confident that there are gains to be made in speed, and you may wish to experiment when running against the full-sized files, as, at least according to my logic (questionable as that may be), I would think the runtime may not increase in a linear manner. I wonder, for instance, about the effect of running into non-matching records in either an increased percentage and/or earlier in the files.

Anyways, hopefully this is helpful. Try it out, and if you run into problems 'tuning', post back.

Mark

seema
12-22-2008, 08:55 PM
Thanks Scott! It took me some time to get back to you after trying the code. It is wonderful, and I heartily applaud and appreciate the effort that has gone into it. Considering your claim of being a slow typist :rotlaugh: it is self-explanatory and appropriately detailed for a novice like me. Thanks again!
On two points:
1. I need a category-wise subtotal for matched records against each agent. At this point I am getting only an agent-wise summary, and not an agent-wise summary for each category.
2. It takes more than 3 hours to finish, which I guess is understandable and cannot be tweaked, since the file size is huge.

I checked more than 100 records at random and found the data to be correct, so all is well there too.
Thanks once again; please refer to points 1 & 2 on the possibility of a further tweak.
Regards


Hi Demosthine
Thanks a lot for your interest. I tried the files, which gave me debug errors, first at:


".Procedures.Append Name, Query" - already exists;
cannot create file, path is not correct.


I moved the scroller further to get past the debugger and encountered another at



rstQuery.Open "SELECT TOP 100 * FROM [CreatedJoin];", conDatabase, adOpenKeyset, adLockOptimistic - "is not a valid path"


Kindly let me know what I am missing; I will try it again.

Thanks
seema

GTO
12-22-2008, 09:56 PM
Hi seema,

I think you meant to refer to me (Mark / GTO) at the top of your message (Demosthine is Scott), as I am the slow typist and Scott is quite the speedster on the keyboard.

Three hours? Yeeks! I think it would be best to wait a couple of days and see if the ADO route can work quicker, particularly with your requested addition of yet another sub-category level. Sorry I didn't understand that part before, but I think this would make my method even slower (actually I am quite sure it will). So I'm going to wait a few days and see what Scott or others may come up with, but don't worry: if no better solution comes forth, I am nothing if not obstinate and will figure 'er out yet!

Mark

seema
12-25-2008, 11:09 PM
I think you meant to refer to me (Mark / GTO) at the top of your message (Demosthine is Scott), as I am the slow typist and Scott is quite the speedster on the keyboard.

Oops!!

I'll wait for the ADO route and see what it brings us. Wishing all a merry Christmas and a prosperous new year! :bow:

Demosthine
12-26-2008, 03:28 PM
Good Evening All.

Sorry it took so long to get back to this. It's been a busy week.

Seema:

In regard to the first error, in the Procedures.Append method, the procedure was only set up to execute once. I've modified the code to delete the View if it already exists. Replace the procedure with the code below:


Public Function CreateProcedure(Name As String, Query As Object) As Boolean
On Error GoTo ErrHandler
    ' Define and instantiate an ADOX Catalog to create the
    ' Stored Procedure.
    Dim catDatabase As Catalog
    Set catDatabase = CreateObject("ADOX.Catalog")

    ' Append the Command to the Stored Procedures.
    With catDatabase
        Set .ActiveConnection = conDatabase
        .Procedures.Append Name, Query
    End With
    Exit Function

ErrHandler:
    ' The View already exists, so drop it and retry the Append.
    With conDatabase
        .Execute "DROP VIEW " & Name & ";"
    End With
    Resume 0
End Function



Now, in terms of the second issue: did you move all of the files to the correct locations that were mentioned in Post #30? I also had to modify Composite.mdb, because the original did not reference the file directory I mentioned in the above post. Try to download the attached database and retest. Let me know if it doesn't work and what the errors are.

Scott

seema
12-27-2008, 10:20 AM
Hi Scott
On the following line, I am getting a compile error.


Dim catDatabase As Catalog

I am getting "user defined type not defined."

Regards
Seema

Kenneth Hobs
12-27-2008, 11:07 AM
You will need a reference for ADOX. See this link for details: http://www.functionx.com/vbaccess/Lesson04c.htm

Or, just select it from Tools > References > Microsoft ADO Ext. 2.7 for DDL and Security.

You can also change it to early binding by:
Dim catDatabase As ADOX.Catalog
'Set catDatabase = CreateObject("ADOX.Catalog")
Set catDatabase = New ADOX.Catalog


You will get an error then for conDatabase if you use Option Explicit. Maybe Demosthine can explain how to set it.

Demosthine
12-27-2008, 03:21 PM
Afternoon Everyone.

Seema, I'm sincerely sorry about not changing that back to late-binding. In the new code from Post #35, change the line you are getting the error on to read:


Dim catDatabase As Object

Set catDatabase = CreateObject("ADOX.Catalog")


That should fix your error.

Kenneth:
He shouldn't get an error for conDatabase. It is defined as a module-level variable and is opened before execution of this procedure. It is closed much later in the process. Thanks for the prompt response to that error, though.

I hope this means I'm done submitting errors. :banghead:
Scott

seema
12-28-2008, 06:18 AM
Hi Scott,

It works perfectly!! At this point in time it is taking out data for only 100 records; let's test it for all records!! :beerchug:


Thanks
Seema



Kenneth:
He shouldn't get an error for conDatabase.

Needs to be changed to "She shouldn't get an error for conDatabase." :blush

GTO
12-28-2008, 06:40 AM
@Demostine:

Hey Brother,

I like the subtitle (Access or ADO or similar) at post #25. :thumb I'll do similar, as in "Excel Only".


@Seema:

Hey Seema,

I've been thinking about the 3+ hours you state at post #32. I have a question:

When you ran the code against the full-sized text files, were the text files and/or the workbook located on (saved to) a network drive?

Just FYI: I have been looking at the additional sub-total of 'agent' vs. each category, but want to see what the deal is speed-wise first. While I expected an exponential increase of some sort when running against larger text files, in my limited testing, 3 hrs seems excessive...

I haven't looked at Scott's (Demosthine) latest yet, but expect that he stayed with keeping stuff on the local drive (C:\). When testing mine, if either the text files, or the workbook, or both, are located on a share drive, please re-test with all saved to the local hard drive and let me know the outcome (an apples-to-apples test, if you will).

Thank you,

Mark

Demosthine
12-28-2008, 12:25 PM
Morning Seema.

I apologize for the "he" comment. It's hard when the Forum is so faceless.

Converting the code from selecting only the "Top 100" records to selecting all of them is extremely easy. But note that, as currently written, it will error if the total number of records exceeds the number of rows in a worksheet.

That being said, in Module1 you need to remove the two words "TOP 100" from Line 43. The Query Statement says:


rstQuery.Open "SELECT TOP 100 * FROM [CreatedJoin];", conDatabase, adOpenKeyset, adLockOptimistic


It should now read:


rstQuery.Open "SELECT * FROM [CreatedJoin];", conDatabase, adOpenKeyset, adLockOptimistic



You can also just skip the RecordSet Object and execute an Action Query that exports directly from the Database to your Output.txt File.

Immediately after CreateProcedure "CreatedJoin", cmdCreateProcedure, execute the following:

conDatabase.Execute "SELECT * INTO " & _
" [Text;" & _
" FMT=Delimited; " & _
" HDR=NO; " & _
" CharacterSet=437; " & _
" DATABASE=C:\Downloads\ADO Delimited].[Output.txt] " & _
"FROM [CreatedJoin];"



Benchmark System:
HP Pavilion dv9000
AMD Turion 64 x2 1.90GHz
2GB RAM

Doing a benchmark test using this method, the process took 11 seconds with two 10,000-line files. When I duplicated the 10,000 lines into two 100,000-line files, it took 48 seconds to complete. This does not include your summary lines, though.

Enjoy.
Scott

P.S. Anyone know of a free (or really inexpensive) ADO/ADOX Manager that allows you to create or manipulate databases through Jet? Might help Seema to see the actual database and pre-defined Queries.

seema
01-02-2009, 01:13 AM
:beerchug: :beerchug: :beerchug:

Here's wishing everyone a very happy and prosperous new year 2009!! :pray2:

The code works like a charm! It is an incredible effort, and I can't thank all of you enough for making it work for me. I am extremely grateful to all of you for sharing the knowledge... it is a learning experience for me!

So once again, thanks Scott! Thanks Mark! And thanks Kenneth!

Wishing you and the forum all the very best!

kind regards
Seema

GTO
01-02-2009, 01:25 AM
Greetings Seema,

I hope your Christmas was also nice :-)

Please double-check, as the last time Scott and I discussed it, I do not think his version was returning the non-matching records. I believe he stated that this was fixable, though, through modifying and/or adding a couple of queries.

Also - could you answer my questions from post #40, if you don't mind?

Thank you so much,

Mark

umeshgupta
02-05-2009, 02:05 AM
hi seema