Consulting

Page 1 of 3 1 2 3 LastLast
Results 1 to 20 of 44

Thread: extract and compare data from text files through excel vba

  1. #1
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location

    extract and compare data from text files through excel vba

    Hi All,
    I have just joined, and am struggling with the following:
    Need to extract and compare data from two text files. Data in these files is separated with pipes. I only need to extract relevant data and while comparing; I need to take out the difference between two data points. The problem that I am facing is that the pipes are not separated uniformly. And I am finding it difficult to extract the relevant data.

    I need to do the following steps:

    1. Extract relevant data from both text files.
    2. Transfer the data to another output file in text format (since the record exceed 100K).
    3. Take out difference of values between two data points
    4. Prepare a summary.

    Data in File 1.

    14|FN|AAA|4|3.14|2||1||+1|1800|S|A|1-1974||NA|NA|BSSM|MANAGER

    10|VIN|PPP|4|3.0|46||||162|22|AD|D|1-1970||NA|NA|AEBS|BRAN



    Need ?14? ,?AAA? ,?4?,?3.14?, ?BSSM? data points from line 1 in file 1.

    Need ?10?,?PPP?, ?4?, ?3.0? , ?AEBS? data points from line 2 in file 1.

    Data in File 2
    Would have similar records as file 1.

    Extract : BSSM and AEBS data points are employee names , 3.14 and 3.0 are amounts. 14 and 10 are unique identifiers.

    BSSM and AEBS is hypothec data and can vary, but the character length is fixed.
    Unique identifier is hypothec data and can vary, by the length is fixed.

    I need the difference of amounts in file 1 and 2 for respective employees against their unique identifiers in another text file.

    This I want to do a subtotal in excel. The text files have more that 100,000 records in each.

    Pls assist.

    Thanks

  2. #2
    Moderator VBAX Wizard lucas's Avatar
    Joined
    Jun 2004
    Location
    Tulsa, Oklahoma
    Posts
    7,323
    Location
    Moved to appropriate forum. This will increase your chance of getting help with your problem.
    Steve
    "Nearly all men can stand adversity, but if you want to test a man's character, give him power."
    -Abraham Lincoln

  3. #3
    Knowledge Base Approver VBAX Guru GTO's Avatar
    Joined
    Sep 2008
    Posts
    3,368
    Location
    Quote Originally Posted by seema
    ...Data in these files is separated with pipes. I only need to extract relevant data and while comparing...

    ...The problem that I am facing is that the pipes are not separated uniformly....
    Greetings seema,

    Hey, I see that you just just joined, so let me say "Howdy" and welcome. You will meet some very nice folks here, who go out of their way to be helpful.

    As to your question, maybe I could help a tiny bit by getting some clarity. What does "the pipes are not seperated uniformly" mean? In the examples you gave, it appears that the pipes could indeed be used to delimit to columns, and leastwise to my fading brain, this would seem to make data extraction reliable. By chance, are you just referring to how many characters/spaces may exist between given pipe sets?

    Mark

  4. #4
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    By chance, are you just referring to how many characters/spaces may exist between given pipe sets?
    Hi Mark!
    Absolutely ! the data is separated with pipes and character lenght vary for each.

    Thanks for looking into !

    Regards
    Seema

  5. #5
    Knowledge Base Approver VBAX Guru GTO's Avatar
    Joined
    Sep 2008
    Posts
    3,368
    Location
    Hi Seema,

    Are you still looking for a solution to this?

    Mark

  6. #6
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Quote Originally Posted by GTO
    Hi Seema,

    Are you still looking for a solution to this?

    Mark
    Frantically Mark ... Frantically ! Pls help out!
    Thanks

  7. #7
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    The task would have been easier had the files been saved in database format or csv. As it is, we could do it using Split() to get the right records. However, it is unclear what the difference deal is about.

    If you can make and attach two sample txt files with say 10 rows of data, and then manually insert the parts to make the third file that you want to create, it would be easier to help.

  8. #8
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Hi Kenneth

    I have attached three files.

    File 1 & File 2 are original files from where I need to extract relavant data.
    File 3 is the output which it needs to provide after comparing the above files.

    Please note, there could be a difference in total number of records in both files hence while matching the data in file 1 & 2, the code would have to consider all lines.

    Thanks Seema

  9. #9
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    Is there a one to one relationship for the rows in the 2 input files?

    I guess you understand that the output file size will be a bit larger than the file size of the sum of the input file sizes since you are repeating the information in each file.

  10. #10
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Quote Originally Posted by Kenneth Hobs
    Is there a one to one relationship for the rows in the 2 input files?
    In the output file:-

    F1 (CIF) – This has been extracted from first file
    FI (RM) – This has been extracted from the first file
    F2 (CIF) – This has been extracted from the second file
    F2(RM) –This has been extracted from the second file

    The above should tally, only then, the difference between F1 (AMT) and F2 (AMT) should be taken out. The F1 and F2 follow chronology but there could be a situation where new records are available in F2 which may not be there in the F1, such records should not give me a difference value but could give “NA”.

    Quote Originally Posted by Kenneth Hobs
    I guess you understand that the output file size will be a bit larger than the file size of the sum of the input file sizes since you are repeating the information in each file.
    The total columns in the output file would be 11 against 18 in file 1 and 18 in file2. Guess the file size should not be double. Also, I need to import only the subtotals from the output file in the excel file. The entire output file doesn’t need to be extracted in excel. The remaining record wise data would be retained as an audit trail for the subtotal in the text format only.

  11. #11
    Knowledge Base Approver VBAX Guru GTO's Avatar
    Joined
    Sep 2008
    Posts
    3,368
    Location
    Quote Originally Posted by seema
    The total columns in the output file would be 11 against 18 in file 1 and 18 in file2. Guess the file size should not be double. Also, I need to import only the subtotals from the output file in the excel file. The entire output file doesn?t need to be extracted in excel. (underlining added for clarity) The remaining record wise data would be retained as an audit trail for the subtotal in the text format only.
    Greetings Seema,

    As both Kenneth and I are having a bit of a time picturing exactly how you want this to work, I thought that the best thing might be to supply a snippet, and see if it is "headed in the right direction".

    Now speaking only for myself, I have a hard time understanding what you mean by "... The entire output file doesn't need to be extracted in Excel..."

    Admittedly only by my limited knowledge, but I do not see a reliable way of working with the values in column five of the input text files, other than to first import them (along with the entire line) into Excel first. Now maybe that statement will trigger another member's input for a better way, but in the meantime, take the below and paste it into a standard module in a new/blank workbook.

    Save the test workbook in a folder of your choice (I'd just create one for the test).

    Create two test text files, ea having 10,000 records, name them: "Test1.txt" and "Test2.txt" - and save these to the same folder your test workbook is in.

    Run the sub Main().

    Now, as you'll note, I have this currently writing out the first 6500 records in the first range and the remainder in a range off to the right. This is of course to basically demonstrate the basic "getting the info" in 1/10th scale.

    Now - if this gets you the basic info you're looking for, then we can see if you still want to move all the info to another text file...

    [vba]Option Explicit
    Sub Main()

    With Application
    .Calculation = xlCalculationManual
    .ScreenUpdating = False

    Call ImportFileData("Test1.txt", "File One")
    Call ImportFileData("Test2.txt", "File Two")

    Call FindAndCombine

    .Calculation = xlCalculationAutomatic
    .ScreenUpdating = True
    End With
    End Sub

    Private Sub FindAndCombine()
    Dim wksFOne As Worksheet
    Dim wksFTwo As Worksheet
    Dim rngF1_R1 As Range
    Dim rngF1_R2 As Range
    Dim rngF2_R1 As Range
    Dim rngF2_R2 As Range
    Dim rngFOne As Range
    Dim rngFTwo As Range
    Dim rOne As Range
    Dim rTwo As Range
    Dim wksFileOutput As Worksheet
    Dim sngOutputVal As Single
    Dim lngCnt As Long
    Set wksFileOutput = Worksheets.Add(Before:=Worksheets(1), Count:=1, Type:=xlWorksheet)
    wksFileOutput.Name = "Output"
    Set wksFOne = Worksheets("File One")
    Set wksFTwo = Worksheets("File Two")

    With wksFOne
    Set rngF1_R1 = .Range(.Cells(1, 1), .Cells(.Cells(65536, 1).End(xlUp).Row, 1))
    If Not .Cells(1, 7) = Empty Then
    Set rngF1_R2 = .Range(.Cells(1, 7), .Cells(.Cells(65536, 7).End(xlUp).Row, 7))
    Set rngFOne = Union(rngF1_R1, rngF1_R2)
    Else
    Set rngFOne = rngF1_R1
    End If
    End With

    With wksFTwo
    Set rngF2_R1 = .Range(.Cells(1, 1), .Cells(.Cells(65536, 1).End(xlUp).Row, 1))
    If Not .Cells(1, 7) = Empty Then
    Set rngF2_R2 = .Range(.Cells(1, 7), .Cells(.Cells(65536, 7).End(xlUp).Row, 7))
    Set rngFTwo = Union(rngF2_R1, rngF2_R2)
    Else
    Set rngFTwo = rngF2_R1
    End If
    End With

    lngCnt = 2
    With wksFileOutput
    With .Range("A1:AA1")
    .Value = Array("F1(CIF)", "F1(CAT)", "F1(CD)", "F1(AMT)", "F1(RM)", , _
    "F2(CIF)", "F2(CAT)", "F2(CD)", "F2(AMT)", "F2(RM)", , "NET", , "F1(CIF)", "F1(CAT)", "F1(CD)", "F1(AMT)", "F1(RM)", , _
    "F2(CIF)", "F2(CAT)", "F2(CD)", "F2(AMT)", "F2(RM)", , "NET")
    .Font.Bold = True
    End With
    With .Range("A:AA")
    .HorizontalAlignment = xlCenter
    End With
    End With

    For Each rOne In rngFOne

    Set rTwo = rngFTwo.Find(What:=rOne.Value, LookIn:=xlValues, _
    LookAt:=xlWhole, SearchOrder:=xlByRows)

    If Not rTwo Is Nothing Then
    lngCnt = lngCnt + 1

    Select Case lngCnt
    Case Is < 6503
    With wksFileOutput
    .Range(.Cells(lngCnt, 1), .Cells(lngCnt, 5)) = _
    Array(rOne, rOne.Offset(, 1), rOne.Offset(, 2), _
    rOne.Offset(, 3), rOne.Offset(, 4))

    .Range(.Cells(lngCnt, 7), .Cells(lngCnt, 11)) = _
    Array(rTwo, rTwo.Offset(, 1), rTwo.Offset(, 2), _
    rTwo.Offset(, 3), rTwo.Offset(, 4))

    rOne.Resize(1, 5).ClearContents
    rTwo.Resize(1, 5).ClearContents

    .Cells(lngCnt, 13).Value = .Cells(lngCnt, 4) - .Cells(lngCnt, 10)
    End With

    Case Else
    With wksFileOutput
    .Range(.Cells(lngCnt - 6500, 15), .Cells(lngCnt - 6500, 19)) = _
    Array(rOne, rOne.Offset(, 1), rOne.Offset(, 2), _
    rOne.Offset(, 3), rOne.Offset(, 4))

    .Range(.Cells(lngCnt - 6500, 21), .Cells(lngCnt - 6500, 25)) = _
    Array(rTwo, rTwo.Offset(, 1), rTwo.Offset(, 2), _
    rTwo.Offset(, 3), rTwo.Offset(, 4))

    rOne.Resize(1, 5).ClearContents
    rTwo.Resize(1, 5).ClearContents

    .Cells(lngCnt - 6500, 27).Value = _
    .Cells(lngCnt - 6500, 18) - .Cells(lngCnt - 6500, 24)
    End With

    End Select
    End If
    Next
    End Sub

    Sub ImportFileData(FName As String, WKSName As String)
    Dim oFSO As Object
    Dim strFullName As String
    Dim lngTextLineCount As Long
    Dim strStringHolder As String
    Dim wksFileContents As Worksheet

    Set wksFileContents = Worksheets.Add(Before:=Worksheets(1), _
    Count:=1, _
    Type:=xlWorksheet)
    wksFileContents.Name = WKSName

    strFullName = ThisWorkbook.Path & "\" & FName

    Set oFSO = CreateObject("Scripting.FileSystemObject")

    Open strFullName For Input As #1

    Do While Not EOF(1)
    lngTextLineCount = lngTextLineCount + 1
    Line Input #1, strStringHolder

    Select Case lngTextLineCount
    Case Is < 6501 '65001
    wksFileContents.Cells(lngTextLineCount, 1).Value = strStringHolder
    Case Else
    wksFileContents.Cells(lngTextLineCount - 6500, 20).Value = strStringHolder
    End Select

    Loop

    Close #1

    Call ParseAndDumpFlack(wksFileContents, 1)
    Call ParseAndDumpFlack(wksFileContents, 20)
    wksFileContents.Columns("G:S").Delete Shift:=xlToLeft

    End Sub

    Function ParseAndDumpFlack(WSheet As Worksheet, ColNum As Integer)
    WSheet.Columns(ColNum).TextToColumns _
    Destination:=WSheet.Cells(1, ColNum), _
    DataType:=xlDelimited, _
    TextQualifier:=xlDoubleQuote, _
    ConsecutiveDelimiter:=False, _
    Tab:=False, _
    Semicolon:=False, _
    Comma:=False, _
    Space:=False, _
    Other:=True, OtherChar:="|", _
    FieldInfo:=Array(Array(1, 1), Array(2, 9), Array(3, 1), Array(4, 1), _
    Array(5, 1), Array(6, 9), Array(7, 9), Array(8, 9), _
    Array(9, 9), Array(10, 9), Array(11, 9), Array(12, 9), _
    Array(13, 9), Array(14, 9), Array(15, 9), Array(16, 9), _
    Array(17, 9), Array(18, 1), Array(19, 9))
    End Function[/vba]

    Note1: as you had not supplied any code that you had started with, I would believe you might be starting out in VBA. If this is the case, I'll be happy to explain (comment) the code later. First I think it would be best to see if I'm in the neighborhood of what you are looking to do...

    Note2: The above snippet is definately "rude & crude" and no where near complete. If you decide to run the test twice, you'll need to first delete the three sheets it creates...

    Hope this helps,

    Mark

  12. #12
    VBAX Contributor
    Joined
    Jul 2004
    Location
    Gurgaon, India
    Posts
    148
    Location
    It's Cross Post

  13. #13
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Thanks Mark and Kenneth

    I got a debug at "parse and dump flack" . The debug was :"No Data was selected to Parse".

    Regards
    Seema

  14. #14
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Quote Originally Posted by Krishna Kumar
    You mean this one .


  15. #15
    Knowledge Base Approver VBAX Guru GTO's Avatar
    Joined
    Sep 2008
    Posts
    3,368
    Location
    Seema,

    Please open the link supplied by Krishna at post #12.

    I do see the substantially different membership dates, as well as the later thread initialization and a couple of other notable differences. That said, it is obviously the same project. Are you "Npsays" at mrexcel or what is the deal?

    Mark

  16. #16
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Quote Originally Posted by GTO
    Now speaking only for myself, I have a hard time understanding what you mean by "... The entire output file doesn't need to be extracted in Excel..."

    Admittedly only by my limited knowledge, but I do not see a reliable way of working with the values in column five of the input text files, other than to first import them (along with the entire line) into Excel first. Now maybe that statement will trigger another member's input for a better way, but in the meantime, take the below and paste it into a standard module in a new/blank workbook.

    Mark
    Dear Mark,
    Since it is a huge file, the detailed working could remain in the text file. However, from that text file, I need the subtotals of difference of amount under resepective agents in the excel sheet .

    I really appreciate your support in taking out time for helping me in resolving this.
    Thanks Again.
    Seema.

  17. #17
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Quote Originally Posted by GTO
    Seema,

    Please open the link supplied by Krishna at post #12.

    I do see the substantially different membership dates, as well as the later thread initialization and a couple of other notable differences. That said, it is obviously the same project. Are you "Npsays" at mrexcel or what is the deal?

    Mark
    Dear MArk
    I am not npdays I have responded to krishna's link by providing my own link with which i have registered in that site. hope that clarifies .. i am looking forward to resolve this.
    Thanks

  18. #18
    Knowledge Base Approver VBAX Guru GTO's Avatar
    Joined
    Sep 2008
    Posts
    3,368
    Location
    Quote Originally Posted by seema
    Thanks Mark and Kenneth

    I got a debug at "parse and dump flack" . The debug was :"No Data was selected to Parse".

    Regards
    Seema
    Hi Seema,
    Hopefully you understand, as with the identical test data, it must be someone at the same company; which I assume you work for a larger type corp?

    Anyways, it's not that cross-posting in and of itself, is some major sin - more at that a reasonable expectation of providing a link to the other site's thread is understood, as this can prevent any individual who is trying to assist from wasting their time and efforts in trying the same thing that another already has tried...

    Well, hopefully that made sense?

    Now let's get back to post #11 and your response. I ran the code I provided on both a decently fast machine w/Excel 2003 and on my 'slower than creeping death' laptop w/ Excel 2000... Nary an error to report. (I would fully admit that I could have both dressed-out a steer and cooked it by the time it completed...) So, please tell me at exactly which line in the ParseAndDumpFlack Function the error occurs at, as I have not been able to replicate. (Thanks)

    Finally, reference post #16, sorry if I was not clear, but --- to the best of my knowledge, there's really no notably faster way (to the best of my knowledge). In short, let's think of he parameters. As the source files are well past Excel's row limit, we're not reading the text file in one swoop, no matter what. Given that, we're reading line-by-line, so I would contend that logic dicates that reading ea line into an Excel row cannot be slower than ... not only reading ea line individually, but stopping to parse out what is between two certain delimiters. Make sense?

    Let me know as to the error, I'll be happy to provide what help my limited skills may provide.

    Mark

  19. #19
    VBAX Contributor
    Joined
    Jul 2004
    Location
    Gurgaon, India
    Posts
    148
    Location
    Seema,

    My apologies

  20. #20
    VBAX Regular
    Joined
    Nov 2008
    Posts
    18
    Location
    Thanks ! - Mark, I completely appreciate and respect your point of view. It makes complete sense!

    Krishna - Absolutely no Issues! similar files are circulated in team of people and is evident that someone is taking help from one of the experts there.

    Dear Mark
    On the Code:

    It worked this time, I would have done some error executing it the last time !(spooky!). It was reasonably fast with 10k data in each file.
    Now, I have an output sheet, which has differences of all matching entries in file one and file two. Transactions which are not matching , e.g. "present only in one report and not in other" are being written separately in respective sheets (file one and file two). This is GREAT !
    We are going just fine till here.
    Now: What would happen to data when it is more than 100,000 ? would it keep getting appended in adjecent rows in the out put sheet ?

    Regarding the post #16 : I have to present only the subtotal data in the excel sheet, and not the whole file. I was thinking of storing the whole file in a text file as an audit trail, entailing how i arrived at the subtotal in the excel sheet. A text file would be smaller in size and will consume less space on the hard disk. ( we would have 100s of such files)

    At the end of it, the records which I need in the excel sheet would be as below:-
    Total Records in file 1 = 10
    Total Records in file 2 = 10
    Subtotals:
    RM CAT Diff AMT
    XXXX GEN 200
    YYYY GEN 300
    Total 500
    Thanks All of you once again!
    Regards
    Seema

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •