PDA

View Full Version : [SOLVED:] Loop in .txt file to extract recurring strings



BrI
07-11-2017, 01:20 PM
I'm trying to extract two text strings that will have varying numbers of occurrences in a text file (uploaded a sample with two occurrences - sorry, could not upload a .txt file so put in excel text box).

The strings I want will always follow certain key words so thinking I will use variations of the code below to select the line contents after the following key words:
1) "Interest Holder Name:" - I want the bank name that follows this
2) "Document Reference:" - I want the date that follows this


strIntHlder = Trim(Split(Split(newFileContent, "Interest Holder Name:")(1), " " & vbCrLf)(0))

strDocRef = Trim(Split(Split(newFileContent, "Document Reference:")(1), " " & vbCrLf)(0))


A couple of issues:

1) I want to store all occurrences of the text strings (there can be several), can I use a loop to do this in a text file?

2) The key words will also occur in other areas in the document that I do not want to reference (not shown in the attachment). So, can I isolate a loop range - maybe to a range defined by the first and last text lines shown in the attachment?

Sorry, could not upload a .txt file so put in excel text box.

Thanks for any assistance

mdmackillop
07-11-2017, 03:42 PM
FYI, You can attach a zip file if you can't post a specific file type.

Give this a try.

Sub Test()
Dim objFSO As Object
Dim objTF As Object
Dim strIn 'As String
Dim txt
Dim Pth

Pth = "C:\VBAX\Bank.txt" 'Change to suit

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTF = objFSO.OpenTextFile(Pth, 1)
strIn = objTF.readall


strt = InStr(1, strIn, "RECORDED INTERESTS AND INSTRUMENTS")
endd = InStr(1, strIn, "NON-ENABLING INSTRUMENTS")
txt = Mid(strIn, strt, endd - strt)

strIntHlder = Split(txt, "Interest Holder Name:")
For i = LBound(strIntHlder) To UBound(strIntHlder) - 1
Cells(i + 1, 1) = Trim(Split(Split(txt, "Interest Holder Name:")(i + 1), " " & vbCrLf)(0))
Next i

strDocRef = Split(txt, "Document Reference:")
For i = LBound(strDocRef) To UBound(strDocRef) - 1
Cells(i + 1, 2) = Split(Trim(Split(txt, "Document Reference:")(i + 1)), " ")(0)
Next i

Cells.Replace What:="Qualifier:", Replacement:="", LookAt:=xlPart

End Sub

BrI
07-12-2017, 07:52 AM
Excellent! Tested on several docs and working perfectly - will also be very helpful in other situations. Really appreciate your help.

One question, is there a reasonably easy way to extract the date from the "Document Reference:" line rather than the document number?

For example, your code currently extracts "75680398987" (note, number of digits will vary in this number) from the line below - could it be modified to extract "2004-06-15" instead?

Document Reference: 75680398987 2004-06-15 12:27:39

mdmackillop
07-12-2017, 09:22 AM
Sub Test()
Dim objFSO As Object
Dim objTF As Object
Dim strIn 'As String
Dim txt


Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTF = objFSO.OpenTextFile("C:\VBAX\Bank.txt", 1)
strIn = objTF.readall


strt = InStr(1, strIn, "RECORDED INTERESTS AND INSTRUMENTS")
endd = InStr(1, strIn, "NON-ENABLING INSTRUMENTS")


txt = Mid(strIn, strt, endd - strt)


Do Until InStr(txt, " ") = 0 ' Loop until there are no more double spaces
txt = Replace(txt, " ", " ") ' Replace 2 spaces with 1 space
Loop


strIntHlder = Split(txt, "Interest Holder Name:")
For i = LBound(strIntHlder) To UBound(strIntHlder) - 1
Cells(i + 1, 1) = Trim(Split(Split(txt, "Interest Holder Name:")(i + 1), " " & vbCrLf)(0))
Next i


strDocRef = Split(txt, "Document Reference:")
For i = LBound(strDocRef) To UBound(strDocRef) - 1
Cells(i + 1, 2) = Split(Trim(Split(txt, "Document Reference:")(i + 1)), " ")(0)
Cells(i + 1, 3) = Split(Trim(Split(Trim(Split(txt, "Document Reference:")(i + 1)), vbCrLf)(0)), " ")(1)


Next i


Cells.Replace What:="Qualifier:", Replacement:="", LookAt:=xlPart


End Sub

BrI
07-12-2017, 10:30 AM
Yes! Working on all my tests. Can't thank you enough for applying your expertise to this, a great help. Really outstanding!!!