PDA

View Full Version : [SOLVED:] VBA - Extracting 3 Text Lines From Text files



dj44
01-14-2017, 06:36 AM
Hi folks,

I have encountered another problem.

I am trying to modify the vba code to import 3 pieces of information from some text files.

Extract the Lines that contain or begin with

##100
XX
%%

I was so happy when i found this and thought i could change a few lines using the instring function - but it has deadlocked me again.



Sub ImportFromTextfiles()
http://stackoverflow.com/questions/35826629/how-to-import-specific-text-from-files-in-to-excel

Dim fso As FileSystemObject
Dim folder As folder, file As file, FileText As TextStream
Dim TextLine As String
Dim cl As Range

Dim num As Long ' numerical part of key, as in "Ann:"
Dim col As Long ' target column in Excel sheet
Dim key As String ' Part before ":"
Dim value As String ' Part after ":"

' Get a FileSystem object
Set fso = New FileSystemObject


Set folder = fso.GetFolder("C:\Users\DJ\Desktop\Import\")

' Set the starting point to write the data to


'-Start Data at Row & Column
Set cl = ActiveSheet.Cells(2, 3)


For Each file In folder.Files
Set FileText = file.OpenAsTextStream(ForReading) ' Open the file


Do While Not FileText.AtEndOfStream ' Read the file one line at a time
TextLine = FileText.ReadLine 'read line


'---- Problem

' key = Split(TextLine & ":", ":")(0)
value = Trim(Mid(TextLine, Len(key) + 2))
' num = Val(Mid(key, 2))
If num Then key = Replace(key, num, "") ' Remove number from key
col = 0


'--- Data Lines containing placeholders to Extract
If key = "##100" Then col = 1
If key = "XX" Then col = 2
If key = "VV" Then col = 2 + num
If col Then
cl.Offset(, col - 1).value = value ' Fill cell
End If
Loop


FileText.Close ' Clean up

Set cl = cl.Offset(1) ' Next row
Next file


End Sub




I have spent a lot of time trying different ways to extract the lines but nothing works :(

I tried using if instring function as well

How can I just get the lines that I need in the columns

Thank you for your time - I really appreciate it

Leith Ross
01-14-2017, 02:05 PM
Hello dj44,

I would be a great help if you attach a copy of the text file. It makes for a faster answer to your question.

dj44
01-14-2017, 02:31 PM
Hello Leith,

it's just a normal text files laid that has some paragraphs in it. But i was intrested in specific lines.




##100 Hello From Me File A

Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add.

For example, you can add a

XX THIS LINE i would like to have - just a line

When you apply styles, your headings change to match the new theme. Save time in Word with new buttons that show up where you need them.
some where later

%% Hello I would like to import this line too



I have been working on it still - but oh dear its not coming to much fruition so far.

Leith Ross
01-14-2017, 06:35 PM
Hello dj44,

I found a few inconsistencies in your example. You don't have "VV" which appears in your keys. "VV" is the only example you have with a number following it that determines the column it will go to on the worksheet. Makes wonder if there are others. You also fail to mention or provide an example of the final data layout on the worksheet.

It is the little things that make or break coding a solution that will work. This is why I asked for the text file. It is faster for me to scan a file and create a list of questions I can ask you all at once.

If you can not upload a file to this site then use a file sharing service and post a link to the file. If you don't feel comfortable doing that, I can provide you my email address.

dj44
01-14-2017, 07:31 PM
Hello Leith,

Thank you for taking the time to investigate this,

My bad, Im really clumsy at explaining things and make a dogs dinner of code as ive been told off here before many a time:)

I have no clue about keys - that’s the first time I saw it.

The text files are not very structured - I could not upload it but between the 2 rulers

============================================
## General ventilation in the workplace:

Supervision

Further information

General ventilation in the workplace: Guidance for employers
HSG202 HSE Books 2000
ISBN 0 7176 1793 9

XX A new practical guide to complying with COSHH in the rubber industry
Guidance HSE Books 1997
ISBN 0 7176 1372 0

The selection, use and maintenance of respiratory protective equipment: A
practical guide
HSG53 (Second edition) HSE Books 1998
ISBN 0 7176 1537 5

Safe to breathe: Dust and fume control in the rubber industry
Leaflet IACL95
HSE Books 1995 (single copy free or priced packs of 10
ISBN 0 7176 0979 0

Controlling airborne contaminants in the workplace

Technical Guide TG7 British
Occupational Hygiene Society 1987
%% ISBN 0 9059 2742
================================================






I have a very baby script that can extract 1 line from a text file, but on this occasion there are 3 lines, and some one put in these placeholders, and I hate to have to copy and paste and go through the mess.

18036

Yes those placeholders are not the best I must say so.

I could do a copy and replace to make them into a word if that’s any better, so may be the instring function can be used.

thank you for looking at this

Leith Ross
01-14-2017, 08:02 PM
Hello dj,

That does help more, thanks.



The only values of interest are "##","XX", and "%%"?
When copied to the worksheet, do you want the prefix removed, i.e. "##", "XX", %%"?
Will the columns being used be "C", "D", and "E"?
Which row will be the starting row?
Are all the files in the folder text files (.txt or .csv)?

dj44
01-14-2017, 08:23 PM
Hello Leith,


Hello dj,

That does help more, thanks.



The only values of interest are "##","XX", and "%%"? Yes
When copied to the worksheet, do you want the prefix removed, i.e. "##", "XX", %%"? Yes please
Will the columns being used be "C", "D", and "E"? Yes - I am very very bad with resizing cells, I never can understand - how it works
Which row will be the starting row? Row 10 please
Are all the files in the folder text files (.txt or .csv)? Text Files



I'm not very good with computers any way and sitting in my worn out leather chair to copy and paste
- which mind you i did as a challenge for about an hour, untill it got me.

Thes files were pdf then they some how became word and then someone butchered them into text, and i do have to open it
so if you can do anything very grateful

Thank Leith, hope its not disturbing your good saturday.

Leith Ross
01-14-2017, 09:00 PM
Hello dj,



Thes files were pdf then they some how became word and then someone butchered them into text, and i do have to open it
so if you can do anything very grateful


Do this mean you need the text file to be opened, read, and remain open?

dj44
01-14-2017, 09:07 PM
Hello Leith,

I was just saying originally they were pdf and ended up being text files.

They are just text files now in my folder.

Leith Ross
01-14-2017, 09:13 PM
Hello dj,

Okay, just wanted to be sure.

Leith Ross
01-15-2017, 02:57 PM
Hello DJ,

Sorry about the delay. I encountered a new problem I have never seen before in VBA and it was driving me crazy.

I just discovered that when using a For Each Loop with an array, you cannot exit the loop early and reload the array with new data. VBA moves the entire array into protected memory and locks it. The array will not be unlocked until the For Each Loop has iterated through all of the array elements. In working with VBA for 20 years I have never encountered this until now.

Anyway, that problem has solved and the code is working. It is long because I documented it very completely for you to be able to see and follow the logic.

Here is the macro that has been added to the attached workbook. The attached workbook has a button on the worksheet to run the macro. The macro is setup to allow you to select the folder where the files are. I did this to make it easy for troubleshooting. However, you can easily change the macro to use a fixed folder path. The directions are in the comments.

Macro Code


Sub ParseFiles()

Dim Data() As Byte
Dim File As Variant
Dim Files As Variant
Dim Folder As Object
Dim Line As Variant
Dim Lines As Variant
Dim key As Variant
Dim Keys As Variant
Dim n As Long
Dim Path As Variant
Dim Rng As Range
Dim s As Long
Dim Text As String
Dim Wks As Worksheet
Dim x As Long

' Strings to search for in the text.
Keys = Array("##", "XX", "%%")

' Select Folder path of the text files to be parsed.
With Application.FileDialog(msoFileDialogFolderPicker)
.AllowMultiSelect = False
.Show
If .SelectedItems.Count > 0 Then
Path = .SelectedItems(1)
Else
Exit Sub
End If
End With

' To use a fixed folder path, delete the With ... End With lines above.
' Remove the comment (single quote) from the line below. Change the path to your files.
' Path = "C:\Users\DJ\Desktop\Import\"

Set Wks = ActiveSheet
Set Rng = Wks.Range("C10")

' Clear any previous parsed text.
Intersect(Rng.CurrentRegion, Rng.CurrentRegion.Offset(1, 0)).ClearContents

' Open the folder using it's path.
With CreateObject("Shell.Application")
Set Folder = .Namespace(Path)
End With

' Check that the folder exists.
If Folder Is Nothing Then
MsgBox Path & " Not Found.", vbExclamation
Exit Sub
End If

' Return all files, links, and folders in the folder.
Set Files = Folder.Items

' Filter out only text files.
Files.Filter 64, "*.txt;*.csv"

' Step through each text file in the folder.
For Each File In Files
' Read all of the text into a byte array.
Open File.Path For Binary Access Read As #1
ReDim Data(LOF(1))
Get #1, , Data
Close #1

' Convert the byte array to a text string.
Text = StrConv(Data, vbUnicode)

' Divide the text into individual lines using the carriage return and line feed characters.
Lines = Split(Text, vbCrLf)

' Step through each line of text.
For x = 0 To UBound(Lines)
' Remove any leading or trailing spaces.
Line = Trim(Lines(x))

' Search the line if it is not blank.
If Line <> "" Then
' Check the line for each search term or key.
For Each key In Keys
' Get key's position in the text line.
s = InStr(1, Line, key)

' If the key is found and is not just the key then paste the text after the key.
If s > 0 And s + Len(key) < Len(Line) Then
Rng.Offset(0, n).Value = Mid(Line, s + Len(key), Len(Line) - s - Len(key) + 1)
n = n + 1
End If

' Have all the keys been found? There are only 3 per file.
If n > UBound(Keys) Then GoTo NextFile
Next key
End If
Next x
NextFile:
' Reset the column counter.
n = 0
' Advance to the next row on the worksheet.
Set Rng = Rng.Offset(1, 0)
Next File

End Sub

dj44
01-15-2017, 03:37 PM
Hello Leith,

I can't begin to say what an outstanding generosity you have shown me, and you didn’t have to.

I am very bad at coding to begin with and their is some family emergency and i have do this work and my mind has been everywhere, drowning me.

You have given your time more than anything and that has really touched my heart,
like most folks on here this forum has some of the kindest and most selfless programmers since I joined.

And don’t get me wrong I always throw a spanner in the works because im very naïve and think I can change a few lines - but this is not that.


I always say its not easy coding after hours struggling with it. I cant find no software too that will do this for me.

There is no software that can do this job - so I try to have vba in my life since I learned of its existence.


Thank you so much for this kind gesture - lending your expert coding skills to the newbies and forum communitty, I hope they give great recognition to their members, I always say that.

Please forum I hope you recognise all the good talent and folks here, and sometimes me asking a dumb question
I hope does benefit losts of others who can use this exact thing as well as the pros - who can give kudos and credit for the stellar work shared here.


I send you the best of karma for your kindness and humanity :)


Thank you from your friend

dj

Leith Ross
01-15-2017, 03:59 PM
Hello DJ,

You're welcome. The best thanks come when you have taught someone something new. When you can use that knowledge later on then I have done my job.

bikeboy85
05-12-2018, 12:32 AM
Hello DJ,

You're welcome. The best thanks come when you have taught someone something new. When you can use that knowledge later on then I have done my job.


Leith

Thank you for your code, and coming to the rescue of the OP. I have a similar issue of parsing lines from a text file matching around 34 keywords in my file.

A specific textfile from my folder can be download from this link. https://uploadfiles.io/y9qur

I used the code you provided to the OP with the only modifications being the line on the worksheet where data is placed, the variant KEYS with my own keywords and I also commented out the line 'Intersect(Rng.CurrentRegion, Rng.CurrentRegion.Offset(1, 0)).ClearContents'

Here are the lines I modified :


Keys = Array("Mean RR (ms)", "STD RR (ms)", "Mean HR (beats/min)", "STD HR (beats/min)", "Min HR (beats/min)", "Max HR (beats/min)", "RMSSD (ms)", "NNxx (beats)", "pNNxx (%)", "RR tri index", "TINN (ms)", "VLF (Hz)", "LF (Hz)", "HF (Hz)", "VLF (ms^2)", "LF (ms^2)", "HF (ms^2)", "VLF (log)", "LF (log)", "HF (log)", "VLF (%)", "LF (%)", "HF (%)", "LF (n.u.)", "HF (n.u.)", "Total power (ms^2)", "LF/HF ratio", "SD1 (ms)", "SD2 (ms)", "SD2/SD1 ratio", "Approximate entropy (ApEn)", "Sample entropy (SampEn)", "alpha 1", "alpha 2")




Set Wks = ActiveSheet
Set Rng = Wks.Range("A2")


I assigned the macro to a button on my sheet of interest.

When I run this code, a couple of observations in the parsed output on the excel sheet :

1. Minor issue - Certain data lines are parsed more than once into the column. This is happening exactly for the text lines for "VLF (Hz):" , "VLF (ms^2):", "VLF (log):". I'm guessing with a small tweak this can be fixed.

2. Major issue - The parsing stops after the text line for "SD2/SD1 ratio:" is copied. The last 4 keywords in KEYS is not parsed and the operation moves onto the next file. This happens for all the files. My guess is that the variant KEYS has an array limit. Or is it the memory of my system?

Please help with a tweak because your code does 95% of what I'm looking for... :) :banghead:

Cheers !

MEGAMETRO
04-04-2019, 05:35 PM
Hi there,

Great code above. I'm trying something similar, however my only alteration to the above would be to define the column which each key falls into.

The issue I have is that where some of the text files are missing a 'key' that found in others, the subsequent key is pasted in the 'wrong column' to the others. This makes for messy data from my side.

I've tried to define the keys as separate arrays, but I'm thinking this may not be right.

Can I please reach out and ask if there's an alteration to the code above that would allow the keys to be pasted into specific columns?

Any help would be greatly appreciated!

Thank you,

Luke

Leith Ross
04-05-2019, 07:55 AM
Hello Luke,

I am confident the code can modified but I need a sample of the text and what the final layout should look like.

MEGAMETRO
04-06-2019, 09:58 PM
Leith, thank you so much for offering to help! It's very generous of you!
For whatever reason, I can't upload the .txt files, so I've had to copy/paste and upload as word documents here (upload 1 and 2). These two files are examples where the 'event reason' outcome dictates a change to the recording of some of the headings.
In the case of upload 2, the 'To:' field is not recorded.
The 'From' and 'To' dates are essential data points for hundreds of files.
Where the 'status' ='Discon' a 'To' date is not required.

As mentioned earlier, if I use the code from 01-16-2017 below, this almost does the job. Only when 'Discon' files are in the mix, subsequent 'keys' get muddled up due to the missing 'To' column. If it could be somehow possible to force the keys into specific columns, that would do the trick!

For keys, the following would be useful to capture in specific columns:
A: Notification ID
B: TDD Descriptor
C: TDD number for the listing
D: From
E: To
F: QR Availability
G: Date of publication on the ARCC

Again, thank you for your support here. I noticed that the TRIM function doesn't quite work on these files as the separator between the 'key' and the value is a tab not a space. Anyway, I can look into that another day!

Thank you!
Luke

Leith Ross
04-07-2019, 10:39 AM
Hello Luke,

I downloaded your Word files. It appears each document is a single record. Does each new record begin with a Time Event Notification?

MEGAMETRO
04-07-2019, 01:06 PM
Hi Leith,

correct. Each .txt file is a unique record starting with time event notification.

thanks.
Luke

Leith Ross
04-09-2019, 12:58 PM
Hello Luke,

Sorry for the delay. I decided to change the macro to make it more flexible. You can now choose which column each key will be output to. You can also change delimiter the file uses. Yours is set to a Tab character. Consecutive delimiters are treated as a single delimiter.

The files can contain multiple records as long as each record starts with the same line. The macro lets you select all of the CSV and TXT files in a folder of your choosing. You can change the filter if you need to.

The attached workbook has the macro added and a button to call it. The macros used in the workbook are shown below. If you have any problems or need more help let me know.

Macro Code


Option Explicit


' Thread: http://www.vbaexpress.com/forum/showthread.php?58253-VBA-Extracting-3-Text-Lines-From-Text-files
' Poster: MEGAMETRO (Luke)




Global SearchKeys As Variant
Global strFS As String ' // Field Separator (delimiter string)
Global strNL As String ' // New Line character. PC uses CrLf, Mac uses Lf


Sub ParseRecords(ByRef Records As Variant)

Dim Data As String
Dim Hdr As Range
Dim j As Long
Dim k As Long
Dim Key As Variant
Dim Line As Variant
Dim n As Long
Dim Rec As Variant
Dim Rng As Range
Dim RngEnd As Range
Dim Wks As Worksheet

Set Wks = ThisWorkbook.Worksheets("Sheet1")

Set Hdr = Wks.Range("A1:G1")
Set Rng = Hdr.Offset(1, 0)
Set RngEnd = Wks.Cells(Rows.Count, Hdr.Column).End(xlUp)

If RngEnd.Row > Hdr.Row Then
Set Rng = RngEnd.Offset(1, 0)
End If

If VarType(SearchKeys) = vbEmpty Then
' // SearchKeys: Key = What to look for, Item = Destination column number
Set SearchKeys = CreateObject("Scripting.Dictionary")
With SearchKeys
.CompareMode = vbTextCompare
.Add "Notification ID:", 1
.Add "TDD Descriptor:", 2
.Add "TDD number for this listing:", 3
.Add "From:", 4
.Add "To:", 5
.Add "QR Availability:", 6
.Add "Date of publication on the ARCC:", 7
End With
End If

For Each Rec In Records
For Each Line In Split(Rec, strNL)
' // Skip empty lines
If Line <> "" Then
For Each Key In SearchKeys.Keys
' // Locate the key
n = InStr(1, Line, Key)
' // Extract the first non empty field after the key
If n > 0 Then
j = n + Len(Key)
' // Check for consecutive delimiters after the key
Do
k = (Mid(Line, j, Len(strFS)) = strFS)
If k = 0 Then Exit Do
j = j + Len(strFS)
Loop
' // Check if field data is at the end of the line
If j < Len(Line) Then
' // Check for another field separator after the field data
k = InStr(j, Line, strFS)
If k = 0 Then k = Len(Line)
Data = Mid(Line, j, k - j + Len(strFS))
' // Output the field data with no leading or trailing spaces
Rng.Cells(1, SearchKeys(Key)).Value = Application.Trim(Data)
End If
End If
Next Key
End If
Next Line
' // Move to the next output row
Set Rng = Rng.Offset(1, 0)
Next Rec

End Sub


Sub GetRecords()


Dim Buffer() As Byte
Dim cnt As Long
Dim FileSpec As Variant
Dim n As Long
Dim Recs As Variant
Dim Text As String

' // Initialize the delimiter and new line variables
strFS = vbTab
strNL = vbCrLf

With Application.FileDialog(msoFileDialogOpen)
.AllowMultiSelect = True
.Filters.Add "Text Files", "*.csv,*.txt"
.FilterIndex = .Filters.Count

If .Show = -1 Then
Application.ScreenUpdating = False

For Each FileSpec In .SelectedItems
' // Read the entire file as a byte aray.
Open FileSpec For Binary Access Read As #1
ReDim Buffer(LOF(1))
Get #1, , Buffer
Close #1

' // Convert byte array back to a string
Text = StrConv(Buffer, vbUnicode)

' // Identify each record in the file
Do
' // Starting line of each record
n = InStr(n + 1, Text, "Time event Notification")
If n = 0 Then Exit Do
cnt = cnt + 1
If IsEmpty(Recs) Then
ReDim Recs(1 To 1)
Else
ReDim Preserve Recs(1 To cnt)
End If
' // Save the starting character position of each record
Recs(cnt) = n
Loop

' // Extract each record as a srting
For n = 1 To cnt
If n = cnt Then
Recs(n) = Mid(Text, Recs(n), Len(Text))
Else
Recs(n) = Mid(Text, Recs(n), Recs(n + 1) - 1)
End If
Next n

Call ParseRecords(Recs)
Next FileSpec

Application.ScreenUpdating = True
End If
End With

End Sub

MEGAMETRO
04-09-2019, 03:33 PM
wow.
I'm really amazed. Mostly due to your generosity on taking the time to work on such a bespoke solution to my issue. Thank you so much.

One thing I just want to query, when the single .txt file contains multiple records all starting with the same header, the code works beautifully. If on 'File Open' I select multiple .txt files, as the information is separated into multiple .txt files (I'm selecting just two for now) , I get a run-time error '13': type mismatch on the line beginning 'Recs(n)...


For n = 1 To cnt
If n = cnt Then
Recs(n) = Mid(Text, Recs(n), Len(Text))



Now. I expect this is a newbie error by myself. (I'm still making my way through VBA for dummies!) If so can you please correct me.

Thank you again,

Luke

Leith Ross
04-09-2019, 04:16 PM
Hello Luke,

I assume the format of both files are same. Let me try to reproduce the error.

MEGAMETRO
04-09-2019, 04:27 PM
Yes. I just copied the text from the previously attached word docs, and named them 'test1.txt' and 'test2.txt'. Thanks!

Leith Ross
04-09-2019, 04:28 PM
Hello Luke,

It is nothing to do with you. Its on me. I out clevered myself. Working on fix.

Leith Ross
04-09-2019, 04:43 PM
Hello Luke,

Okay, here is the amended code for the "GetRecords" macro. The attached workbook also has the change.

Change to GetRecords Macro


Sub GetRecords()


Dim Buffer() As Byte
Dim cnt As Long
Dim FileSpec As Variant
Dim n As Long
Dim RecNdx As Variant
Dim Recs As Variant
Dim Text As String

' // Initialize the delimiter and new line variables
strFS = vbTab
strNL = vbCrLf

With Application.FileDialog(msoFileDialogOpen)
.AllowMultiSelect = True
.Filters.Add "Text Files", "*.csv,*.txt"
.FilterIndex = .Filters.Count

If .Show = -1 Then
Application.ScreenUpdating = False

For Each FileSpec In .SelectedItems
' // Read the entire file as a byte aray.
Open FileSpec For Binary Access Read As #1
ReDim Buffer(LOF(1))
Get #1, , Buffer
Close #1

' // Convert byte array back to a string
Text = StrConv(Buffer, vbUnicode)

' // Identify each record in the file
Do
' // Starting line of each record
n = InStr(n + 1, Text, "Time event Notification")
If n = 0 Then Exit Do
cnt = cnt + 1
If IsEmpty(RecNdx) Then
ReDim RecNdx(1 To 1)
Else
ReDim Preserve RecNdx(1 To cnt)
End If
' // Save the starting character position of each record
RecNdx(cnt) = n
Loop

ReDim Recs(1 To cnt)

' // Extract each record as a srting
For n = 1 To cnt
If n = cnt Then
Recs(n) = Mid(Text, RecNdx(n), Len(Text))
Else
Recs(n) = Mid(Text, RecNdx(n), RecNdx(n + 1) - RecNdx(n))
End If
Next n

Call ParseRecords(Recs)
Next FileSpec

Application.ScreenUpdating = True
End If
End With

End Sub

MEGAMETRO
04-09-2019, 05:06 PM
Leith! It works perfectly. A dream!
Again, thank you so much!

A very happy man,
Luke

Leith Ross
04-09-2019, 05:31 PM
Hello Luke,

Thank you for the chance to create something new from something old. Glad I could help.