PDA

View Full Version : macro to extract specific data from text file to excel sheet



learn72
02-11-2018, 07:02 AM
I have a pdf file which has been converted to txt .

My issue is to extract specific information as per attached which are numbered from 25 to 42. It consists to extract numbered headers with corresponding data .

I don't know if this achievable with vba code

note the text file was unable to load , rename as csv , thus rename as txt before opening

Leith Ross
02-11-2018, 05:04 PM
Hello learn72,

I do not see any headers that match in these two files. Did you post the correct files?

p45cal
02-11-2018, 07:48 PM
This is a horrible task! Surely you can get the data another way?
In the attached there is a flaky solution; loads of assumptions are made, a few guesses too. The likelihood that it will go wrong and/or give wrong results is high.
I'm not even sure it's giving you what you need.
Anyway, I've only done some very cursory checking - I leave that to you.
On Sheet2 of the attached is a button in the vicinity of cell C8 for you to click. Initially, select the same .txt file you renamed .csv and attached here, in order to duplicate what I was working with and hopefully it won't fall over. It places the results starting at cell A10 of the same sheet.
Sheet5 of the attached is the output I got using the text file you attached.

The code's in the file but here it is too:
Sub blah()
Set Destn = Sheets("Sheet2").Range("A10")
fpath = Application.GetOpenFilename("Text Files (*.txt),*.txt")
If Not fpath = False Then
zzz = CreateObject("scripting.filesystemobject").opentextfile(fpath).readall
'zzz = CreateObject("scripting.filesystemobject").opentextfile("C:\Users\mmmmmm\AppData\Local\Temp\data.txt").readall
yyy = Split(zzz, vbCrLf & vbCrLf & "25")
For Each rcd In yyy
If InStr(rcd, "Marks & Numbers") > 0 Then
'Stop
*** = Split(rcd, vbCrLf & vbCrLf)
For i = LBound(***) To UBound(***)
Select Case True
Case InStr(***(i), "Number and Kind") > 0
www = ShortArray(Split(***(i + 1), " "))
B = www(0)
Case InStr(***(i), "34 FOB Ncy") > 0
www = ShortArray(Split(***(i + 1), " "))
L = www(1)
M = Application.Trim(Split(www(2), "|")(0))
N = Application.Trim(Split(www(3), "|")(0))
Case InStr(***(i), "37 Other Costs") > 0
www = ShortArray(Split(***(i + 1), " "))
O = Application.Trim(Split(www(0), "|")(0))
On Error Resume Next
P = www(2)
Q = www(3)
On Error GoTo 0
Case InStr(***(i), "Description of goods") > 0
www = ShortArray(Split(***(i + 1), " "))
C = www(0)
Case InStr(***(i), "31 Gross mass") > 0
www = ShortArray(Split(***(i + 1), " "))
ColmI = www(0)
J = www(1)
K = Application.Trim(Split(www(2), "|")(0))
Case InStr(***(i), "28 Cty. Org") > 0
www = ShortArray(Split(***(i + 1), " "))
F = www(1)
G = www(2)
H = www(3)
Case InStr(***(i), "Marks & Numbers") > 0
www = ShortArray(Split(***(i + 1), " "))
D = www(1)
' Debug.Assert D <> "098"
E = www(2)
Case InStr(***(i), "40 Tax") > 0
www = ShortArray(Split(***(i + 1), " "))
X = www(UBound(www))
'Stop
WriteToSheet Destn, Array(A, B, C, D, E, F, G, H, ColmI, J, K, L, M, N, O, P, Q, , , , , , , X)
ii = i + 1
Do
R = Empty: S = Empty: T = Empty: U = Empty: V = Empty: W = Empty:
ccc = ShortArray(Split(***(ii), " "))
If UBound(ccc) > 2 Then
'Stop
R = ccc(0)
S = ccc(1)
T = ccc(2)
U = ccc(3)
V = ccc(4)
W = ccc(5)
WriteToSheet Destn.Offset(, 17), Array(R, S, T, U, V, W)
Set Destn = Destn.Offset(1)
End If
ii = ii + 1
Loop Until InStr(***(ii), "Total") > 0 Or UBound(ccc) < 3
End Select
'www = Split(thing, " ")
Next i
End If
Next rcd
End If
End Sub

Function ShortArray(myArr)
ReDim NewArr(LBound(myArr) To UBound(myArr))
J = 0
For i = LBound(myArr) To UBound(myArr)
Z = Application.Trim(myArr(i))
If Z <> "" Then
NewArr(J) = Z
J = J + 1
End If
Next i
ReDim Preserve NewArr(LBound(myArr) To J - 1)
ShortArray = NewArr
End Function

Sub WriteToSheet(Dest, myArr)
Dest.Resize(, UBound(myArr) + 1).Value = myArr
End Sub


I note there are instances of *** in the code above, that's where I had three x's!

There are things to do still; eg. I haven't cleared the variables for each loop iteration (the variable names are the column letters of their destination, except for ColmI because I used i for something else) and this could lead to wrong data/results.
Actually, after posting I found it does produce wrong data in colums P and Q starting at item no. 99. I'll post an update tomorrow - too tired now.

p45cal
02-12-2018, 06:28 AM
Actually, after posting I found it does produce wrong data in colums P and Q starting at item no. 99. I'll post an update tomorrow - too tired now.Update attached.

learn72
02-13-2018, 12:57 AM
Thank you a lot p45cal for the time dedicated for this,it is not an easy task but very tough , I never thought it would be possible. The original file is a pdf file .

I will make some test ,if any issues I will revert back

Otherwise can you suggest is there is another technique to extract data from a pdf file .

p45cal
02-14-2018, 04:47 AM
Otherwise can you suggest is there is another technique to extract data from a pdf file .Where has the pdf file come from? How has it been made?

learn72
02-14-2018, 08:01 AM
I obtained the pdf file from a system but can't do much more .

Sent message , Hope it helps

p45cal
02-14-2018, 08:54 AM
A bit surprised there is the facility to produce a pdf file as that's quite a complex thing.
Is there really no other output format to choose from?
What is the "system"?

learn72
02-18-2018, 08:15 AM
Apologize for late reply, the system was developed in Singapore an EDI System, but so far I am trying to find a solution as it is very limited to extract a complete report from it .

p45cal
02-18-2018, 01:12 PM
the system was developed in Singapore an EDI System
This is looking hopeful; an EDI document has structure to it, which will be consistent throughout the document.
I don't know what type your EDI system is but almost anything that it produces is likely to be far better structured than a PDF file.
An EDI system is designed to produce files for computers to read.
There are several responses to very similar questions to yours on the internet that we might bring to bear. We just need some more specifics from you. There's even one showing how to get stuff into a csv file, which is good for Excel.

There are several options and protocols; some will be easier to get into Excel than others. Have a look at this page: https://en.wikipedia.org/wiki/Electronic_data_interchange
about three-quarters of the way down (under See also). Do you recognise any that are in your system?

Almost anything would be bettter than a pdf file.

An interesting youtube video here: https://www.youtube.com/watch?v=oWkzeP2LP-8