Search for postcodes in a PDF document and list them in an Excel worksheet



krishbhargav
12-09-2016, 03:01 PM
Hi,

I have been working on code that loops through every string in every PDF file in a folder and looks for postcodes listed in the file, bearing in mind that a postcode could be in any of the formats below. Once all strings in these formats have been found, they need to be copied into an Excel document. I have hit a brick wall with this and have yet to find a solution. I have looked for help on the internet, but there is not much out there. Could someone kindly help me with the code, please? I really appreciate your patience.

Format       Example

AN NAA       M1 1AA
ANN NAA      M60 1NW
AAN NAA      CR2 6XH
AANN NAA     DN55 1PT
ANA NAA      W1A 1HQ
AANA NAA     EC1A 1BB

Regards,

Krish
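
The format table above (A = letter, N = digit) can be checked mechanically. A minimal sketch, assuming the six listed shapes are the only ones wanted; `PostcodeShape` and `IsListedFormat` are hypothetical helper names, not something from the original post:

```vba
' Hypothetical helper: turn a candidate such as "ec1a 1bb" into its
' shape "AANA NAA" (letters become A, digits become N).
Function PostcodeShape(ByVal s As String) As String
    Dim i As Long, ch As String, shape As String
    s = UCase$(Trim$(s))
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch >= "A" And ch <= "Z" Then
            shape = shape & "A"
        ElseIf ch >= "0" And ch <= "9" Then
            shape = shape & "N"
        Else
            shape = shape & ch      ' keep spaces and anything else unchanged
        End If
    Next i
    PostcodeShape = shape
End Function

' True when the shape of s is one of the six formats listed in the post.
Function IsListedFormat(ByVal s As String) As Boolean
    Select Case PostcodeShape(s)
        Case "AN NAA", "ANN NAA", "AAN NAA", "AANN NAA", "ANA NAA", "AANA NAA"
            IsListedFormat = True
    End Select
End Function
```

For example, `IsListedFormat("M60 1NW")` should be True, while a string such as "12345" has the shape "NNNNN" and is rejected.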

Fennek
12-10-2016, 09:18 AM
Hi,

Can you convert the PDF to a text file? If so, your problem could be solved with a regular expression.

regards

krishbhargav
12-10-2016, 09:40 AM
Hi Fennek, sorry, we are not in a position to convert the files, as the tool will be used by end users to extract postcodes from the PDF files.

Regards
krish

Kenneth Hobs
12-10-2016, 10:22 AM
Krish, if your macro does not get the content of a pdf file, how do you expect it to get "postcodes"? A text file would have provided text for testing purposes. Getting pdf content alone can be quite a chore if you don't have Acrobat. Obviously, if you use Acrobat to do that, your users MUST have Acrobat.
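
To illustrate the Acrobat route mentioned above, here is a rough sketch, not a tested solution. It assumes the full Adobe Acrobat product (not the free Reader) is installed, so that the `AcroExch.PDDoc` automation object and its JavaScript bridge are available. The function name `PdfToText` is hypothetical.

```vba
' Sketch: return the words of a PDF as one string, one line per page.
' Assumption: full Adobe Acrobat is installed on the user's machine.
Function PdfToText(ByVal pdfPath As String) As String
    Dim pdDoc As Object, jso As Object
    Dim p As Long, w As Long, nWords As Long
    Dim buf As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(pdfPath) Then
        Set jso = pdDoc.GetJSObject          ' bridge to Acrobat JavaScript
        For p = 0 To pdDoc.GetNumPages - 1   ' pages are zero-based
            nWords = jso.getPageNumWords(p)
            For w = 0 To nWords - 1
                ' Join words with a space so "M1" and "1AA" stay adjacent
                buf = buf & jso.getPageNthWord(p, w) & " "
            Next w
            buf = buf & vbCrLf
        Next p
        pdDoc.Close
    End If
    PdfToText = buf
End Function
```

Reading word by word is slow on large files, and it only works where Acrobat is present, which is exactly the constraint on the end users described above.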

Fennek
12-10-2016, 10:41 AM
Hi,

I use Firefox, so I am not experienced with MS Edge. But a short test showed:
- Ctrl+A didn't work
- selecting with the mouse, then Ctrl+C, and Ctrl+V in the editor worked

If you can write Excel VBA code to open the PDF in Edge (or IE) and transfer all the text either into a file or, better, into an array, the search code could start.

Kind regards

Fennek
12-10-2016, 01:00 PM
Hi,

If you have the data in a text file in "c:\temp\", then this code should help:



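Sub Fen()
    Dim iPath As String, iFile As String, Fn As String
    Dim RR As Object, i As Long

    iPath = "c:\temp\"
    iFile = "Khrish express.txt"

    ' Read the whole text file into one string
    With CreateObject("Scripting.FileSystemObject")
        'Fn = Split(.OpenTextFile(iPath & iFile).ReadAll, vbCrLf)
        Fn = .OpenTextFile(iPath & iFile).ReadAll
    End With

    With CreateObject("VBScript.RegExp")
        .Global = True      ' return every match, not just the first
        .MultiLine = True   ' let $ match at the end of each line
        ' Matches a line ending in four space-separated groups, e.g. "AN NAA M1 1AA"
        .Pattern = "\b\w{2,4}\s\w{3}\s\w.{1,3}\s\d.{2}$"
        Set RR = .Execute(Fn)
        Debug.Print RR.Count
        For i = 0 To RR.Count - 1
            Debug.Print RR(i)
        Next i
    End With
End Sub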


regards
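
For postcodes that sit inside running text rather than on a line of their own, a tighter pattern may be needed. The sketch below is one possible variation, not the code posted above. It assumes only the six formats from the first post are wanted, and `ExtractPostcodes` and `ListPostcodes` are hypothetical names. It collects each distinct postcode once and writes the results to column A of the active sheet.

```vba
' Hypothetical helper: return every distinct postcode found in txt.
Function ExtractPostcodes(ByVal txt As String) As Collection
    Dim found As New Collection
    Dim m As Object, code As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        ' Outward part: 1-2 letters, a digit, an optional letter or digit.
        ' Inward part: a digit followed by 2 letters.
        .Pattern = "\b([A-Z]{1,2}\d[A-Z\d]?)\s+(\d[A-Z]{2})\b"
        For Each m In .Execute(txt)
            ' Normalise to upper case with a single space
            code = UCase$(m.SubMatches(0) & " " & m.SubMatches(1))
            On Error Resume Next      ' a duplicate key raises an error; skip it
            found.Add code, code
            On Error GoTo 0
        Next m
    End With
    Set ExtractPostcodes = found
End Function

' Read the text file used above and list the postcodes in column A.
Sub ListPostcodes()
    Dim txt As String, codes As Collection, r As Long
    With CreateObject("Scripting.FileSystemObject")
        txt = .OpenTextFile("c:\temp\Khrish express.txt").ReadAll
    End With
    Set codes = ExtractPostcodes(txt)
    For r = 1 To codes.Count
        ActiveSheet.Cells(r, 1).Value = codes(r)
    Next r
End Sub
```

The pattern only checks the shape of the string, so it will also pick up text that looks like a postcode but is not a real one.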