Felix Atagong
10-15-2007, 01:59 AM
I found a quick and dirty way to parse (simple) text-based Adobe Acrobat pdf files into Excel using a freeware pdf2txt utility. It can be useful when you always have the same pdf file or report that needs to be parsed.
What I do is saving the pdf file into My Documents and then run a bat file.
Sub OpenPDF()
'-------------------------------------------'
' BAT FILE: pdftotext.exe -layout YourPage.pdf
' DOWNLOAD LINK: http://www.foolabs.com/xpdf/download.html
'-------------------------------------------'
' these lines look for a pdf file in your My Documents folder
Set WshShell = CreateObject("WScript.Shell")
ChDir (WshShell.SpecialFolders("MyDocuments"))
PageName = Application.GetOpenFilename("YourPage, *.pdf", , "YourPage")
' if no file is picked the macro ends
If PageName = "False" Then
Exit Sub
End If
' copies and renames the pdf file to the pdf2txt folder
FileCopy PageName, "C:\pdf2txt\YourPage.pdf"
ChDir ("C:\pdf2txt")
' THE BATFILE CONTAINS ONLY 1 LINE:
' pdftotext.exe -layout YourPage.pdf
TestValue = Shell("YourPage.bat", 1)
' because the bat file runs for 1 or 2 seconds (in my case)
' I let the Excel macro wait 5 seconds before doing anything else
' there are more ingenious ways for VBA to wait for the end of an
' application, but this suits me fine...
Application.Wait (Now + TimeValue("0:00:05"))
ChDir "C:\pdf2txt"
PageName = "C:\pdf2txt\YourPage.txt"
' the following reads the text that has been generated
Call ReadTextFile
' do your text parsing - text to columns etc... etc... hereafter
End Sub
Sub ReadTextFile()
Dim FileNum As Integer
Dim r As Integer
Dim wb As Workbook
Dim Data As String
r = 1
FileNum = FreeFile
Set wb = Workbooks.Add
Open BladNaam For Input As #FileNum
Do While Not EOF(FileNum)
Line Input #FileNum, Data
ActiveSheet.Cells(r, 1) = Data
r = r + 1
Loop
Close #FileNum
End Sub
There are more beautiful ways to make this work, but like I said I needed a quick and dirty solution. Hope this might help somebody.
Felix
What I do is saving the pdf file into My Documents and then run a bat file.
Sub OpenPDF()
'-------------------------------------------'
' BAT FILE: pdftotext.exe -layout YourPage.pdf
' DOWNLOAD LINK: http://www.foolabs.com/xpdf/download.html
'-------------------------------------------'
' these lines look for a pdf file in your My Documents folder
Set WshShell = CreateObject("WScript.Shell")
ChDir (WshShell.SpecialFolders("MyDocuments"))
PageName = Application.GetOpenFilename("YourPage, *.pdf", , "YourPage")
' if no file is picked the macro ends
If PageName = "False" Then
Exit Sub
End If
' copies and renames the pdf file to the pdf2txt folder
FileCopy PageName, "C:\pdf2txt\YourPage.pdf"
ChDir ("C:\pdf2txt")
' THE BATFILE CONTAINS ONLY 1 LINE:
' pdftotext.exe -layout YourPage.pdf
TestValue = Shell("YourPage.bat", 1)
' because the bat file runs for 1 or 2 seconds (in my case)
' I let the Excel macro wait 5 seconds before doing anything else
' there are more ingenious ways for VBA to wait for the end of an
' application, but this suits me fine...
Application.Wait (Now + TimeValue("0:00:05"))
ChDir "C:\pdf2txt"
PageName = "C:\pdf2txt\YourPage.txt"
' the following reads the text that has been generated
Call ReadTextFile
' do your text parsing - text to columns etc... etc... hereafter
End Sub
Sub ReadTextFile()
Dim FileNum As Integer
Dim r As Integer
Dim wb As Workbook
Dim Data As String
r = 1
FileNum = FreeFile
Set wb = Workbooks.Add
Open BladNaam For Input As #FileNum
Do While Not EOF(FileNum)
Line Input #FileNum, Data
ActiveSheet.Cells(r, 1) = Data
r = r + 1
Loop
Close #FileNum
End Sub
There are more beautiful ways to make this work, but like I said I needed a quick and dirty solution. Hope this might help somebody.
Felix