read contents of pdf



msahmed
10-11-2016, 12:56 PM
Hi there,

I need code to fetch the email IDs (in mailto: format) from the PDF files in a particular folder.

The email IDs always appear next to the tenant name. All the PDFs are in readable format, and I have Acrobat installed.

Thanks in advance...

Kenneth Hobs
10-11-2016, 01:34 PM
When you say Acrobat, you mean the full version and not just the free Reader, right?

If so:

'http://www.vbaexpress.com/forum/showthread.php?57409-read-contents-of-pdf
Sub Test_ReadAcrobatDocument()
    Debug.Print ReadAcrobatDocument(ThisWorkbook.Path & "\P1.pdf")
End Sub


'http://www.eileenslounge.com/viewtopic.php?f=30&t=5907
'Add reference: Acrobat
Public Function ReadAcrobatDocument(strFileName As String) As String
    'Note: A reference to the Adobe Acrobat type library must be set in Tools|References!
    Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
    Dim AcroTextSelect As CAcroPDTextSelect
    Dim PageNumber As Object, PageContent As Object
    Dim Content As String, i As Long, j As Long

    Set AcroApp = CreateObject("AcroExch.App")
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")

    ' The second argument is a temporary window title; an empty string is ignored.
    If AcroAVDoc.Open(strFileName, "") <> True Then GoTo CleanUp
    ' The following While-Wend loop shouldn't be necessary but timing issues may occur.
    While AcroAVDoc Is Nothing
        Set AcroAVDoc = AcroApp.GetActiveDoc
    Wend
    Set AcroPDDoc = AcroAVDoc.GetPDDoc
    For i = 0 To AcroPDDoc.GetNumPages - 1
        Set PageNumber = AcroPDDoc.AcquirePage(i)
        ' Highlight list covering up to 9000 text elements on the page
        Set PageContent = CreateObject("AcroExch.HiliteList")
        If PageContent.Add(0, 9000) <> True Then GoTo CleanUp
        ' Error trapping is needed to avoid errors with protected PDFs that can't be read
        Set AcroTextSelect = Nothing
        On Error Resume Next
        Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
        If Not AcroTextSelect Is Nothing Then
            For j = 0 To AcroTextSelect.GetNumText - 1
                Content = Content & AcroTextSelect.GetText(j)
            Next j
        End If
        On Error GoTo 0
    Next i

CleanUp:
    ' Return whatever was read and always shut Acrobat down, even on early exit
    ReadAcrobatDocument = Content
    On Error Resume Next
    AcroAVDoc.Close True
    AcroApp.Exit
    Set AcroPDDoc = Nothing: Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
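
To get from the extracted text to the email IDs the question asks for, something along these lines might work. It is only a sketch under a few assumptions: the addresses appear as visible page text (a mailto: link stored purely as a link annotation would not show up in the text returned by ReadAcrobatDocument), Acrobat returns each address as one unbroken chunk, and the PDFs sit in the workbook's folder. ExtractEmails and ListEmailsInFolder are hypothetical names made up for this example, and the pattern is a simple approximation, not a full email validator.

```vba
'Sketch only: ExtractEmails and ListEmailsInFolder are illustrative names.
'Assumes ReadAcrobatDocument is available in the same project.
Public Function ExtractEmails(strText As String) As Collection
    Dim re As Object, m As Object
    Dim found As New Collection

    Set re = CreateObject("VBScript.RegExp")   'late bound, no extra reference needed
    re.Global = True
    re.IgnoreCase = True
    'Optional "mailto:" prefix, then a simple (not RFC-complete) address pattern
    re.Pattern = "(?:mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,})"

    For Each m In re.Execute(strText)
        found.Add m.SubMatches(0)              'the address without the prefix
    Next m
    Set ExtractEmails = found
End Function

Sub ListEmailsInFolder()
    Dim strFolder As String, strFile As String, v As Variant

    strFolder = ThisWorkbook.Path & "\"        'assumption: PDFs are next to the workbook
    strFile = Dir(strFolder & "*.pdf")
    Do While Len(strFile) > 0
        For Each v In ExtractEmails(ReadAcrobatDocument(strFolder & strFile))
            Debug.Print strFile, v             'file name and each address found
        Next v
        strFile = Dir                          'next matching file
    Loop
End Sub
```

Run ListEmailsInFolder and check the Immediate window. If addresses come back split or missing, inspect the raw string from ReadAcrobatDocument first, since the way Acrobat breaks text into words decides what the pattern can match.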