How to use Word 2010 macro to read a PDF?



BruceA@WSIB
11-28-2012, 11:58 AM
Hi Guys,

The PDF is currently unsaved; it exists in memory only. Perhaps I should save it as a temp file first. It might be easier to load that way, but I'm not sure.
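A minimal sketch of the temp-file idea, assuming the in-memory PDF is available to the macro as an already-populated Byte array (the thread does not say how it is held). The function name SavePdfToTempFile and the file-name pattern are hypothetical, chosen only for illustration.

```vba
Option Explicit

' Hypothetical helper: writes PDF bytes held in memory to a file in the
' user's TEMP folder and returns the full path of that file.
' Assumes pdfBytes has already been dimensioned and filled.
Public Function SavePdfToTempFile(ByRef pdfBytes() As Byte) As String
    Dim tempPath As String
    Dim fileNum As Integer

    ' Build a timestamped name so repeated runs are unlikely to collide.
    tempPath = Environ$("TEMP") & "\wordmacro_" & _
               Format$(Now, "yyyymmddhhnnss") & ".pdf"

    ' Binary mode does not truncate an existing file, so remove any old copy.
    If Dir$(tempPath) <> "" Then Kill tempPath

    fileNum = FreeFile
    Open tempPath For Binary Access Write As #fileNum
    Put #fileNum, , pdfBytes    ' writes the raw bytes, no array descriptor
    Close #fileNum

    SavePdfToTempFile = tempPath
End Function
```

The returned path can then be handed to whatever reads the PDF, and deleted with Kill once the text has been extracted.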

My first idea is to reference acrobat.tlb, and use the object model to get the data from the PDF.

What are the minimum licence requirements to install acrobat.tlb on a client computer? Can it be rolled out for free with Acrobat Reader? (I doubt it.)

To get acrobat.tlb working properly within my Word 2010 VBA macro, I had to uninstall Acrobat Reader, Air, and Shockwave, reboot, install a trial version of Adobe Acrobat Pro XI, and then set a reference to acrobat.tlb.

Source code is always appreciated!

(The licence costs for the Acrobat Pro XI solution might be too high. Any ideas for alternatives to Acrobat Pro XI? Has anyone tried the Universal Document Converter API at http://www.print-driver.com? Its licence costs may be much lower.)

~

macropod
11-29-2012, 02:15 AM
I think that, if you read the Acrobat Pro licence, you'll find you cannot legally install any component on a computer to which the licence does not apply.

With a reference to the Adobe Acrobat #.0 Type Library (where # is the installed version number), the following function can be used to read the contents of an Adobe PDF file:
Public Function ReadAcrobatDocument(strFileName As String) As String
    'Note: A Reference to the Adobe Library must be set in Tools|References!
    Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
    Dim AcroTextSelect As CAcroPDTextSelect
    Dim PageNumber As Object, PageContent As Object
    Dim Content As String, i As Long, j As Long
    Set AcroApp = CreateObject("AcroExch.App")
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    ' The second argument is the window title. Pass an empty string
    ' rather than vbNull, which is the numeric constant 1.
    If AcroAVDoc.Open(strFileName, "") <> True Then GoTo CleanUp
    ' The following While-Wend loop shouldn't be necessary but timing issues may occur.
    While AcroAVDoc Is Nothing
        Set AcroAVDoc = AcroApp.GetActiveDoc
    Wend
    Set AcroPDDoc = AcroAVDoc.GetPDDoc
    For i = 0 To AcroPDDoc.GetNumPages - 1
        Set PageNumber = AcroPDDoc.AcquirePage(i)
        Set PageContent = CreateObject("AcroExch.HiliteList")
        ' Highlight from offset 0 with length 9000; text beyond that on a page is not captured.
        If PageContent.Add(0, 9000) <> True Then Exit For
        ' Clear the previous page's selection so it cannot be re-read if this page fails.
        Set AcroTextSelect = Nothing
        Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
        ' The next line is needed to avoid errors with protected PDFs that can't be read
        On Error Resume Next
        If Not AcroTextSelect Is Nothing Then
            For j = 0 To AcroTextSelect.GetNumText - 1
                Content = Content & AcroTextSelect.GetText(j)
            Next j
        End If
    Next i
    ReadAcrobatDocument = Content
    AcroAVDoc.Close True
CleanUp:
    ' Also reached when the file could not be opened, so Acrobat is not left running.
    AcroApp.Exit
    Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
You can then call the function with code like:
Sub Demo()
    Dim strPDF As String
    ' The next ten lines and the last line in this sub can help if
    ' you get "ActiveX component can't create object" errors even
    ' though a Reference to Acrobat is set in Tools|References.
    Dim bTask As Boolean
    bTask = True
    If Tasks.Exists(Name:="Adobe Acrobat Professional") = False Then
        bTask = False
        Dim AdobePath As String, WshShell As Object
        Set WshShell = CreateObject("Wscript.shell")
        AdobePath = WshShell.RegRead("HKEY_CLASSES_ROOT\acrobat\shell\open\command\")
        AdobePath = Trim(Left(AdobePath, InStr(AdobePath, "/") - 1))
        Shell AdobePath, vbHide
    End If
    strPDF = ReadAcrobatDocument("C:\Users\" & Environ("UserName") & "\Documents\MyFile.pdf")
    ActiveDocument.Range.InsertAfter strPDF
    If bTask = False Then Tasks.Item("Adobe Acrobat Professional").Close
End Sub
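
The "ActiveX component can't create object" error mentioned in the comments above (run-time error 429) can also be probed for before any PDF is opened. This is only a sketch, assuming the same "AcroExch.App" ProgID used in ReadAcrobatDocument. The function name IsAcrobatAvailable is hypothetical, and it uses late binding so it compiles even without the Acrobat reference set.

```vba
' Hypothetical helper (not part of the original post): returns True when
' the "AcroExch.App" automation object can be created on this machine.
Public Function IsAcrobatAvailable() As Boolean
    Dim acrobatObj As Object

    ' CreateObject raises run-time error 429 when the class is not
    ' registered, so trap the error instead of letting the macro halt.
    On Error Resume Next
    Set acrobatObj = CreateObject("AcroExch.App")
    On Error GoTo 0

    IsAcrobatAvailable = Not (acrobatObj Is Nothing)
    Set acrobatObj = Nothing
End Function
```

A macro could call this first and show a message when it returns False, rather than failing part-way through. Note that the probe itself may start Acrobat in the background.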

BruceA@WSIB
11-30-2012, 11:47 AM
Hey Macropod, props for your thorough answer.

Upon further review of Adobe's products and their licences, this is what I've found.

Acrobat Reader doesn't contain the important acrobat.tlb file I'm after.

Acrobat Pro does contain acrobat.tlb.

Acrobat SDK also contains acrobat.tlb, and is free to download.

I called Adobe Support, and they weren't of much use. They couldn't tell me much.

Can I roll out (the SDK version of) acrobat.tlb with my solution for free, using the free SDK?

I work for a large institution, so we want to do all our licensing by the book, but also as inexpensively as possible.

Your thoughts?

~

Frosty
11-30-2012, 12:23 PM
My company has found institutional licensing for Adobe products to be prohibitively expensive. You might check out Nuance/ScanSoft, which has OCR functionality built in as well (that is what we moved to a number of years ago, when Adobe Pro became too expensive). I'm not sure Nuance has all the APIs that Adobe has, however (I haven't researched it).

In terms of licence compliance, if Adobe can't help you... I'm not sure I'd rely on the opinion in any forum post (even on a forum as good as this, and even from a source as good as Paul).

macropod
11-30-2012, 02:40 PM
BruceA@WSIB wrote:
"Acrobat SDK also contains acrobat.tlb, and is free to download.

....

Can I roll out (the SDK version of) acrobat.tlb with my solution for free, using the free SDK?"

My advice would be to read the SDK licence. I don't know what provisions/restrictions it has.