Acrobat 8 - Export to Text



stanl
11-06-2008, 06:42 AM
We are experimenting with eFax, where workorders are faxed and arrive as PDFs rather than being mailed and then scanned to PDF.

Often several workorders are faxed in a batch. Each can be 2-3 pages, which includes the workorder, a signed contract, and perhaps an additional warranty.

We set up an 800 number which routes to an email address. I have no problem opening the emails and extracting the headers/body into fields in an Access table, then using an ADO Stream to place the PDF as a binary object in another field for later export/viewing.

If I open an exported PDF in Acrobat 8, there is an option to export to text. Even though the entire PDF is an image, the OCR functionality of Acrobat gives me enough to interpret the page # and workorder #, so I can parse that data as additional field info.

I have used VBA with Acrobat up to version 6.0, so I can code the creation of the Acrobat objects used to load the PDF:

'Initialize Acrobat by creating App object
Set gApp = CreateObject("AcroExch.App")
gApp.Hide

'Set AVDoc object
Set gAvDoc = CreateObject("AcroExch.AVDoc")


I am assuming I need

gApp.MenuItemExecute("[Something]")

but I can't figure out what [Something] is. I sure would appreciate a code snippet to get any readable text into a variable.

TIA Stan

jfournier
11-06-2008, 09:10 AM
I haven't used Acrobat from VBA, but if you can get an image file of your PDF you may be able to use Microsoft Office Document Imaging (MODI) to OCR your text. I'm not sure if it's as good as Acrobat's, though.

This is a function I use to OCR an image file...

Function GetOCRText(TheFile As String) As String
    On Error GoTo PROC_ERR

    If TheFile = "" Then Exit Function

    Dim MyDoc As Object    ' MODI.Document (late bound)
    Dim MyLayout As Object ' MODI.Layout
    Dim TheWord As Object  ' MODI.Word
    Dim Result As String

    Set MyDoc = CreateObject("MODI.Document")
    MyDoc.Create TheFile
    MyDoc.Images(0).OCR    ' OCR the first page only
    Set MyLayout = MyDoc.Images(0).Layout

    ' Join the recognized words, each preceded by a space
    For Each TheWord In MyLayout.Words
        Result = Result & " " & TheWord.Text
    Next TheWord
    Result = Result & vbCrLf & vbCrLf

    GetOCRText = Result

    Set MyLayout = Nothing
    MyDoc.Close False      ' close without saving changes
    Set MyDoc = Nothing

PROC_ERR:
    ' On any error, execution lands here and the function returns an empty string
End Function
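
Since each workorder can run 2-3 pages, a multi-page variant may be useful. The following is only a sketch, not a tested routine: GetOCRTextAllPages is a hypothetical name, and it assumes MODI is installed and that the input is a multi-page .tif or .mdi file. It calls OCR on the whole document rather than on images(0) and then reads the Layout.Text of each page.

```vba
' Hypothetical multi-page variant of GetOCRText (sketch, assumes MODI is installed)
Function GetOCRTextAllPages(TheFile As String) As String
    Dim MyDoc As Object    ' MODI.Document (late bound)
    Dim Result As String
    Dim i As Long

    If TheFile = "" Then Exit Function
    On Error GoTo PROC_ERR

    Set MyDoc = CreateObject("MODI.Document")
    MyDoc.Create TheFile
    MyDoc.OCR              ' run OCR on every page in the file

    ' Append the recognized text of each page, separated by blank lines
    For i = 0 To MyDoc.Images.Count - 1
        Result = Result & MyDoc.Images(i).Layout.Text & vbCrLf & vbCrLf
    Next i

    GetOCRTextAllPages = Result

PROC_EXIT:
    On Error Resume Next   ' do not let cleanup raise a second error
    If Not MyDoc Is Nothing Then MyDoc.Close False
    Set MyDoc = Nothing
    Exit Function

PROC_ERR:
    Resume PROC_EXIT       ' return whatever was assigned so far (empty on failure)
End Function
```

The text for each page ends with a blank line, so the caller can split the result on double line breaks to look for the page # and workorder # per page.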

stanl
11-06-2008, 11:14 AM
Ah, and therein lies the rub. I tried this in Vista with Office 2007 installed and there is no MODI. Does it come as a separate download?

jfournier
11-06-2008, 11:20 AM
MODI should come with all Office versions, but you may need to configure it.

This is how to check in Office 2003; hopefully it's similar in 2007:

Go to Add/Remove Programs, Microsoft Office 200x, and click Change. Then select "Add/Remove Features", and make sure "Choose advanced customization of applications" is checked on the list of Office utilities. You'll then be presented with a tree view of different items, and MODI is under "Office Tools".

stanl
11-07-2008, 12:02 PM
Thanks; it works great, but it only accepts .tif or .mdi files. I think we can change the eFax setting to attach the fax as a .tif, and this would be an easy solution.

Stan