Extract data from PDF



Jeevie
02-12-2011, 08:11 AM
Hi

Can someone share a code sample to extract data from a text-based PDF into Excel?

Thanks in advance

Simon Lloyd
02-12-2011, 09:03 AM
You would need some software to convert the PDF back to a readable format. Once you do that, how will you target the portion of the PDF that has the data you need?

Bob Phillips
02-13-2011, 03:15 AM
There is a product called PDFExtract that creates Excel or Word documents from PDF.

Jeevie
02-13-2011, 07:03 AM
Hi

I know of that, as well as other products like Able2Extract, but I was wondering whether we can do it using the Acrobat APIs and VBA.

Thanks

Bob Phillips
02-13-2011, 07:45 AM
Why reinvent the wheel?

Jeevie
02-13-2011, 10:12 AM
Hi

Extracting the data would be part of the solution, and there would be further processing based on requirements. If it were possible to extract using VBA, the solution would be integrated, rather than having separate tools to extract the data and then process it.

Thanks

Kenneth Hobs
02-13-2011, 10:33 AM
In any solution, you will have to extract data and then process it.

A .NET project that includes iTextSharp.dll could probably extract the text. I have worked with iTextSharp.dll to some extent.

Another method that might be easy to use is pdftk. You could use Shell() to call it and pass command-line parameters and values. A ShellWait() routine may be needed to give it time to complete the process. http://www.pdflabs.com
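
A minimal sketch of the shell-and-wait idea, assuming pdftk is installed and on the PATH. Instead of a hand-written ShellWait() routine, it uses the Windows Script Host WScript.Shell Run method with its wait flag set. The procedure names and file paths are hypothetical. The dump_data operation shown here writes document metadata (page count, bookmarks, info fields), not page text, so treat it only as an illustration of the calling pattern.

```vba
Option Explicit

' Hypothetical helper: runs a command line and blocks until it exits.
' Returns the exit code of the process (0 normally means success).
Public Function RunAndWait(ByVal commandLine As String) As Long
    Dim wsh As Object
    Set wsh = CreateObject("WScript.Shell")
    ' 0 = hidden window, True = wait for the program to finish
    RunAndWait = wsh.Run(commandLine, 0, True)
End Function

Public Sub DumpPdfData()
    Const SRC As String = "C:\Temp\input.pdf"       ' assumed path
    Const DST As String = "C:\Temp\input_data.txt"  ' assumed path
    Dim cmd As String
    Dim exitCode As Long

    ' Quote the paths so that spaces in file names do not split the arguments.
    cmd = "pdftk """ & SRC & """ dump_data output """ & DST & """"

    exitCode = RunAndWait(cmd)
    If exitCode <> 0 Then
        MsgBox "pdftk failed with exit code " & exitCode
    Else
        MsgBox "Output written to " & DST
    End If
End Sub
```

Because Run only returns after pdftk has exited, the output file can be opened and parsed in the very next VBA statement. If pdftk is not on the PATH, Run raises a run-time error, so a real routine would probably want an On Error handler around the call.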