Thread: extract data from pdf to excel

  1. #1

    extract data from pdf to excel

    Hi All,

    I need some direction on finding a solution to the problem below. I know it's not going to be easy, so I'd like to know the different approaches I could try.

    Problem: I have thousands of PDF files. Each file has a specific format, as below:

    Name:XXXXX DOB:XXXXXXX
    ID:XXXXXXX

    Dependent Name:XXXXXX
    DOB:XXXXXXX


    I have to extract this specific data into Excel before loading it into a software application.

    Is there any way to extract specific data from a PDF into Excel sheets that is better than going through each PDF by hand? I don't know how OCR works, but would it be helpful? I just came across it on Google.

    Kindly provide your suggestions.

    Thanks for your help.

  2. #2
    VBAX Regular
    Joined
    May 2013
    Posts
    34
    I'm not sure what the pros here might say, but when I have to pull from PDF reports and such, I've used the features of a PDF reader to extract the data. I used Nitro PDF once, which was able to pull data out into Excel surprisingly well. One useful feature was that I could do it from the File menu, so I could select a lot of different files and extract from there, I believe. The big downside is that you have to pay for the program.

    I'd be interested in hearing what the others have to say, though. Good question!

  3. #3
    VBAX Guru Kenneth Hobs
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Depends on the type of PDF file. Attach an example. Obfuscate the data if needed.

  4. #4
    Moderator VBAX Sage SamT
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,814
    You might try this freebie:

    http://www.generalfreeware.com/freew...file-17264.htm

    They say it will do batch conversion of many PDFs into one text file.

    Once you have the PDFs converted to one or more text files, VBA can import them into Excel.

    When you've tried it and decided, show us a few lines of the output text.
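
To illustrate the text-to-Excel step, here is a minimal Python sketch (not VBA, and not the output of the tool linked above). It assumes the converted text keeps the "Name: / DOB: / ID: / Dependent Name: / DOB:" layout from the first post, that each record starts at a "Name:" field, and that there is at most one dependent per record. The names `parse_records`, `to_csv`, and `COLUMNS` are made up for this example. It writes CSV, which Excel can open directly.

```python
import csv
import io
import re

# Labels taken from the sample layout in the first post.
LABELS = r"(?:Dependent Name|Name|DOB|ID)"
# A value runs until the next label on the same line or the end of the line.
FIELD = re.compile(rf"({LABELS}):[ \t]*(.*?)(?=[ \t]+{LABELS}:|[ \t]*$)", re.MULTILINE)

COLUMNS = ["Name", "DOB", "ID", "Dependent Name", "Dependent DOB"]


def parse_records(text):
    """Turn converted PDF text into one dict per record (assumed layout)."""
    records = []
    current = None
    for label, value in FIELD.findall(text):
        value = value.strip()
        if label == "Name":
            # Assumption: every record begins with a "Name:" field.
            current = dict.fromkeys(COLUMNS, "")
            records.append(current)
        if current is None:
            continue  # ignore stray fields before the first "Name:"
        if label == "DOB" and current["Dependent Name"]:
            # A DOB that follows a dependent name belongs to the dependent.
            label = "Dependent DOB"
        current[label] = value
    return records


def to_csv(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

If the real converter output differs (extra whitespace, several dependents, labels split across lines), the pattern and the record logic would need adjusting, which is why a few lines of actual output are worth posting first.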
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  5. #5
    VBAX Guru Kenneth Hobs
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    You can probably do it if you have Adobe Acrobat, not Adobe Reader. To reference the object, see this example. http://www.vbaexpress.com/forum/showthread.php?t=40734

    For another 3rd party converter, I found this one: http://www.sejda.org/shell-interface/tutorial/

    Obviously, you can Shell() to a 3rd party console program. Here is an example that I posted for pdfsam which is similar to sejda. http://vbaexpress.com/forum/showthread.php?p=180767
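
The same "shell out to a console converter" idea can be sketched in Python rather than VBA's Shell(). This example assumes the Poppler `pdftotext` command-line tool is installed and on the PATH; that tool is a stand-in chosen for illustration, not one of the converters linked in this thread. The helper names `build_command` and `convert_folder` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, exe="pdftotext"):
    """Build the argument list for one conversion.

    Assumption: `exe` is Poppler's pdftotext; -layout asks it to keep
    the physical layout of the page in the text output.
    """
    return [exe, "-layout", str(pdf_path), str(txt_path)]


def convert_folder(folder, exe="pdftotext"):
    """Convert every *.pdf in `folder` to a .txt file beside it."""
    outputs = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        # check=True raises CalledProcessError if the converter fails.
        subprocess.run(build_command(pdf_path, txt_path, exe), check=True)
        outputs.append(txt_path)
    return outputs
```

Calling `convert_folder` on the folder holding the PDFs would leave one text file per PDF, ready to be parsed or imported. Note that this only works for PDFs that contain real text; scanned images would need OCR first.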

    I could probably do it with iTextSharp in VB.NET, but that might be too involved for you.

  6. #6
    Thanks for your suggestions. Will check these options and post my feedback... Thanks again...

  7. #7
    VBAX Newbie
    Joined
    Sep 2015
    Posts
    3
    I wonder whether there are any third-party toolkits that are simple and fast enough to help with this?
    Best Regards,
    Pan

    I am testing PDF extraction SDKs to extract text from PDF files. Any ideas?


    Next Tomorrow is Another Day.


  8. #8
    Moderator VBAX Master Tommy
    Joined
    May 2004
    Location
    Houston, TX
    Posts
    1,184
    You can save the PDF file as an Excel file. Use axAcroPDF.dll to access the PDF and save the file via VBA. The Execute command is what you are looking for; the item to execute would be the menu item for saving as an Excel workbook. You will need to do some research to find the internal name of that menu item.
