Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 23

Thread: Solved: Extract Data From PDF Files to Excel

  1. #1
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location

    Solved: Extract Data From PDF Files to Excel

    Hello To All,

    I was wondering given all the amazing things people achieve with VBA, is it possible to extract data / strings from PDF files using VBA?

    If the answer is yes.... would someone please give me some insight as to how this is done.

    Thank you,

    Carpiem

  2. #2
    VBAX Tutor
    Joined
    Mar 2005
    Posts
    268
    Location
    Welcome to VBAX, Carpiem.

    The good news is that yes, it is possible. The bad news is that to do it in VBA would probably (to the best of my knowledge) require extensive understanding of PostScript. But if you search around on the web, you'll probably be able to find programs for converting PDF to Word, plain text, HTML, and various other formats. Some of these programs are free and maybe they'll give you a good starting point.

    Now, if the formats of your PDF documents all follow a similar design, you *might* be able to use VBA to extract the info you need with minimum or even no knowledge of PostScript. With even minor differences between files, things are going to get complicated.

  3. #3
    VBAX Contributor Ivan F Moala's Avatar
    Joined
    May 2004
    Location
    Auckland New Zealand
    Posts
    185
    Location
    If You use XL2003, then yes and you can do this via automation and MDI
    But the process would require taking an image snap shot of the PDF file.
    Kind Regards,
    Ivan F Moala From the City of Sails

  4. #4
    Moderator VBAX Mentor sheeeng's Avatar
    Joined
    May 2005
    Location
    Kuala Lumpur
    Posts
    392
    Location
    Quote Originally Posted by Ivan F Moala
    If You use XL2003, then yes and you can do this via automation and MDI
    But the process would require taking an image snap shot of the PDF file.
    Can you explain it step by step if you have time? I'm looking forward to try it.
    Thanks in advence.

  5. #5
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    BlueCactus,

    Thank you for the precis. I actually use an excellent program (Read Iris) to convert PDF to Excel. As with most things though there is cleaning up to do after the conversion.

    It seems I might better ask a few questions to do with cleaning up the file rather than approaching the exercise from an admittedly intriguing and new process to me.

    Carpiem
    Quote Originally Posted by BlueCactus
    Welcome to VBAX, Carpiem.

    The good news is that yes, it is possible. The bad news is that to do it in VBA would probably (to the best of my knowledge) require extensive understanding of PostScript. But if you search around on the web, you'll probably be able to find programs for converting PDF to Word, plain text, HTML, and various other formats. Some of these programs are free and maybe they'll give you a good starting point.

    Now, if the formats of your PDF documents all follow a similar design, you *might* be able to use VBA to extract the info you need with minimum or even no knowledge of PostScript. With even minor differences between files, things are going to get complicated.

  6. #6
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Mr. Moala,

    Never thought I would have the pleasure of responding to you. Have learned a good deal from you in the past and daresay if I was up to it would learn much more.

    Confess to knowing zero about Multiple Document Interface... the program I dowloaded needed at least minimal understanding of the process to be invoked.

    I shall plod along and try to refine my needs in a more subdued manner.

    Thank you,

    Carpiem
    Quote Originally Posted by Ivan F Moala
    If You use XL2003, then yes and you can do this via automation and MDI
    But the process would require taking an image snap shot of the PDF file.

  7. #7
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Carpiem,
    Can you attach a sample of an imported spreadsheet, which might give an idea of the "clean up" required. You can attach a file by zipping it and using the Manage Attachments button found in the Go Advanced option.
    Regards
    MD
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  8. #8
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Good Evening MD,

    Have attached the much shortened file for your scrutiny. My explanation may not be to clear... have to wait and see.

    I do really appreciate you taking time out to look at this.

    Regards,

    Carpiem

    Hi Carpiem,
    Can you attach a sample of an imported spreadsheet, which might give an idea of the "clean up" required. You can attach a file by zipping it and using the Manage Attachments button found in the Go Advanced option.
    Regards
    MD

  9. #9
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Carpiem,
    I'm not too clear on obtaining the results you show in G38. Can you explain why this should be 120. If this is correct, the code can be modified as required. The split is done in the Sample columns by two simple UDF (user defined formulae) in Module 1, which can be manipulated to suit.
    Regards
    MD
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  10. #10
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Hello MD,

    The UDF's you created are just what "The Doctor Ordered" and much appreciated.

    I can't explain what I was thinking when I inserted column G.... hence no idea what to say. In short order though I will be using your functions and adding to them.

    On the subject of UDF's how does one incorporate them into vba. In this instance I am referring to the possibility of having the UDF's automatically do their job when the "Quoted Cost" (E38) was entered or pasted in?

    I do not wish to waste your time and am only exploring the possibility of further automating the order entry process.

    Regards,

    Carpiem

  11. #11
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Carpiem,
    There's no question of "wasting time", that is just what we do here.
    One method as follows. Paste the code into the worksheet module, edit the intersect range and adjust the offset values etc. as appropriate. If you don't need the formulae, and only the result, then this could be extracted and pasted in as values. It all depends on your purposes.
    Regards
    MD

    [VBA]
    Private Sub Worksheet_Change(ByVal Target As Range)
    On Error GoTo LastLine
    If Not Intersect(Target, Range("E38:E45")) Is Nothing Then
    Target.Offset(0, 3).Formula = "=UOM(E" & Target.Row() & ")"
    Target.Offset(0, 4).Formula = "=UNIT(E" & Target.Row() & ")"
    End If
    LastLine:
    End Sub

    [/VBA]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  12. #12
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Hello MD,

    Seems everyone is exceedingly helpful here, which is great given the alternative. So once again 'Thank you".

    Looking at your code and given that the range could be anything from a few rows to hundreds, would it be sufficient for me to use a named range to cater for additions or deletions to the rows and still incorporate your code as is?

    Regards,

    Carpiem

  13. #13
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Carpiem,
    A named range would work fine. The only reason for the range is to exclude cells where you would not wish the code to run. eg, titles, totals or whatever. You could also get the code to run only if the text contains the / character. It's all depends on your conversions.
    Regards
    MD
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  14. #14
    Knowledge Base Approver VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    FWIW, I once did something similar from outlook whereby I sent the pdf to an adobe email address that converted the pdf to text, then I scanned for a reply email from this adress, then used a RegExp to extract the portion of data from the text.

  15. #15
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Hello MD,

    Well that is clear enough. Thank you again.

    Regards,

    Carpiem
    Quote Originally Posted by mdmackillop
    Hi Carpiem,
    A named range would work fine. The only reason for the range is to exclude cells where you would not wish the code to run. eg, titles, totals or whatever. You could also get the code to run only if the text contains the / character. It's all depends on your conversions.
    Regards
    MD

  16. #16
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Hello BrettDJ,

    I understand what you are saying.... the problem is that I have never heard of an Adobe E-Mail address before and searching on the web has not revealed anything relevant.

    Could you please elaborate on the procedure you used a little more as it sounds like something I should explore.

    Regards,

    Carpiem

    Quote Originally Posted by brettdj
    FWIW, I once did something similar from outlook whereby I sent the pdf to an adobe email address that converted the pdf to text, then I scanned for a reply email from this adress, then used a RegExp to extract the portion of data from the text.

  17. #17
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,453
    Location
    Quote Originally Posted by Carpiem
    Hello BrettDJ,

    I understand what you are saying.... the problem is that I have never heard of an Adobe E-Mail address before and searching on the web has not revealed anything relevant.

    Could you please elaborate on the procedure you used a little more as it sounds like something I should explore.
    Perhaps this is what Brett was referring to http://www.adobe.com/products/acroba...linetools.html
    ____________________________________________
    Nihil simul inventum est et perfectum

    Abusus non tollit usum

    Last night I dreamed of a small consolation enjoyed only by the blind: Nobody knows the trouble I've not seen!
    James Thurber

  18. #18
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Hola XLD,

    Gracias por dirigirme a ese sitio.

    ?Eso es un Saxophone muy fresco tambi?n!

    Respeto,

    Carpiem

  19. #19
    VBAX Regular
    Joined
    Apr 2005
    Posts
    53
    Location
    Good Evening MD,

    I have more or less got my head around the way you constructed the UDF's and the vba to automate them.

    Would you be willing to walk me through the next stage i.e. "You could also get the code to run only if the text contains the / character." .... ?

    Regards,

    Carpiem

    Quote Originally Posted by mdmackillop
    Hi Carpiem,
    A named range would work fine. The only reason for the range is to exclude cells where you would not wish the code to run. eg, titles, totals or whatever. You could also get the code to run only if the text contains the / character. It's all depends on your conversions.
    Regards
    MD

  20. #20
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Good evening Carpiem,
    A simple modification of the WorkSheet sub can test for / in a string, if is found, the split takes place, if not, you can exit the sub, or as in this example, clear the UOM and Unit cells.
    [VBA]
    Private Sub Worksheet_Change(ByVal Target As Range)
    On Error GoTo LastLine
    If Not Intersect(Target, Range("E38:E45")) Is Nothing Then
    If InStr(1, Target, "/") = 0 Then
    Target.Offset(0, 3).ClearContents
    Target.Offset(0, 4).ClearContents
    Else
    Target.Offset(0, 3).Formula = "=UOM(E" & Target.Row() & ")"
    Target.Offset(0, 4).Formula = "=UNIT(E" & Target.Row() & ")"
    End If
    End If
    LastLine:
    End Sub
    [/VBA]
    Regards
    MD
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •