Consulting

Results 1 to 6 of 6

Thread: Save embedded PDF file as a separate PDf file

  1. #1
    VBAX Expert
    Joined
    May 2016
    Posts
    604
    Location

    Save embedded PDF file as a separate PDf file

    I am developing a system where the client wants to embed pdf documents into a worksheet. Linking in this case is not applicable because the worksheet will be emailed to another system. I want to write macro to extract the embedded pdf objects and save them as separate files. Does anybody know of a way of doing this?

  2. #2
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Needs a bit of error handling and some tidying but give this a try, as posted here

    Another approach would be to try to extract the files from the .xslx file - this can be done quite easily, as it's essentially just a .zip file. To play around with this manually, rename your .xlsx file to .zip and extract it. Inside, you will find a folder like "xl", and inside there, "embeddings". Your PDF document will be here, unfortunately encoded as an OLE Object. You can then try some get the data out of it, one example is here.
    Edit: I was revisiting this on another PC where the PDF failed to open. I changed the viewer to Microsoft Edge which does allow the PDF to be viewed.I guess that's to do with the OLE coding referred to above.

    Dim sPath As String
    
    Sub ExtractPDF()
    Dim fname As Variant
    Dim Pth As String
    
    sPath = "C:\VBAX\"
    
    fname = sPath & "Data.zip"
    Pth = sPath & "MyUnzipFolder\xl\embeddings\"
    MkDir sPath & "PDF"
    ActiveWorkbook.SaveCopyAs fname
    Unzip1 fname
    
    f = Dir(Pth & "*.bin")
    Do Until f = ""
    i = i + 1
    Name Pth & f As sPath & "PDF\" & "PDFfile" & Format(i, "00") & ".pdf"
    f = Dir
    Loop
    Delete_Whole_Folder (sPath & "MyUnzipFolder")
    retVal = Shell("explorer.exe " & sPath & "PDF", vbNormalFocus)
    
    End Sub
    
    
    Sub Unzip1(fname As Variant)
        'https://www.rondebruin.nl/win/s7/win002.htm
        Dim FSO As Object
        Dim oApp As Object
       
        Dim FileNameFolder As Variant
        Dim DefPath As String
        Dim strDate As String
    
            DefPath = sPath
    
            'Create the folder name
            FileNameFolder = DefPath & "MyUnzipFolder\"
    
            'Make the normal folder in DefPath
            MkDir FileNameFolder
    
            'Extract the files into the newly created folder
            Set oApp = CreateObject("Shell.Application")
    
            oApp.Namespace(FileNameFolder).CopyHere oApp.Namespace(fname).items
    
            On Error Resume Next
            Set FSO = CreateObject("scripting.filesystemobject")
            FSO.deletefolder Environ("Temp") & "\Temporary Directory*", True
    
    End Sub
     
    Sub Delete_Whole_Folder(MyPath As String)
        Dim FSO As Object
        Set FSO = CreateObject("scripting.filesystemobject")
        If FSO.FolderExists(MyPath) = False Then
            MsgBox MyPath & " doesn't exist"
            Exit Sub
        End If
        FSO.deletefolder MyPath
    End Sub
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  3. #3
    VBAX Expert
    Joined
    May 2016
    Posts
    604
    Location
    Thanks for your help, unfortunately when I tried to open the PDF file acrobat comes up with an error saying the file is corrupted. Thus I compared the original PDf with the output result using notepad. I discover there are some extra lines in the .bin file that aren't in the original pdf. The original pdf starts with :

    %PDF-1.5
    %
    1 0 obj
    <</Type/Catalog/Pages 2 0 R/Lang(en-GB) >>
    endobj
    while the exported version starts with:

    ࡱ                >                                                                
             
                                                                 !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~      €   R o o t   E n t r y                                                  eʸ…*DEST               >€           O l e                                                         
                                                      C o m p O b j                                                                                               ^       C O N T E N T S                                                                                          x                                                                       
      eʸ…*DEST     Acrobat Document        AcroExch.Document.DC 9q                                                                                                                                                                                                                                                                                                                                                                                 ‚   ƒ   „   …   †   ‡   ˆ   ‰   Š   ‹   Œ      Ž         ‘   ’   “   ”   •   –   —   ˜   ™   š   ›   œ      ž   Ÿ                                           *                                                               %PDF-1.5
    %
    1 0 obj
    <</Type/Catalog/Pages 2 0 R/Lang(en-GB) >>
    endobj
    I tried manually deleting these lines and saving as a text file and then renaming the file, Acrobat can then open the file but the data table in the pdf has been lost. So notepad corrupts the file ( not surprisingly)
    Do you have any suggestions as to how to get rid of the extra lines.

    I am thinking of going back to the client and saying we need to work round this in a different way. It looks very prone to failure in the future and problem to support long term.

  4. #4
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    I was revisiting this on another PC where the PDF failed to open. I changed the viewer to Microsoft Edge which does allow the PDF to be viewed.I guess that's to do with the OLE coding referred to above.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  5. #5
    VBAX Expert
    Joined
    May 2016
    Posts
    604
    Location
    Thanks, that solved the problem with viewing the pdf.

  6. #6
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Cleaned up version
    Option Explicit
    
    'Note: the extracted PDF files should open in Microsoft Edge _
    They will not open in Acrobat reader
    
    
    Sub ExtractPDF()
        Dim FName As Variant
        Dim TmpPath As Variant
        Dim FSO As Object
        Dim oApp As Object
        Dim sPath As String
        Dim Output As String
        Dim f As String
        Dim i As Long, j As Long
        Dim ftype As String
        'Set location and output; adjust as required
        sPath = "C:\VBAX\"
        Output = "PDFfile" & Format(Now, " yy_mm_dd_hh_mm")
        ftype = ".pdf"
        'Create Objects
        Set FSO = CreateObject("scripting.filesystemobject")
        Set oApp = CreateObject("Shell.Application")
        'Set paths and create folders
        TmpPath = sPath & "MyUnzipFolder"
        FName = sPath & "Data.zip"
        On Error Resume Next
        FSO.deletefolder TmpPath
        FSO.deletefolder sPath & "PDF"  'Deletes previously extracted files
        MkDir sPath & "PDF"
        MkDir TmpPath
        On Error GoTo 0
        'Make copy of workbook as zip file
        ActiveWorkbook.SaveCopyAs FName
        'Unzip bin files
        For j = 1 To oApp.Namespace(FName).items.Count
            oApp.Namespace(TmpPath).CopyHere oApp.Namespace(FName).items.Item("xl\embeddings\oleObject" & j & ".bin")
            f = TmpPath & "\oleObject" & j & ".bin"
            If Len(Dir(f)) = 0 Then Exit For
            Name f As sPath & "PDF\" & Output & Format(j, " - 00") & ftype
        Next j
        'Clean up and view files
        FSO.deletefolder TmpPath
        Shell "explorer.exe " & sPath & "PDF", vbNormalFocus
    End Sub
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •