PDA

View Full Version : [SOLVED:] Save embedded PDF file as a separate PDf file



offthelip
05-17-2017, 11:32 AM
I am developing a system where the client wants to embed pdf documents into a worksheet. Linking in this case is not applicable because the worksheet will be emailed to another system. I want to write macro to extract the embedded pdf objects and save them as separate files. Does anybody know of a way of doing this?

mdmackillop
05-17-2017, 03:05 PM
Needs a bit of error handling and some tidying but give this a try, as posted here (https://stackoverflow.com/questions/7144056/extract-the-pdf-file-embedded-in-excel-and-save-it-in-to-a-folder-using-c-shar)


Another approach would be to try to extract the files from the .xslx file - this can be done quite easily, as it's essentially just a .zip file. To play around with this manually, rename your .xlsx file to .zip and extract it. Inside, you will find a folder like "xl", and inside there, "embeddings". Your PDF document will be here, unfortunately encoded as an OLE Object. You can then try some get the data out of it, one example is here (http://stackoverflow.com/questions/2297675/code-to-read-oleobject-files).

Edit: I was revisiting this on another PC where the PDF failed to open. I changed the viewer to Microsoft Edge which does allow the PDF to be viewed.I guess that's to do with the OLE coding referred to above.


Dim sPath As String

Sub ExtractPDF()
Dim fname As Variant
Dim Pth As String

sPath = "C:\VBAX\"

fname = sPath & "Data.zip"
Pth = sPath & "MyUnzipFolder\xl\embeddings\"
MkDir sPath & "PDF"
ActiveWorkbook.SaveCopyAs fname
Unzip1 fname

f = Dir(Pth & "*.bin")
Do Until f = ""
i = i + 1
Name Pth & f As sPath & "PDF\" & "PDFfile" & Format(i, "00") & ".pdf"
f = Dir
Loop
Delete_Whole_Folder (sPath & "MyUnzipFolder")
retVal = Shell("explorer.exe " & sPath & "PDF", vbNormalFocus)

End Sub


Sub Unzip1(fname As Variant)
'https://www.rondebruin.nl/win/s7/win002.htm
Dim FSO As Object
Dim oApp As Object

Dim FileNameFolder As Variant
Dim DefPath As String
Dim strDate As String

DefPath = sPath

'Create the folder name
FileNameFolder = DefPath & "MyUnzipFolder\"

'Make the normal folder in DefPath
MkDir FileNameFolder

'Extract the files into the newly created folder
Set oApp = CreateObject("Shell.Application")

oApp.Namespace(FileNameFolder).CopyHere oApp.Namespace(fname).items

On Error Resume Next
Set FSO = CreateObject("scripting.filesystemobject")
FSO.deletefolder Environ("Temp") & "\Temporary Directory*", True

End Sub

Sub Delete_Whole_Folder(MyPath As String)
Dim FSO As Object
Set FSO = CreateObject("scripting.filesystemobject")
If FSO.FolderExists(MyPath) = False Then
MsgBox MyPath & " doesn't exist"
Exit Sub
End If
FSO.deletefolder MyPath
End Sub

offthelip
05-18-2017, 01:10 AM
Thanks for your help, unfortunately when I tried to open the PDF file acrobat comes up with an error saying the file is corrupted. Thus I compared the original PDf with the output result using notepad. I discover there are some extra lines in the .bin file that aren't in the original pdf. The original pdf starts with :


%PDF-1.5
%
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-GB) >>
endobj


while the exported version starts with:


ࡱ >

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  € R o o t E n t r y eʸ…*DEST >€ O l e
C o m p O b j ^ C O N T E N T S x
eʸ…*DEST Acrobat Document AcroExch.Document.DC 9q ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ * %PDF-1.5
%
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-GB) >>
endobj


I tried manually deleting these lines and saving as a text file and then renaming the file, Acrobat can then open the file but the data table in the pdf has been lost. So notepad corrupts the file ( not surprisingly)
Do you have any suggestions as to how to get rid of the extra lines.

I am thinking of going back to the client and saying we need to work round this in a different way. It looks very prone to failure in the future and problem to support long term.

mdmackillop
05-18-2017, 01:23 AM
I was revisiting this on another PC where the PDF failed to open. I changed the viewer to Microsoft Edge which does allow the PDF to be viewed.I guess that's to do with the OLE coding referred to above.

offthelip
05-18-2017, 03:12 AM
Thanks, that solved the problem with viewing the pdf.

mdmackillop
05-18-2017, 05:54 AM
Cleaned up version

Option Explicit

'Note: the extracted PDF files should open in Microsoft Edge _
They will not open in Acrobat reader


Sub ExtractPDF()
Dim FName As Variant
Dim TmpPath As Variant
Dim FSO As Object
Dim oApp As Object
Dim sPath As String
Dim Output As String
Dim f As String
Dim i As Long, j As Long
Dim ftype As String
'Set location and output; adjust as required
sPath = "C:\VBAX\"
Output = "PDFfile" & Format(Now, " yy_mm_dd_hh_mm")
ftype = ".pdf"
'Create Objects
Set FSO = CreateObject("scripting.filesystemobject")
Set oApp = CreateObject("Shell.Application")
'Set paths and create folders
TmpPath = sPath & "MyUnzipFolder"
FName = sPath & "Data.zip"
On Error Resume Next
FSO.deletefolder TmpPath
FSO.deletefolder sPath & "PDF" 'Deletes previously extracted files
MkDir sPath & "PDF"
MkDir TmpPath
On Error GoTo 0
'Make copy of workbook as zip file
ActiveWorkbook.SaveCopyAs FName
'Unzip bin files
For j = 1 To oApp.Namespace(FName).items.Count
oApp.Namespace(TmpPath).CopyHere oApp.Namespace(FName).items.Item("xl\embeddings\oleObject" & j & ".bin")
f = TmpPath & "\oleObject" & j & ".bin"
If Len(Dir(f)) = 0 Then Exit For
Name f As sPath & "PDF\" & Output & Format(j, " - 00") & ftype
Next j
'Clean up and view files
FSO.deletefolder TmpPath
Shell "explorer.exe " & sPath & "PDF", vbNormalFocus
End Sub