PDA

View Full Version : VBA code to extract images from word.



mktmohit
05-08-2013, 07:22 PM
Hi All,

I have written a code to extract all the images from a word document and saves it in a seperate folder.

But problem is that , i need to save the images with the same name as present in the document.

I have attached my word file which have macro save_Images.

Saving is proper but i need the text below the image to be the name of the image.

Can any one guide me how to do this.

macropod
05-09-2013, 01:43 AM
There is no macro named 'save_Images' in your attachment. Docx files cannot contain code modules.

In any event, any code that extracts images from your attachment is going to have its work complicated by the fact you have numerous images in the document that don't have names below them (e.g. the 'Call' and 'End Call' buttons) and there is no practical way of extracting the names from the images themselves. Furthermore, it's not clear whether you want the entire caption as the image name, or just the part after the caption number.

mktmohit
05-09-2013, 04:32 AM
Hi Macropod,

The doc file which i attached is having macro named "save_images".

Go to view then macro -> View macro.

Question i need to ask like is it possible to change the name of the images from "images001" to that which is just below the image.

For example in my attached doc, the first image name is "Screen shot 20: Group call – Ready Page".

Can it be possible to give the same name while storing.

Please see my code. Guide me, i am very thankful.

macropod
05-09-2013, 04:37 AM
As I have already stated, a docx file cannot contain macros and there is no macro in your attachment.

mktmohit
05-09-2013, 06:43 AM
Hi Macropodo,

I have again attached the doc file for you.

I have tested that macro so many times, i dont know why its not showing to you.

Go to view-macro-view macro-save_images-run.

It will extract all the images of that doc into one folder.

macropod
05-09-2013, 02:55 PM
See my previous replies. I don't doubt that you can see your macro. Wherever it is, though, it is not and is incapable of being in your attachment. By insisting otherwise, you plainly do not understand the fundamental difference between docx and docm files.

In all likelihood your macro is in the document's template. For further information, see: http://www.addbalance.com/usersguide/templates.htm

fumei
05-09-2013, 02:56 PM
I have tested that macro so many times, i dont know why its not showing to you.

There is a very simple reason why it is not showing. It is because the attached file does NOT - as stated by macropod - have any macros in it. It can't - as also stated by macropod - because docX files can not have any macros.

Your macro is in your Normal.dotm file. This is the default location for macros. If you do not explicitly tell Word to store macros in an appropriate file, it stores them in Normal.

The doc file which i attached is having macro named "save_images".

To repeat, your file does NOT have any macros in it.


For example in my attached doc, the first image name is "Screen shot 20: Group call – Ready Page".

Can it be possible to give the same name while storing.

Probably. You are incorrect though in stating the image name is "Screen shot 20: Group call – Ready Page".

That is the text of the next paragraph...that is all. There is NO actual association to the image. That is the text of the next paragraph. There is probably a way to grab each image (Shape) and also get the text of the next paragraph. Then use that text as the file name of the Shapes you ae saving.

mktmohit
05-11-2013, 04:09 AM
Hi All,

I have attached now my word document which contains two macros, one is to resize the images and another to save images into one folder.

What i need to do is to save the images in the folder with the same name as present in the document.

Example. In my attached doc , below first image text "Screen shot 1: PTT key assignment" is there.

This image should be saved by the same name. But i am not able to do that.

Can someone guide me please.

macropod
05-11-2013, 04:30 AM
Once again, there are no macros in your attachment. Simply saving a document in the docm format doesn't add macros from a template or whatever to it ...

gmaxey
05-11-2013, 05:36 AM
After you figure out how to attach a file containing code in this forum, you might want to consider which of the four images in your sample document it is that you want to name "Screen shot 1: PTT key assignment." All four cannot have the same name.

mktmohit
05-11-2013, 06:08 AM
I dont know why macros are not showing to you, but when i download the same doc, in my laptop its showing.

Ok i will paste the content of both the macros

Resize image
==========
Dim PecentSize As Integer
Dim oIshp As InlineShape
Dim oshp As Shape

PercentSize = InputBox("Enter percent of full size", "Resize Picture", 100)

For Each oIshp In ActiveDocument.InlineShapes
With oIshp
.ScaleHeight = PercentSize
.ScaleWidth = PercentSize
End With
Next oIshp

For Each oshp In ActiveDocument.Shapes
With oshp
.ScaleHeight Factor:=(PercentSize / 100), _
RelativeToOriginalSize:=msoCTrue
.ScaleWidth Factor:=(PercentSize / 100), _
RelativeToOriginalSize:=msoCTrue
End With
Next oshp

Save Image
=========
FileName = ActiveDocument.FullName


prePendFileName = Left(ActiveDocument.Name, Len(ActiveDocument.Name) - 5)
'prePendFileName = Right(prePendFileName, Len(prePendFileName) - 11)


saveLocaton = "C:\Users\Mohit\Desktop\"


TodayDateString = Year(Date) & "_"
If Month(Date) < 10 Then
TodayDateString = TodayDateString & "0"
End If
TodayDateString = TodayDateString & Month(Date) & "_"

If Day(Date) < 10 Then
TodayDateString = TodayDateString & "0"
End If
TodayDateString = TodayDateString & Day(Date)



FolderName = TodayDateString & "_" & prePendFileName

MsgBox "Saving Images to " & saveLocaton & FolderName & "_files"

'Delete the folder if it exists
On Error Resume Next
Kill saveLocaton & FolderName & "_files\*" 'Delete all files
RmDir saveLocation & FolderName & "_files" 'Delete folder

'First Save the current document as is
ActiveDocument.Save


ActiveDocument.SaveAs2 FileName:=saveLocaton & FolderName & ".html", _
FileFormat:=wdFormatHTML

ActiveDocument.Close

Kill saveLocaton & FolderName & ".html"
Kill saveLocaton & FolderName & "_files\*.xml"
Kill saveLocaton & FolderName & "_files\*.html"
Kill saveLocaton & FolderName & "_files\*.thmx"

'Rename image Files
'This is written for files with 99 or fewer images
For x = 1 To 9
Name saveLocaton & FolderName & "_files\image00" _
& x & ".png" As saveLocaton & FolderName & "_files\" _
& prePendFileName & "_00" & x & ".png"
Next

' For x = 10 To 99
' Name saveLocaton & FolderName & "_files\image0" _
' & x & ".png" As saveLocaton & FolderName _
' & "_files\" & prePendFileName & "_0" & x & ".png"
' Next


Word.Documents.Open (FileName)


Word.Application.Visible = True
Word.Application.Activate




Please help regarding this.

if you run this code image will be saved in the folder by the name "image001" but i need the name to be like "Screen shot 1: PTT key assignment" .
I dont know how to grep that text via vba which is written below the image.

Please help.

fumei
05-11-2013, 03:20 PM
Are you seriously saying that:

For x = 1 To 9
Name saveLocaton & FolderName & "_files\image00" _
& x & ".png" As saveLocaton & FolderName & "_files\" _
& prePendFileName & "_00" & x & ".png"
Next

is going to save images from IN the file?

Please use the VBA code tags when posting code.

gmaxey
05-11-2013, 04:53 PM
You still haven't answered the question "Which one of the four images in your example to you want named using the common caption? All four can't have the same name.

You might consider putting your images and captions in a borderless table. then you can use the cell ranges as a reference. For example, if the image is in Cell(1,1) then you would want to name it Cell(3, 1).range.Text & Cell(2,1) .Range.Text etc.

mktmohit
05-12-2013, 03:08 AM
Are you seriously saying that:

For x = 1 To 9
Name saveLocaton & FolderName & "_files\image00" _
& x & ".png" As saveLocaton & FolderName & "_files\" _
& prePendFileName & "_00" & x & ".png"
Next

is going to save images from IN the file?

Please use the VBA code tags when posting code.

@Fumie,
I already have this code to save the images in folder, but it will come as image o1, image02 and so on, but what i need to do is to give the name of my images acoording to the text written below each images in document.

mktmohit
05-12-2013, 03:11 AM
You still haven't answered the question "Which one of the four images in your example to you want named using the common caption? All four can't have the same name.

You might consider putting your images and captions in a borderless table. then you can use the cell ranges as a reference. For example, if the image is in Cell(1,1) then you would want to name it Cell(3, 1).range.Text & Cell(2,1) .Range.Text etc.


@gmaxey.

simply please understand that whatever text is written below every images i need to save these images with that name.

Like if below image text is "Fig 1 1000" then i need to save that image with the same name.

Actually i need to get the logic or rule to grep the text written below the image.

If you have any idea other than it let me know, Meanwhile i am searching the same.:banghead:

macropod
05-12-2013, 03:20 AM
simply please understand that whatever text is written below every images i need to save these images with that name.
Instead of treating Greg like a fool, you should look at the documents you've posted to this forum. The first one has numerous images that don't have names below them and the first four images in the last document you posted have only one line below the lot of them and they cannot all be given that as a name. With the unstructured mess you're posting you're asking for the impossible.

mktmohit
05-12-2013, 03:30 AM
Instead of treating Greg like a fool, you should look at the documents you've posted to this forum. The first one has numerous images that don't have names below them and the first four images in the last document you posted have only one line below the lot of them and they cannot all be given that as a name. With the unstructured mess you're posting you're asking for the impossible.


@macropod,

my intention was not that.

I have attached my doc once again, in which every image has a name below it, i need that name while saving.

Thanks Macropod for guiding me. :)

macropod
05-12-2013, 03:35 AM
Your latest document is better, but even that is inconsistently laid out. Your first image, for example, has a paragraph of space characters before it, and the 'caption' is in the paragraph after that. The other images have the caption paragraphs immediately after the images. Unless you have a consistent layout, don't expect anyone's code to provide the results you want.

mktmohit
05-12-2013, 04:09 AM
Your latest document is better, but even that is inconsistently laid out. Your first image, for example, has a paragraph of space characters before it, and the 'caption' is in the paragraph after that. The other images have the caption paragraphs immediately after the images. Unless you have a consistent layout, don't expect anyone's code to provide the results you want.

Yes you are ryt its inconsistent, but see what i want, while saving the image, we have to grep the below text.

Example :- For x = 1 To 9
Name saveLocaton & FolderName & "_files\image00" _
& x & ".png" As saveLocaton & FolderName & "_files\" _
& prePendFileName & "_00" & x & ".png"
Next

in between this code we need to implement the logic of grepping that below text and will give that name.

What i am thinking is when compiler is giving name to image it will search for the text starting with "Screen shot" and will print whole line followed by Screen shot and will be named as the name of that image.

Like the first image in my doc should be save by the name "Screen shot 1: PTT key assignment".

let me know if u need further explanation.

gmaxey
05-12-2013, 06:26 AM
To be honest, I was seriously considering ignoring this thead due to lack of cooperation on your part. However the challenge was interesting so I stuck with it.

Here are some things that you should consider before posting again.

1. Answer the questions ask. Doing so may help you discover and resolve the confusion that peoople trying to help you are experiencing. Stamping your foot and endlessly repeating your same question does not help.

2. Provide helpful attachements. Sending a file with two .jpg images and three .png images with code to rename all as .png is not helpful.

3. Declare you're varialbes. I do, and I think most regulars here do, and getting your code with a bunch of undeclared variables is frustrating.

Sub SaveImage()
Dim strFullName As String, strFileName As String
Dim strBasePath As String, strDate As String
Dim strFolderName As String, strImageName As String
Dim x As Long
Dim oRng As Word.Range
Dim arrCaptions() As String
Dim strExt As String

ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For x = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(x).Range
oRng.Collapse wdCollapseEnd
oRng.MoveEndUntil Cset:="Screen", Count:=wdForward
oRng.Collapse wdCollapseEnd
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(x - 1) = oRng.Text
Next x

strFullName = ActiveDocument.FullName
strFileName = Left(ActiveDocument.Name, Len(ActiveDocument.Name) - 5)
strBasePath = "C:\Users\Mohit\Desktop\"
strDate = Format(Now, "yyyy-mm-dd")
strFolderName = strDate & "_" & strFileName
'Delete the folder if it exists
On Error Resume Next
Kill strBasePath & strFolderName & "_files\*" 'Delete all files
RmDir strBasePath & strFolderName & "_files" 'Delete folder
On Error GoTo 0
'Save the current document.
ActiveDocument.Save
'Save in HTML format
ActiveDocument.SaveAs2 FileName:=strBasePath & strFolderName & ".html", FileFormat:=wdFormatHTML
ActiveDocument.Close
On Error Resume Next
Kill strBasePath & strFolderName & ".html"
Kill strBasePath & strFolderName & "_files\*.xml"
Kill strBasePath & strFolderName & "_files\*.html"
Kill strBasePath & strFolderName & "_files\*.thmx"
On Error GoTo 0
'Rename image Files
On Error GoTo Err_Ext
For x = 0 To UBound(arrCaptions)
strImageName = Replace(arrCaptions(x), ":", "-")
strExt = ".png"
Name strBasePath & strFolderName & "_files\image00" _
& x + 1 & strExt As strBasePath & strFolderName & "_files\" _
& strFileName & strImageName & strExt
Next
Word.Documents.Open (strFullName)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
Err_Ext:
strExt = ".jpg"
Resume
End Sub

mktmohit
05-12-2013, 06:42 AM
To be honest, I was seriously considering ignoring this thead due to lack of cooperation on your part. However the challenge was interesting so I stuck with it.

Here are some things that you should consider before posting again.

1. Answer the questions ask. Doing so may help you discover and resolve the confusion that peoople trying to help you are experiencing. Stamping your foot and endlessly repeating your same question does not help.

2. Provide helpful attachements. Sending a file with two .jpg images and three .png images with code to rename all as .png is not helpful.

3. Declare you're varialbes. I do, and I think most regulars here do, and getting your code with a bunch of undeclared variables is frustrating.

Sub SaveImage()
Dim strFullName As String, strFileName As String
Dim strBasePath As String, strDate As String
Dim strFolderName As String, strImageName As String
Dim x As Long
Dim oRng As Word.Range
Dim arrCaptions() As String
Dim strExt As String

ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For x = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(x).Range
oRng.Collapse wdCollapseEnd
oRng.MoveEndUntil Cset:="Screen", Count:=wdForward
oRng.Collapse wdCollapseEnd
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(x - 1) = oRng.Text
Next x

strFullName = ActiveDocument.FullName
strFileName = Left(ActiveDocument.Name, Len(ActiveDocument.Name) - 5)
strBasePath = "C:\Users\Mohit\Desktop\"
strDate = Format(Now, "yyyy-mm-dd")
strFolderName = strDate & "_" & strFileName
'Delete the folder if it exists
On Error Resume Next
Kill strBasePath & strFolderName & "_files\*" 'Delete all files
RmDir strBasePath & strFolderName & "_files" 'Delete folder
On Error GoTo 0
'Save the current document.
ActiveDocument.Save
'Save in HTML format
ActiveDocument.SaveAs2 FileName:=strBasePath & strFolderName & ".html", FileFormat:=wdFormatHTML
ActiveDocument.Close
On Error Resume Next
Kill strBasePath & strFolderName & ".html"
Kill strBasePath & strFolderName & "_files\*.xml"
Kill strBasePath & strFolderName & "_files\*.html"
Kill strBasePath & strFolderName & "_files\*.thmx"
On Error GoTo 0
'Rename image Files
On Error GoTo Err_Ext
For x = 0 To UBound(arrCaptions)
strImageName = Replace(arrCaptions(x), ":", "-")
strExt = ".png"
Name strBasePath & strFolderName & "_files\image00" _
& x + 1 & strExt As strBasePath & strFolderName & "_files\" _
& strFileName & strImageName & strExt
Next
Word.Documents.Open (strFullName)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
Err_Ext:
strExt = ".jpg"
Resume
End Sub


Greg,

Will take care of your points in future.

While running your code, it went for endless loop.

While debuuging the code i came to know that control is not able to come out from loop between below code
Name strBasePath & strFolderName & "_files\image00" _
& x + 1 & strExt As strBasePath & strFolderName & "_files\" _
& strFileName & strImageName & strExt
Next

AND

Err_Ext:
strExt = ".png"
Resume


Can you check once. Did you got my original question or shall i explain again???

gmaxey
05-12-2013, 06:47 AM
The code runs without error on the last sample document that you posted. It contains two .jpg format files and three .png format files. Did you test on the same file?

mktmohit
05-12-2013, 06:52 AM
The code runs without error on the last sample document that you posted. It contains two .jpg format files and three .png format files. Did you test on the same file?


Yes Greg,

I run that code in that doc only but it is going for endless loop.

I have checked again.

Can you please send me that doc again.

Thanks

mktmohit
05-12-2013, 09:06 AM
The code runs without error on the last sample document that you posted. It contains two .jpg format files and three .png format files. Did you test on the same file?

Greg,
Loop problem has been sorted now.
I dont know how shall i show my thanks to you.
Thanks a million Greg sir.:friends:
Its working correct with the same name, but still i need some help as when i pasted more pictures on my doc it is going to endless loop again.

I have done lot of testing but cant able to find which things should i avoid while pasting images.

Please can you tell me which things should i avoid while pasting like space after image, size etc. so that endless loop will be avoided.

gmaxey
05-12-2013, 10:02 AM
This is more complicated than I expected. While it doesn't occur in the document that you posted here, when I created and tested with a new document I discovered that when the HTML file is create that it will "typically" create two image files for each inlineshape in the active documet. This means if you have two images in the original file there will be four files in the extraction folder. I've managed to deal with a one for one count and a two for one count, but I can't be certain that there will always be two images files created for every inline shape. So, the following would fail in that situation.

Just be sure that the caption text is in the paragraph immediately following the inlineshape. With the exception of the ":" which I've addressed in the code, don't use text that would be an invalid file name character.


Sub ExtractImages()
Dim strFileNameAndPath As String, strFileName As String, strName As String
Dim strBasePath As String, strDate As String
Dim strExtractionFolder As String, strImageName As String
Dim bThumbnails As Boolean
Dim lngIndex As Long
Dim oRng As Word.Range
Dim arrCaptions() As String
Dim lngCount As Long

'Store image caption names. The paragraph immediately following the shape paragraph defines the image caption.
ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For lngIndex = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(lngIndex).Range
oRng.Collapse wdCollapseEnd
'Move range to start of caption paragraph text.
oRng.Move wdParagraph, 1
'Extend range to end of paragraph.
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(lngIndex - 1) = oRng.Text
Next lngIndex

strFileNameAndPath = ActiveDocument.FullName
'Define folder for extracted images.
strBasePath = SpecialFolderPath & Application.PathSeparator
strDate = Format(Now, "yyyy-mm-dd")
strFileName = GetFileNameWithoutExtension(ActiveDocument.Name)
strExtractionFolder = strDate & "_" & strFileName

'Delete the folder if it exists.
On Error Resume Next
'Delete any files.
Kill strBasePath & strExtractionFolder & "_files\*"
RmDir strBasePath & strExtractionFolder & "_files"
On Error GoTo 0

'Save the current document.
ActiveDocument.Save

'Save document in HTML format. This creates the "_files\" folder in the Extraction Folder.
ActiveDocument.SaveAs2 filename:=strBasePath & strExtractionFolder & ".html", FileFormat:=wdFormatHTML

ActiveDocument.Close

On Error Resume Next
'Get rid of extraneous data files. Keep only the images.
Kill strBasePath & strExtractionFolder & ".html"
Kill strBasePath & strExtractionFolder & "_files\*.xml"
Kill strBasePath & strExtractionFolder & "_files\*.html"
Kill strBasePath & strExtractionFolder & "_files\*.thmx"
On Error GoTo 0

'Rename image files.
'When converting to HTML format images in the original document may be duplicated as thumbnails in the HTML file.
'We need to get if the image file count mathches the original InlineShape count of the document.
lngCount = Count_Files(strBasePath & strExtractionFolder & "_files")
Select Case True
Case lngCount = UBound(arrCaptions) + 1
'Image count matches so do a one for one rename.
lngIndex = 0
strName = Dir(strBasePath & strExtractionFolder & "_files\")
While strName <> ""
'Some characters are invalid in file names. A colon is invalid and could be used in the caption e.g., Screenshot 1: The Mona Lisa"
strImageName = Replace(arrCaptions(lngIndex), ":", "-")
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & ".png"
strName = Dir()
lngIndex = lngIndex + 1
Wend
Case lngCount = 2 * (UBound(arrCaptions) + 1)
'Duplicates where created.
lngIndex = 0
bThumbnails = True
strName = Dir(strBasePath & strExtractionFolder & "_files\")
While strName <> ""
'strExt = ".png"
strImageName = Replace(arrCaptions(lngIndex), ":", "-")
If bThumbnails Then
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & ".png"
Else
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & " thumbnail" & ".png"
End If
strName = Dir()
If Not bThumbnails Then lngIndex = lngIndex + 1
bThumbnails = Not bThumbnails
Wend
Case Else
MsgBox "File count mismatch occurred. Images were not renamed."
End Select
Word.Documents.Open (strFileNameAndPath)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
lbl_Exit:
Exit Sub
End Sub
Function Count_Files(strFolder As String) As Long
Dim strName As String
Dim lngCount As Integer

If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
lngCount = 0
strName = Dir(strFolder)
While strName <> ""
lngCount = lngCount + 1
strName = Dir()
Wend
Count_Files = lngCount
lbl_Exit:
Exit Function
End Function
Function GetFileNameWithoutExtension(ByRef strFileName As String) As String
On Error GoTo Err_NoExtension
GetFileNameWithoutExtension = VBA.Left(strFileName, (InStrRev(strFileName, ".", -1, vbTextCompare) - 1))
lbl_Exit:
Exit Function
Err_NoExtension:
GetFileNameWithoutExtension = strFileName
Resume lbl_Exit
End Function
Function SpecialFolderPath() As String
Dim objWSHShell As Object
Dim strSpecialFolderPath

Set objWSHShell = CreateObject("WScript.Shell")
SpecialFolderPath = objWSHShell.SpecialFolders("Desktop")
Set objWSHShell = Nothing
lbl_Exit:
Exit Function
End Function

gmaxey
05-12-2013, 10:52 AM
Ok, it seems that changing the fileformat to wdFilteredHTML prevents the duplicate files:

Sub ExtractImages()
Dim strFileNameAndPath As String, strFileName As String, strName As String
Dim strBasePath As String, strDate As String
Dim strExtractionFolder As String, strImageName As String
Dim lngIndex As Long
Dim oRng As Word.Range
Dim arrCaptions() As String

'Store image caption names. The paragraph immediately following the shape paragraph defines the image caption.
ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For lngIndex = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(lngIndex).Range
oRng.Collapse wdCollapseEnd
'Move range to start of caption paragraph text.
oRng.Move wdParagraph, 1
'Extend range to end of paragraph.
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(lngIndex - 1) = oRng.Text
Next lngIndex

strFileNameAndPath = ActiveDocument.FullName
'Define folder for extracted images.
strBasePath = SpecialFolderPath & Application.PathSeparator
strDate = Format(Now, "yyyy-mm-dd")
strFileName = GetFileNameWithoutExtension(ActiveDocument.Name)
strExtractionFolder = strDate & "_" & strFileName

'Delete the folder if it exists.
On Error Resume Next
'Delete any files.
Kill strBasePath & strExtractionFolder & "_files\*"
RmDir strBasePath & strExtractionFolder & "_files"
On Error GoTo 0

'Save the current document.
ActiveDocument.Save

'Save document in HTML format. This creates the "_files\" folder in the Extraction Folder.
ActiveDocument.SaveAs2 filename:=strBasePath & strExtractionFolder & ".html", FileFormat:=wdFormatFilteredHTML

ActiveDocument.Close

On Error Resume Next
'Get rid of extraneous data files. Keep only the images.
Kill strBasePath & strExtractionFolder & ".html"
Kill strBasePath & strExtractionFolder & "_files\*.xml"
Kill strBasePath & strExtractionFolder & "_files\*.html"
Kill strBasePath & strExtractionFolder & "_files\*.thmx"
On Error GoTo 0

'Rename image files.
lngIndex = 0
strName = Dir(strBasePath & strExtractionFolder & "_files\")
While strName <> ""
'Some characters are invalid in file names. A colon is invalid and could be used in the caption e.g., Screenshot 1: The Mona Lisa"
strImageName = Replace(arrCaptions(lngIndex), ":", "-")
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & ".png"
strName = Dir()
lngIndex = lngIndex + 1
Wend
Word.Documents.Open (strFileNameAndPath)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
lbl_Exit:
Exit Sub
End Sub
Function GetFileNameWithoutExtension(ByRef strFileName As String) As String
On Error GoTo Err_NoExtension
GetFileNameWithoutExtension = VBA.Left(strFileName, (InStrRev(strFileName, ".", -1, vbTextCompare) - 1))
lbl_Exit:
Exit Function
Err_NoExtension:
GetFileNameWithoutExtension = strFileName
Resume lbl_Exit
End Function
Function SpecialFolderPath() As String
Dim objWSHShell As Object
Dim strSpecialFolderPath

Set objWSHShell = CreateObject("WScript.Shell")
SpecialFolderPath = objWSHShell.SpecialFolders("Desktop")
Set objWSHShell = Nothing
lbl_Exit:
Exit Function
End Function

mktmohit
05-12-2013, 09:58 PM
Ok, it seems that changing the fileformat to wdFilteredHTML prevents the duplicate files:

Sub ExtractImages()
Dim strFileNameAndPath As String, strFileName As String, strName As String
Dim strBasePath As String, strDate As String
Dim strExtractionFolder As String, strImageName As String
Dim lngIndex As Long
Dim oRng As Word.Range
Dim arrCaptions() As String

'Store image caption names. The paragraph immediately following the shape paragraph defines the image caption.
ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For lngIndex = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(lngIndex).Range
oRng.Collapse wdCollapseEnd
'Move range to start of caption paragraph text.
oRng.Move wdParagraph, 1
'Extend range to end of paragraph.
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(lngIndex - 1) = oRng.Text
Next lngIndex

strFileNameAndPath = ActiveDocument.FullName
'Define folder for extracted images.
strBasePath = SpecialFolderPath & Application.PathSeparator
strDate = Format(Now, "yyyy-mm-dd")
strFileName = GetFileNameWithoutExtension(ActiveDocument.Name)
strExtractionFolder = strDate & "_" & strFileName

'Delete the folder if it exists.
On Error Resume Next
'Delete any files.
Kill strBasePath & strExtractionFolder & "_files\*"
RmDir strBasePath & strExtractionFolder & "_files"
On Error GoTo 0

'Save the current document.
ActiveDocument.Save

'Save document in HTML format. This creates the "_files\" folder in the Extraction Folder.
ActiveDocument.SaveAs2 filename:=strBasePath & strExtractionFolder & ".html", FileFormat:=wdFormatFilteredHTML

ActiveDocument.Close

On Error Resume Next
'Get rid of extraneous data files. Keep only the images.
Kill strBasePath & strExtractionFolder & ".html"
Kill strBasePath & strExtractionFolder & "_files\*.xml"
Kill strBasePath & strExtractionFolder & "_files\*.html"
Kill strBasePath & strExtractionFolder & "_files\*.thmx"
On Error GoTo 0

'Rename image files.
lngIndex = 0
strName = Dir(strBasePath & strExtractionFolder & "_files\")
While strName <> ""
'Some characters are invalid in file names. A colon is invalid and could be used in the caption e.g., Screenshot 1: The Mona Lisa"
strImageName = Replace(arrCaptions(lngIndex), ":", "-")
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & ".png"
strName = Dir()
lngIndex = lngIndex + 1
Wend
Word.Documents.Open (strFileNameAndPath)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
lbl_Exit:
Exit Sub
End Sub
Function GetFileNameWithoutExtension(ByRef strFileName As String) As String
On Error GoTo Err_NoExtension
GetFileNameWithoutExtension = VBA.Left(strFileName, (InStrRev(strFileName, ".", -1, vbTextCompare) - 1))
lbl_Exit:
Exit Function
Err_NoExtension:
GetFileNameWithoutExtension = strFileName
Resume lbl_Exit
End Function
Function SpecialFolderPath() As String
Dim objWSHShell As Object
Dim strSpecialFolderPath

Set objWSHShell = CreateObject("WScript.Shell")
SpecialFolderPath = objWSHShell.SpecialFolders("Desktop")
Set objWSHShell = Nothing
lbl_Exit:
Exit Function
End Function


Greg,

Thanks, now its working fine on one of the document, I will test on other documents and will let u know if any issues occurs.

Thank you Greg, you simply awesome :friends:

mktmohit
05-13-2013, 03:53 AM
Greg,

Thanks, now its working fine on one of the document, I will test on other documents and will let u know if any issues occurs.

Thank you Greg, you simply awesome :friends:

Greg,

I have tested your code in other documents, i came to know that some files are naming conventions are wrong, means image name went ahead from the image itself. E.g name of image one is given to the name of image 3 or 4.

Secondly, code is still stucking giving mostly two type of errors "Invalid call procedure" and "File already exist".

According to your code please let me know what points should we keep in mind while preparing docs.

latest test result
=============
Docs having 200 images, when run the code till 130 images name were there acc. to that present in the doc(With some errors as described above) after that images were saved as image131 image132 and so on.


Greg please let have some change in code or tell me what points should i keep in mind while adding images to our docs.

Greg, we already reached near end, please help here also so that w can finish it off.
Thanks:thumb :thumb

gmaxey
05-13-2013, 12:34 PM
I don't really know what else that I can do for you. Needless to say, I've spent considerably more free time on this than I could afford.

You need to set stops or breaks in the code and see if you can identify where and why the errors occur. Ensure you have no invalid characters in your caption text. I don't recall what all are invalid but for example "\" is invalid as well as ":"

mktmohit
05-15-2013, 05:15 AM
I don't really know what else that I can do for you. Needless to say, I've spent considerably more free time on this than I could afford.

You need to set stops or breaks in the code and see if you can identify where and why the errors occur. Ensure you have no invalid characters in your caption text. I don't recall what all are invalid but for example "\" is invalid as well as ":"
Greg,

Thank you for your time, i have made necessary changes in it, now its working fine.

Thank you all who replied and viewed my thread, thank you vba express

mesp9942
05-21-2013, 08:57 AM
Hi,
Greg-I am wondering if you have a suggestion how to then replace the images with a string in the format "{{imagename}}". For example I would want to replace image001 with "{{image001.jpg}}". My images are in different formats so I would somehow need to figure out how to name them with their respective extensions. Either that, or convert all the images to the same extension/format.
Any suggestions for code that would work?
Thanks,
Meghan

ced
01-01-2014, 11:32 AM
thank you greg for this great job

Ok, it seems that changing the fileformat to wdFilteredHTML prevents the duplicate files:

Sub ExtractImages()
Dim strFileNameAndPath As String, strFileName As String, strName As String
Dim strBasePath As String, strDate As String
Dim strExtractionFolder As String, strImageName As String
Dim lngIndex As Long
Dim oRng As Word.Range
Dim arrCaptions() As String

'Store image caption names. The paragraph immediately following the shape paragraph defines the image caption.
ReDim arrCaptions(ActiveDocument.InlineShapes.Count - 1)
For lngIndex = 1 To ActiveDocument.InlineShapes.Count
Set oRng = ActiveDocument.InlineShapes(lngIndex).Range
oRng.Collapse wdCollapseEnd
'Move range to start of caption paragraph text.
oRng.Move wdParagraph, 1
'Extend range to end of paragraph.
oRng.MoveEndUntil Cset:=Chr(13), Count:=wdForward
arrCaptions(lngIndex - 1) = oRng.Text
Next lngIndex

strFileNameAndPath = ActiveDocument.FullName
'Define folder for extracted images.
strBasePath = SpecialFolderPath & Application.PathSeparator
strDate = Format(Now, "yyyy-mm-dd")
strFileName = GetFileNameWithoutExtension(ActiveDocument.Name)
strExtractionFolder = strDate & "_" & strFileName

'Delete the folder if it exists.
On Error Resume Next
'Delete any files.
Kill strBasePath & strExtractionFolder & "_files\*"
RmDir strBasePath & strExtractionFolder & "_files"
On Error GoTo 0

'Save the current document.
ActiveDocument.Save

'Save document in HTML format. This creates the "_files\" folder in the Extraction Folder.
ActiveDocument.SaveAs2 filename:=strBasePath & strExtractionFolder & ".html", FileFormat:=wdFormatFilteredHTML

ActiveDocument.Close

On Error Resume Next
'Get rid of extraneous data files. Keep only the images.
Kill strBasePath & strExtractionFolder & ".html"
Kill strBasePath & strExtractionFolder & "_files\*.xml"
Kill strBasePath & strExtractionFolder & "_files\*.html"
Kill strBasePath & strExtractionFolder & "_files\*.thmx"
On Error GoTo 0

'Rename image files.
lngIndex = 0
strName = Dir(strBasePath & strExtractionFolder & "_files\")
While strName <> ""
'Some characters are invalid in file names. A colon is invalid and could be used in the caption e.g., Screenshot 1: The Mona Lisa"
strImageName = Replace(arrCaptions(lngIndex), ":", "-")
Name strBasePath & strExtractionFolder & "_files\" & strName As strBasePath & strExtractionFolder & "_files\" & strImageName & ".png"
strName = Dir()
lngIndex = lngIndex + 1
Wend
Word.Documents.Open (strFileNameAndPath)
Word.Application.Visible = True
Word.Application.Activate
Exit Sub
lbl_Exit:
Exit Sub
End Sub
Function GetFileNameWithoutExtension(ByRef strFileName As String) As String
On Error GoTo Err_NoExtension
GetFileNameWithoutExtension = VBA.Left(strFileName, (InStrRev(strFileName, ".", -1, vbTextCompare) - 1))
lbl_Exit:
Exit Function
Err_NoExtension:
GetFileNameWithoutExtension = strFileName
Resume lbl_Exit
End Function
Function SpecialFolderPath() As String
Dim objWSHShell As Object
Dim strSpecialFolderPath

Set objWSHShell = CreateObject("WScript.Shell")
SpecialFolderPath = objWSHShell.SpecialFolders("Desktop")
Set objWSHShell = Nothing
lbl_Exit:
Exit Function
End Function