Save JPEGs from Web Page



Shred Dude
04-06-2009, 09:35 AM
I'm attempting to save all images from a web page. I'm sure this is easy, but I've hit a mental block. Any pointers much appreciated.

The code below opens a web page and grabs all images into an HTML element collection for further processing.

I want to save each image to a directory on my C: drive, similar to the way you can right-click an image on a web page in IE and invoke the Save Picture As dialog box. I've had no luck finding a reference to that particular control as a means of automating the save, and I can't seem to invoke a Save procedure to grab the file ("XXX.jpg"). I'm sure there's an easy way I'm just overlooking.


Public Sub GetImages()
'References to:
'  HTML Object Library
'  M$ Internet Controls

    Dim IEApp As InternetExplorer
    Dim URL As String
    Dim myImages As IHTMLElementCollection
    Dim i As Long

    Set IEApp = New InternetExplorer

    URL = "www.XYZ.com"

    With IEApp
        .Navigate URL
        .Visible = True

        'Wait for the page to finish loading
        While .Busy
            DoEvents
        Wend
    End With

    Set myImages = IEApp.Document.getElementsByTagName("img")

    For i = 0 To myImages.Length - 1
        With myImages(i)
            Debug.Print "Image # " & i & vbTab & .alt & " - " & .href
        End With
    Next i
End Sub


Thanks in advance for any insights.

Shred

stanl
04-06-2009, 10:52 AM
I normally use XMLHTTP, especially the free component from XStandard. You might also consider the URLDownloadToFile API, which is quick and easy. Stan

Shred Dude
04-06-2009, 12:44 PM
Stan:

Thanks so much for the pointer. I had success with the URLDownloadToFile approach on the site I was working with, since each image had a unique URL listed within the document's HTML.

Modified code example here:



Public Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
    (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
     ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Public Sub GetImages()
'References to:
'  HTML Object Library
'  M$ Internet Controls

    Dim IEApp As InternetExplorer
    Dim URL As String
    Dim myImages As IHTMLElementCollection
    Dim i As Long
    Dim returnValue As Long

    Set IEApp = New InternetExplorer

    URL = "www.XYZ.com"

    With IEApp
        .Navigate URL
        .Visible = True

        While .Busy
            DoEvents
        Wend
    End With

    Set myImages = IEApp.Document.getElementsByTagName("img")

    For i = 0 To myImages.Length - 1
        With myImages(i)
            'Debug.Print "Image # " & i & vbTab & .alt & " - " & .href

            'Download any image whose alt text contains "ABCD"
            If InStr(1, .alt, "ABCD") > 0 Then
                returnValue = URLDownloadToFile(0, .href, "C:\WhereIWantIt\ABCD.jpg", 0, 0)
            End If
        End With
    Next i
End Sub

Had the page not provided a clean HREF isolating each image at a unique URL, I'm not sure this approach would have worked.

I'm curious about the XMLHTTP approach you referred to as an alternative. I found several posts regarding XMLHTTP, but none I could readily apply to my situation.

Any more pointers?

Thanks again for your help.

Shred

stanl
04-06-2009, 01:27 PM
I'm curious about the XMLHTTP approach you referred to as an alternative. I found several posts regarding XMLHTTP, but none I could readily apply to my situation.

This is WinBatch code... it wouldn't take much to convert to VBA, but you can see the point. (I mostly use XStandard to get maps and place them into routing sheets.)


;should save about 11 .gif files
cURL = "http://www.winbatch.com/"
oIE = CreateObject("InternetExplorer.Application")
oIE.visible = 1
oIE.navigate(cURL)
If ! ieReady() Then Exit    ;ieReady() is a user-defined wait-for-load function
cHlinks = oIE.Document.Body.GetElementsByTagName("IMG")
If cHlinks
   For z = 0 To cHlinks.length-1
      S = cHlinks.item(z)
      cImage = S.src
      n = ItemCount(cImage, "/")
      saveImage = DirScript() : ItemExtract(n, cImage, "/")
      oHTTP = CreateObject("XStandard.HTTP")
      oHTTP.GET(cImage)
      oHTTP.SaveResponseToFile(saveImage)
      oHTTP = 0
   Next
Endif
oIE.Quit()
oIE = 0
Exit
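
For VBA users, a rough translation of the loop above might look like the following sketch. It substitutes MSXML2.XMLHTTP and ADODB.Stream (both ship with Windows XP and later) for the third-party XStandard.HTTP component; the save folder is a hypothetical placeholder.

Public Sub SaveImagesViaXmlHttp()
    Dim oIE As Object, oHTTP As Object, oStream As Object
    Dim myImages As Object
    Dim cImage As String
    Dim z As Long

    Set oIE = CreateObject("InternetExplorer.Application")
    oIE.Visible = True
    oIE.Navigate "http://www.winbatch.com/"
    While oIE.Busy Or oIE.ReadyState <> 4    '4 = READYSTATE_COMPLETE
        DoEvents
    Wend

    Set myImages = oIE.Document.getElementsByTagName("IMG")
    For z = 0 To myImages.Length - 1
        cImage = myImages(z).src

        'Fetch the image bytes with a synchronous GET
        Set oHTTP = CreateObject("MSXML2.XMLHTTP")
        oHTTP.Open "GET", cImage, False
        oHTTP.Send

        'Write the binary response to disk, named after the last URL segment
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Type = 1                     '1 = adTypeBinary
        oStream.Open
        oStream.Write oHTTP.ResponseBody
        oStream.SaveToFile "C:\Temp\" & Mid$(cImage, InStrRev(cImage, "/") + 1), 2    '2 = overwrite
        oStream.Close
    Next z

    oIE.Quit
    Set oIE = Nothing
End Sub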

Shred Dude
04-06-2009, 11:15 PM
Stan:

Thanks again.

I've now discovered that I can use URLDownloadToFile without having to start up IE at all, given the URL. That's tremendous!
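
A minimal sketch of that, assuming the Declare statement from the earlier post is in scope (the URL and path are placeholders):

Public Sub QuickGrab()
    Dim returnValue As Long
    'One call fetches the file by URL; no InternetExplorer instance required
    returnValue = URLDownloadToFile(0, "http://www.XYZ.com/images/ABCD.jpg", _
                                    "C:\WhereIWantIt\ABCD.jpg", 0, 0)
    If returnValue <> 0 Then Debug.Print "Download failed, code " & returnValue
End Sub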

Given my success with URLDownloadToFile, I haven't explored XMLHTTP much yet. I've read a few threads and tried some sample code; I'll dig into it more later.

Now I'm curious about processing an HTML file saved with URLDownloadToFile without needing to open IE. For example, I'd like to be able to get at the innerText property of a particular DIV on a page. I can do it by opening the file in IE and then using the .document object to get at the data I need. I'm just wondering if I can do it without incurring the overhead of IE.

Off topic, I know... I'll be looking to other threads.

stanl
04-07-2009, 03:02 AM
Now I'm curious about processing an HTML file saved with URLDownloadToFile without needing to open IE. For example, I'd like to be able to get at the innerText property of a particular DIV on a page. I can do it by opening the file in IE and then using the .document object to get at the data I need. I'm just wondering if I can do it without incurring the overhead of IE.


You can use the Shell.Explorer object to parse saved URL files with DHTML. Assuming you used URLDownloadToFile to save www.vbaexpress.com to c:\temp\vbaexpress.htm, something like


cURL = "c:\temp\vbaexpress.htm"
cURL = "file:\\\ (file:///)" & cURL
oShell = CreateObject("Shell.Explorer")
oShell.Navigate cURL


at which point you have access to the oShell.document model... not exactly the same as IE, but has enough of the basics to get you by.

You can achieve the same thing with XStandard.HTTP, MSXML2.XMLHTTP, or WinHttp.WinHttpRequest.5.1 (the latter two come with XP; I like XStandard because it comes bundled with Tidy and permits saving web pages as well-formed XML). Of course, at that point you have a bit of a learning curve as you dive into the MSXML2.DOMDocument.4.0 model.
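
For what it's worth, the WinHttp flavor is only a few lines; a minimal sketch, with a placeholder URL:

Public Sub FetchPageText()
    Dim oHTTP As Object
    Set oHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
    oHTTP.Open "GET", "http://www.vbaexpress.com", False    'synchronous request
    oHTTP.Send
    If oHTTP.Status = 200 Then Debug.Print oHTTP.ResponseText
End Sub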


Stan

Shred Dude
04-07-2009, 10:06 PM
Stan:

Thanks for the tip on the Shell object. Never used it in this context, I'll give that a try.

I played around with a Scripting.FileSystemObject to get at a saved .htm file. That worked pretty well; it was fast, at least.

What I'm hoping to do is strip a table from a certain page and then display just that table when I need to. A combination of URLDownloadToFile and a small FSO procedure got me the results I needed: I could find the table I wanted in the file saved by URLDownloadToFile, then loop through its lines, accumulating them into a string variable until I hit the closing tag (in this case, </div>). I then saved that string into a new .htm file, which let me display just that table. However, I lost the nice formatting the original page applied to the table, I think because the CSS class referred to at the start of the <div> is no longer defined.
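
A minimal sketch of that extract-and-resave idea (the marker strings, paths, and stylesheet URL are hypothetical placeholders; linking the original page's stylesheet in the new file is what would restore the class-based formatting):

Public Sub ExtractTable()
    Dim fso As Object, tsIn As Object, tsOut As Object
    Dim sLine As String, sTable As String
    Dim bInTable As Boolean

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set tsIn = fso.OpenTextFile("C:\Temp\page.htm", 1)    '1 = ForReading

    'Accumulate lines from the opening <div> through the closing </div>
    Do Until tsIn.AtEndOfStream
        sLine = tsIn.ReadLine
        If InStr(sLine, "<div class=""myTable""") > 0 Then bInTable = True
        If bInTable Then sTable = sTable & sLine & vbCrLf
        If bInTable And InStr(sLine, "</div>") > 0 Then Exit Do
    Loop
    tsIn.Close

    'Wrap the fragment in a bare page; the <link> keeps the CSS class defined
    Set tsOut = fso.CreateTextFile("C:\Temp\table.htm", True)
    tsOut.Write "<html><head><link rel=""stylesheet"" href=""http://www.XYZ.com/style.css""></head><body>" & _
                vbCrLf & sTable & "</body></html>"
    tsOut.Close
End Sub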

I'm still going to make time to look into XMLHTTP. Once I realized I could use URLDownloadToFile without incurring the overhead of IE, I was amazed; data came back from various web sites dramatically faster.

Shred Dude
04-07-2009, 10:31 PM
Stan:

I didn't have any luck with the Shell object. After adding a reference to Microsoft Shell Controls and Automation (shell32.dll), I still found no Navigate method or Document object in the VBE Object Browser.

I tried the code you provided and made some changes, but I couldn't get at an .htm file's contents that way. Did I miss something?

Thanks,

Shred

stanl
04-08-2009, 03:19 AM
I didn't have any luck with Shell Object.
Shred

You have to create a 'container' for it, or use the MSHTML object. I just finished a quickie project where I was asked to go through 240,000 customers, extract the unique ZIP codes, and determine whether DSL was available in each region. I used an HTTP GET and, like you, extracted just the table data I needed, but preserved much of the appearance as well as the actual table links. The data was saved in an Access memo field so users could select a ZIP from a drop-down and display results locally without going to the actual web site (though the links were there if they needed them). I'm not sure URLDownloadToFile would have worked repeatedly in that case.
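
A minimal sketch of the MSHTML route, reusing the c:\temp\vbaexpress.htm file from the earlier example (the element id is a hypothetical placeholder):

Public Sub ParseSavedPage()
    Dim fso As Object, oDoc As Object
    Dim html As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    html = fso.OpenTextFile("c:\temp\vbaexpress.htm", 1).ReadAll    '1 = ForReading

    'CreateObject("htmlfile") returns an MSHTML document with no browser window
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = html

    Debug.Print oDoc.getElementById("someDiv").innerText
End Sub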