[SOLVED] Loop through URLs, save source



daojidpjidas
06-01-2016, 09:52 AM
I have about 1000 URLs in Excel (A1:A1000). I'd like to open each URL and save the page source. A few issues:

With so many URLs, I assume it would be faster to open a single instance of IE and open each URL in a new tab, closing that tab in the loop before opening the next.
The pages all have the same name (index.html), so they would have to be saved with the loop index as 1.html, 2.html, ..., 1000.html. I imagine that can be done using c.Row in the loop.
To be less taxing on the server and also avoid detection, I'd like to add some sort of random wait between requests.
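For the random wait, one option in Excel VBA is `Application.Wait` with a randomly chosen number of seconds. A minimal sketch; the helper name `RandomPause` and the example range are illustrative assumptions, not something from this thread:

```vba
' Hypothetical helper: pause for a random whole number of seconds
' between minSec and maxSec (inclusive). Excel-only, because it uses Application.Wait.
' Call Randomize once at the start of the macro so Rnd is not the same every run.
Function RandomPause(ByVal minSec As Long, ByVal maxSec As Long) As Long
    Dim secs As Long
    secs = Int((maxSec - minSec + 1) * Rnd) + minSec   ' uniform integer in [minSec, maxSec]
    Application.Wait Now + TimeSerial(0, 0, secs)      ' blocks until that time is reached
    RandomPause = secs                                 ' return the delay actually used
End Function
```

Inside the loop, `RandomPause 2, 6` would wait somewhere between 2 and 6 seconds. `Application.Wait` only has one-second resolution, which is fine for spacing out page requests.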

I'm familiar with wget, but this website only works in IE.


Here is what I have so far:


Sub GetHTMLBulk()
Dim IE As Object
Dim c As Range

'Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")

'Navigate to arbitrary page
IE.Navigate "somearbitrarypage"

For Each c In Worksheets("Sheet1").Range("A1:A1000").Cells
'Open c in new tab
'Save source as n.html, where n = c.Row
'Close tab for c
Next c

'Close IE when done
IE.Quit
Set IE = Nothing
End Sub


I understand there's not much going on here, as both the new tab operation and save operation are beyond me. Thanks!

I've found this:


Option Explicit

'Early binding: requires Tools > References > "Microsoft Internet Controls"
Sub OpenURLOnNewTab()

Dim lngC As Long
Dim strUrl As String
Dim ieObj As InternetExplorer

Set ieObj = New InternetExplorer
ieObj.Visible = True
strUrl = "https://usefulgyaan.wordpress.com/" 'One URL here, but you could loop over an array of URLs

For lngC = 1 To 100
ieObj.Navigate2 strUrl, 2048 '2048 = navOpenInNewTab: open the URL in a new tab
Next lngC

End Sub
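One caveat with the flag 2048 (`navOpenInNewTab`): as far as I know, the `ieObj` variable stays attached to the original tab, so `ieObj.Document` will not give you the new tab's source. A simpler pattern, assuming one window is enough, is to navigate the same window for each URL and wait until it has finished loading. A minimal sketch; `WaitForPage` is a hypothetical helper name and the 30-second default timeout is an arbitrary choice:

```vba
' Hypothetical helper: wait until IE reports the page as fully loaded,
' or give up after timeoutSec seconds. Returns True if the page loaded in time.
Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSec As Double = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim started As Double
    started = Timer                                  ' seconds since midnight
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents                                     ' let IE and Excel process messages
        If Timer - started > timeoutSec Then Exit Function   ' timed out: returns False
    Loop
    WaitForPage = True
End Function
```

Usage would be `ie.Navigate2 url` followed by `If Not WaitForPage(ie) Then Debug.Print "Timed out: " & url`. Reusing one window avoids having to find and close each new tab.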

It needs some modification to work with my array, but the main problem now is saving the page as HTML. Most of the threads I've found here save only specific parts of the page, but I'll keep looking. Thanks!
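For the save step, one approach is to read the loaded page's HTML through `ie.Document.documentElement.outerHTML` and write it to a file named after the row number. A minimal sketch, assuming the URLs are in Sheet1!A1:A1000 as described above and that the output folder (`C:\Temp\pages\`, a placeholder) already exists; the macro name `SavePagesAsHtml` and the 2 to 6 second pause are assumptions:

```vba
Option Explicit

' Sketch: visit each URL in one IE window, save its HTML as <row>.html,
' then pause a random 2-6 seconds. Folder path and timings are placeholders.
Sub SavePagesAsHtml()
    Const OUT_DIR As String = "C:\Temp\pages\"    ' placeholder; must already exist
    Const READYSTATE_COMPLETE As Long = 4
    Dim ie As Object, c As Range
    Dim html As String, fileNum As Integer

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    Randomize                                      ' seed Rnd once per run

    For Each c In Worksheets("Sheet1").Range("A1:A1000").Cells
        If Len(c.Value) > 0 Then
            ie.Navigate2 CStr(c.Value)
            ' Wait until IE has finished loading the page
            Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
                DoEvents
            Loop
            html = ie.Document.documentElement.outerHTML

            ' Write the HTML to e.g. C:\Temp\pages\17.html for row 17
            fileNum = FreeFile
            Open OUT_DIR & c.Row & ".html" For Output As #fileNum
            Print #fileNum, html;                  ' trailing ; avoids an extra newline
            Close #fileNum

            ' Random 2-6 second pause between requests
            Application.Wait Now + TimeSerial(0, 0, Int(5 * Rnd) + 2)
        End If
    Next c

    ie.Quit
    Set ie = Nothing
End Sub
```

Two limitations worth knowing: `outerHTML` is the DOM as IE currently holds it (after any scripts have run, and without the doctype), so it may differ from the raw bytes the server sent; and `Open ... For Output` writes ANSI text, so characters outside the system code page may be replaced.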

Kenneth Hobs
06-01-2016, 12:40 PM
Welcome to the forum!

There are several ways to do it. See my post #9 for one method. http://www.vbaexpress.com/forum/showthread.php?43237

daojidpjidas
06-16-2016, 08:18 AM
Thanks for the welcome!

Worked out great, thanks.