PDA

View Full Version : [SOLVED:] Macro Optimization



Ben.N
06-18-2011, 12:00 PM
Hello,

I wrote the following macro which opens a webpage and compares a given URL with a URL in the page. There are about 1250 webpages to check, so this script takes really long time to run... (about ~10 secs per link = 3.5 hours)

Is there any way to optimize this script so it runs more quickly?

Thanks!


Option Explicit

Public Enum IE_READYSTATE
Uninitialised = 0
Loading = 1
Loaded = 2
Interactive = 3
complete = 4
End Enum

Sub TestOutboundURLs()
Dim ie As Object
Dim doc As HTMLDocument
Dim links As IHTMLElementCollection
Dim link As HTMLAnchorElement
Dim URLExists As Boolean
Dim BaseURL As String
Dim CurrentURL As String
Dim SearchURL As String
Dim I As Integer
Dim J As Integer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
BaseURL = "Some URL"
For J = 2 To 2
CurrentURL = BaseURL & Range("A" & J).Value
SearchURL = Range("B" & J).Value
ie.Navigate CurrentURL
Do Until ie.ReadyState = IE_READYSTATE.complete: DoEvents:
Loop
Set doc = ie.Document
Set links = doc.getElementsByTagName("A")
URLExists = False
I = 0
While I + 1 < links.Length And Not URLExists
Set link = links(I)
If InStr(link.href, SearchURL) Then
URLExists = True
Range("C" & J).Value = "Found"
End If
I = I + 1
Wend
If Not URLExists Then Range("C" & J).Value = "Not Found"
Next J
ie.Quit
Set links = Nothing
End Sub

Shred Dude
06-18-2011, 01:45 PM
Couple of basics... minimize the amount of times you read and write from the sheet. For 1250 rows, you're reading 2500 times in your current code, and writing 1250 times.

Always turn off screen updating to increase speed.

And assuming the site you're working with can be read with this method, use the XMLHTTP object instead of automating Internet Explorer. It will be orders of magnitude faster.

Here's how I'd approach it...


Public Sub TestOutboundURLs_XMLHTTP()
'---------------------------------------------------------------------------------------
' Procedure : TestOutboundURLs_XMLHTTP
' Author : Shred Dude
' Date : 6/18/2011
' Purpose :
'---------------------------------------------------------------------------------------
Dim ie As Object
Dim doc As HTMLdocument
Dim links As IHTMLElementCollection
Dim link As HTMLAnchorElement
Dim URLExists As Boolean
Dim BaseURL As String
Dim CurrentURL As String
Dim SearchURL As String
Dim i As Integer
Dim J As Integer
Dim rngURLS As Range
Dim httpreq As Object
Dim arrResults As Variant
'Instantiate an XMLHTTP object
Set httpreq = CreateObject("Microsoft.XMLHTTP")
'Instantiate an HTML Document to use as a shell for each URL's Source Code
'(Might not be necessary, see notes below)
'** Reference to Microsoft HTML Object Library required
Set doc = New HTMLdocument
'Instead of reading from the sheet 1250 x 2 times
'Read all the values into a range object up front
Set rngURLS = ActiveSheet.Range("a2:b1252")
ReDim arrResults(1 To rngURLS.Rows.Count)
'Turn off Screen Updating
Application.ScreenUpdating = False
BaseURL = "Some URL"
For i = 1 To rngURLS.Rows.Count
CurrentURL = BaseURL & rngURLS(i, 1).Value
SearchURL = rngURLS(i, 2).Value
URLExists = False
Application.StatusBar = "Processing: " & CurrentURL
'Instead of using IE, use the XMLHTTP object to "GET" the HTML Source for each URL
With httpreq
.Open "GET", CurrentURL, False
.Send
doc.body.innerHTML = .responseText
End With
'Without seeing the Source Code, not sure if just searching the whole page would be appropriate
'if so, would probably be much faster over 1250 urls
'If InStr(doc.body.innerHTML, SearchURL) Then URLExists = True
'for that matter, you could probably skip the whole HTMLDocument and just do a search on the string returned by the "GET"
'you'll have to determine if that's appropriate given what you're searching for and whether it could
'be on the page in circumstance you're not interested in confirming
''eg: put this up in the With block above...
'If InStr(.responseText, SearchURL) Then URLExists = True
'If the above suggestions aren't appropriate, then proceed with looping method below
'to isolate confirmation os SearchURL within an anchor element
'(probably slower than methods suggested above)
Set links = doc.getElementsByTagName("A")
For Each link In links
If InStr(link.href, SearchURL) Then
URLExists = True
Exit For
End If
Next link
' 'capture result to sheet (slower, even with screen updating off)
' If URLExists Then
' rngURLS(I, 2).Offset(0, 1).Value = "Found"
' Else
' rngURLS(I, 2).Offset(0, 1).Value = "Not Found"
' End If
'or...capture result to an array and then write the array to the sheet at the end all at once (fastest)
If URLExists Then arrResults(i) = "Found" Else arrResults(i) = "Not Found"
Next i
'write Results to sheet if doing this method
rngURLS(1, 1).Offset(0, 2).Resize(UBound(arrResults, 1), 1).Value = Application.WorksheetFunction.Transpose(arrResults)
'clean up
Set doc = Nothing
Set rngURLS = Nothing
Set links = Nothing
Set httpreq = Nothing
Application.StatusBar = False
Application.ScreenUpdating = True
End Sub

nir.ben
06-24-2011, 09:59 AM
Hi Shred Dude.

Sorry for my late reply. Just wanted to let you know that your version works fantastically! It was very helpful for me not only because it improved the performance tremendously, but also because thanks to it I learned lots of new stuff.

So thanks again,
Ben from Israel