PDA

View Full Version : Check If URL Returns 404



D_Marcel
10-16-2017, 02:44 PM
Hi Gurus,
Trying to find some algorythm that tests if a URL really exists, I've found this precious piece of code provided by Leith Ross:


'Written: March 15, 2011' Updated: April 29, 2012
' Author: Leith Ross
' Summary: Returns the status for a URL along with the Page Source HTML text.


Public PageSource As String
Public httpRequest As Object


Function GetURLStatus(ByVal URL As String, Optional AllowRedirects As Boolean)


Const WinHttpRequestOption_EnableRedirects = 6


If httpRequest Is Nothing Then
On Error Resume Next
Set httpRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
If httpRequest Is Nothing Then
Set httpRequest = CreateObject("WinHttp.WinHttpRequest.5")
End If
Err.Clear
On Error GoTo 0
End If


' Control if the URL being queried is allowed to redirect.
httpRequest.Option(WinHttpRequestOption_EnableRedirects) = AllowRedirects


' Clear any pervious web page source information
PageSource = ""

' Add protocol if missing
If InStr(1, URL, "://") = 0 Then
URL = "http://" & URL
End If


' Launch the HTTP httpRequest synchronously
On Error Resume Next
httpRequest.Open "HEAD", URL, False
If Err.Number <> 0 Then
' Handle connection errors
GetURLStatus = Err.Description
Err.Clear
Exit Function
End If
On Error GoTo 0

' Send the http httpRequest for server status
On Error Resume Next
httpRequest.Send
httpRequest.WaitForResponse
If Err.Number <> 0 Then
' Handle server errors
PageSource = "Error"
GetURLStatus = Err.Description
Err.Clear
Else
' Show HTTP response info
GetURLStatus = httpRequest.Status & " - " & httpRequest.StatusText
' Save the web page text
PageSource = httpRequest.ResponseText
End If
On Error GoTo 0

End Function


Sub ValidateURLs()


Dim Cell As Range
Dim Rng As Range
Dim RngEnd As Range
Dim Status As String
Dim Wks As Worksheet

Set Wks = ActiveSheet
Set Rng = Wks.Range("A1")

Set RngEnd = Wks.Cells(Rows.Count, Rng.Column).End(xlUp)
If RngEnd.Row < Rng.Row Then Exit Sub Else Set Rng = Wks.Range(Rng, RngEnd)

For Each Cell In Rng
Status = GetURLStatus(Cell)
Cell.Offset(0, 1) = Status
Next Cell

End Sub

However, if the URL gets redirected to another site in case of no exist, this function returns 302 - Found instead of 404 or some error code/message.

Of this two URL's:

https://www.dotz.com.br/Produto?pid=7953596
https://www.dotz.com.br/Produto?pid=7953

The first one exists and the second, which is as test copied from the first, doesn't. Clicking on them from Excel will redirect to a main site. Probably because of this the function is returning 302. Copying and pasting them directly in the browser, will return 404 to the second.

Is there a way to get the 404 to the second, even with this redirection?

Regards,

Douglas Marcel

SamT
10-21-2017, 01:08 PM
bump