PDA

View Full Version : Solved: Web Query...creating data crawler



DanOfEarth
07-02-2010, 05:34 AM
Hi guys,

I've been googling this for three weeks....can't seem to get it right.

I'm trying to create a web dataminer/crawler. I'm checking a webpage for addresses, one-by-one, off of an address list. The address is broken into two columns, the street # and street name (ie. "2345" and "WindStreet" = 2 cloumns)

I've got the query-part down, however I want it to cycle through an entire address list so I can manipulate the data. I can't seem to get the code right. The working version of the query using only one address is below. However a weak attempt at it using the .Cells(Rows.Count, etc) and a "For Each Cell/Next" statement is commented out below to check the whole list, but it doesn't work. I know the solution is simple.

Please....need help.



Sub Query3()
'Dim Address As Range
'Dim Street As String
Application.ScreenUpdating = False
Address = Sheets("Leads").Range("C2").Value
Street = Sheets("Leads").Range("D2").Value
Const MyUrl As String = "http://search.har.com/engine/doSearch.cfm?QUICKSEARCH="
'With Sheets("Leads")
'Set Address = .Range("C2", .Cells(Rows.Count, "C").End(xlUp))
'Set Street = .Range("D2", .Cells(Rows.Count, "D").End(xlUp))

'End With
'For Each Cell In Address
With Sheets("Import").QueryTables.Add(Connection:= _
"URL;" & MyUrl & Address & " " & Street _
, Destination:=Sheets("Import").Range("Import!$A$1"))
.Name = "doSearch.cfm?QUICKSEARCH=802%20Hallmark%20Oak"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
'Next Cell
Application.ScreenUpdating = True

End Sub

DanOfEarth
07-02-2010, 05:33 PM
Got the solution!

(you guys are asleep).

FYI - Really cool web crawler/data miner. Simple as hell. It checks a list of addresses against this particular website to see if they exist. Only works on a static page search, since it doesn't parse any data, although it could easily.

I just left a sheet called "Import" in the book and let the query rewrite over it over and over again. I then manipulated the data before it rewrote over it. I had to put a timed delay so that it didn't crash the program.


Sub Crawler()
Const MyUrl As String = "http://search.har.com/engine/doSearch.cfm?QUICKSEARCH="
' Set Do loop to stop when an empty cell is reached.
Do Until IsEmpty(ActiveCell)
With Sheets("Leads")
Set Address = ActiveCell
Set Street = ActiveCell.Offset(0, 1)
End With
With Sheets("Import").QueryTables.Add(Connection:= _
"URL;" & MyUrl & Address & " " & Street _
, Destination:=Sheets("Import").Range("Import!$A$1"))
.Name = "doSearch.cfm?QUICKSEARCH=802%20Hallmark%20Oak"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
If Range("Results") = "Search Result: (0) Records Found. " Then
ActiveCell.Offset(0, 5).Formula = "Z - N/A"
ActiveCell.Offset(0, 6).Formula = "Z - N/A"
ElseIf Range("Results") = "Search Result: (1) Records Found. " Then
ActiveCell.Offset(0, 5).Formula = Range("AgentName")
ActiveCell.Offset(0, 6).Formula = Range("AgentFirm")
Else
ActiveCell.Offset(0, 5).Formula = "Z - Townhouse"
ActiveCell.Offset(0, 6).Formula = "Z - Townhouse"
End If

Application.Wait Now + TimeValue("00:00:01")
ActiveCell.Offset(1, 0).Select
Loop
End Sub