PDA

View Full Version : Search a table on a website for a word & extract the 3rd cell to the Right of it.



cfernand74
07-05-2014, 01:23 PM
i am trying to eliminate some steps i have to do everyday, which is extract data from different city records websites.
i know how to extract tables to excel and manipulate the data once there, but i figured there must be a way to extract the specific data without extracting all the data. I don't mind if it is held in memory. I just don't want to extract all table data to a sheet.

Simply i want to search for the words "Finished Attic" that is located on a website table and extract the info in a cell that is three cells to the right of it which is "293".

There are many tables on this website but the one we will be working with is table 8



Code
Description
Gross
Area
Living
Area


BAS
First Floor
975
975


FUS
Finished Upper Story
975
975


FAT
Finished Attic
975
293


BAY
Bay Window
54
0


STP
Stoop
65
0


UBM
Basement
1002
0




4046
2243




I know how to extract the specific cell info "293" and i would do this, but the issue is that "Finished Attic" position moves up or down depending on the property criteria. In this case it is in the 5th row. The word its self "Finished Attic" is always the same(constant) and obviously the numbers i need to extract changes(two cells to the right of "Finished Attic"). And this goes for lots of other terms i need to extract, but i just need a sample code for one and i can take it from there.

The difference between were the info lies is ("td")(1) for "Finished Attic and ("td")(3) for "293"
i can get to "Finished Attic" with this code
Range("A26").Value = IE.document.forms("form1").getElementsByTagName("table")(8).getElementsByTagName("tr")(3).getElementsByTagName("td")(1).innerText

i can get to "293" with this code
Range("A22").Value = IE.document.forms("form1").getElementsByTagName("table")(8).getElementsByTagName("tr")(3).getElementsByTagName("td")(3).innerText

Know what i would like to do is search the table on the website for the key word "Finished Attic" wherever it is located on the table and move two cells to the right and extract that data. Which in this case would be "293".


This is the code i have setup to get me to the website and try grab info. Please edit as necessary.

Sub TaxAssessors()
Dim oElement As Object




'to refer to the running copy of Internet Explorer
Dim IE As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set IE = New InternetExplorer
IE.Visible = False
IE.navigate "http://gis.vgsi.com/providenceri/Parcel.aspx?Pid=13212"
'Wait until IE is done loading page
Do While IE.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Tax Assessors ..."
DoEvents
Loop




'--------------------------------Code Starts here
'------Test to extract word "Finished Attic" just to see if in right location
Range("A1").Value = IE.document.forms("form1").getElementsByTagName("table")(8).getElementsByTagName("tr")(3).getElementsByTagName("td")(1).innerText


'------Extract Finished Attic sq/ft (data i am after)
Range("A2").Value = IE.document.forms("form1").getElementsByTagName("table")(8).getElementsByTagName("tr")(3).getElementsByTagName("td")(3).innerText
'--------------------------------Code ends here
IE.Quit
End Sub



What i was thinking is to search all the tables in the website page for the word "Finished Attic" once found then select that table and row. In prior code I think of ("tr")(3) as the row and ("td")(3) as the column. Obviously these position change, so maybe ("tr")(x) is identified then add ("td")(3) to it to get value i am after.

What a puzzle?

westconn1
07-05-2014, 04:24 PM
What i was thinking is to search all the tables in the website page
if finished attic is always in the same table, you can eliminate searching other tables if not then expand the code to search all tables

try like

for each rw in IE.document.forms("form1").getElementsByTagName("table")(8).getElementsByTagName("tr")
if instr(rw.innertext, "Finished Attic") > 0 then msgbox rw,getelementsbytagname("td")(3).innertext: exit for
next

cfernand74
07-05-2014, 04:46 PM
Gives me an error. i placed in between code start here and ends here. Is it missing something, i receive:
Compile error:
Sub or Function not defined

westconn1
07-05-2014, 04:58 PM
looks like i have a , where i should have .
rw.getelementsbytagname("td")(3).innertext: exit For

cfernand74
07-05-2014, 11:41 PM
thanks that works great, you said to expand search if want to check tables. I would need to search all tables first then designate that table if the word matches and then extract info.
i have to do this for many different websites. if you can provide code to check tables would be a time saver. If not i have to identify the table for every website which would be time consuming since there are so many city tax assessors website.
So my procedure would be to search all tables and find what table holds key word being searched then select that table and cell that hold specific key word then select third cell to the right and extract that info.
thanks again

westconn1
07-06-2014, 12:02 AM
try like

for each t in IE.document.getElementsByTagName("table")
for each rw in t.getelementsbytagname("tr")
If instr(rw.innertext, "Finished Attic") > 0 Then msgbox rw,getelementsbytagname("td")(3).innertext: fnd = true:exit for
next
if fnd then exit for
next i figured not all websites have the same forms so eliminated possible problems from specifying a form, though the code may take longer (minimal) to run

snb
07-06-2014, 03:19 AM
Sub M_snb()
With CreateObject("MSXML2.XMLHTTP")
.Open "get", "http://gis.vgsi.com/providenceri/Parcel.aspx?Pid=13212"
.send
sn = Split(.responsetext, "Finished Attic")
End With

For j = 1 To UBound(sn)
MsgBox Trim(Split(Split(sn(j), "</td><td>")(2), vbCrLf)(1))
Next
End Sub

cfernand74
11-03-2014, 03:56 PM
1)I was able to implement both westconn1 and snb codes and they both work. In this response i a commenting on westconn1 since i can understand it better. I can't run the same code twice in a row without the second code obtaining no value. The value is there on the website, but it comes back blank. I have tried to reset t and rw values to Nothing, assuming this was the problem, but did not work. I need it to run code again against the whole webpage and find the next value "First Floor".

The plan is to have this code run several times and gather all data needed and assign each to a cell on my excel sheet, but after assigning the first code and seeing # in my excel sheet the next one is blank.

2)In some cases like the one we are working on now i will need to add "First Floor" and "Finished Attic" Sq/ft. So far i have been able to assign the word FinishedAttic to the value found.

The next step after being able to assign these value to the same respective word is to add them and transfer that value to an excel sheet cell for example : currentWb.Sheets("sheet1").Range("d12")

Here is the code i am working with:

'--------------------------------Search another table , select it and find value "First Floor"
For Each t In IE.Document.getElementsByTagName("table")
For Each rw In t.getElementsByTagName("tr")
If InStr(rw.innerText, "First Floor") > 0 Then currentWb.Sheets("sheet1").Range("d10") = rw.getElementsByTagName("td")(3).innerText: fnd = True: Exit For
' If InStr(rw.innerText, "First Floor") > 0 Then FirstFloor = rw.getElementsByTagName("td")(3).innerText: fnd = True: Exit For


Next
If fnd Then Exit For
Next
'MsgBox FirstFloor
Set t = Nothing
Set rw = Nothing
'--------------------------------Search another table , select it and find value "Finished Upper Story"
For Each t In IE.Document.getElementsByTagName("table")
For Each rw In t.getElementsByTagName("tr")
If InStr(rw.innerText, "Finished Upper Story") > 0 Then currentWb.Sheets("sheet1").Range("d11") = rw.getElementsByTagName("td")(3).innerText: fnd = True: Exit For
'If InStr(rw.innerText, "Finished Upper Story") > 0 Then FinishedUpperStory = rw.getElementsByTagName("td")(3).innerText: fnd = True: Exit For

Next
If fnd Then Exit For
Next
'MsgBox FinishedUpperStory
Set t = Nothing
Set rw = Nothing

westconn1
11-05-2014, 02:34 AM
unless you have assigned a workbook object to currentwb elsewhere, it is not valid
active workbook and thisworkbook are both valid workbook objects, else assign a workbook to currentwb
you would need to assign false to fnd before running through another table, else it will exit immediately

it should be possible to loop through the tables once and extract all required values in a single pass