Pedrolito
03-29-2017, 06:19 AM
Hi guys,
I am new and need some help on how to parse the wikipedia infobox table (this small grey table that you see each time in the right of the wiki result page).
Exemple with the Wiki Walmart page:
I need to get the "Headquarters" and "Area served" values of the table without telling a specific item position but just with the header title (Headquarters...). Indeed, depending of the company page the position of the item in the table change each time (dynamic table).
HTML of the TH, TD, TR researched: <tr> <th scope="row" style="padding-right:0.5em;">Headquarters</th> <td class="label" style="line-height:1.35em;"><a href="/wiki/Bentonville,_Arkansas" title="Bentonville, Arkansas">Bentonville, Arkansas</a>, U.S.</td>
</tr> <tr> <tr> <th scope="row" style="padding-right:0.5em;"> <div style="padding:0.1em 0;line-height:1.2em;">Area served</div> </th> <td style="line-height:1.35em;">Worldwide</td> </tr>
Do you have any idea of how I could do that ? I tried a For Each TD in TH If TH.innertext = "Headquarters" Then Cells(x,y).value = TD.innertext ...but it didn't worked
Thanks a lot !!!! :-)
Sub testOptimize()
Dim IE As Object
Dim form As Variant
Dim button As Variant
Dim LR As Integer
Dim var As String
On Error Resume Next
'LR = Cells(Rows.Count, 1).End(xlUp).Row
For x = 2 To 6
If Range("E" & x).Value <> vbNullString Then
var = ActiveSheet.Cells(x, 7).Value
Set IE = CreateObject("internetexplorer.application")
IE.Visible = False
With IE
.Visible = False
.navigate "https://en.wikipedia.org/wiki/" & var
While Not .readyState = READYSTATE_COMPLETE
Wend
End With
'Wait some to time for loading the page
While IE.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
'wait for page to load
While IE.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i
DoEvents
IE.Quit
Set IE = Nothing
End If
Next x
End Sub
I am close to having the solution I think... :-) :
Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
MsgBox (i)
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i
I am new and need some help on how to parse the wikipedia infobox table (this small grey table that you see each time in the right of the wiki result page).
Exemple with the Wiki Walmart page:
I need to get the "Headquarters" and "Area served" values of the table without telling a specific item position but just with the header title (Headquarters...). Indeed, depending of the company page the position of the item in the table change each time (dynamic table).
HTML of the TH, TD, TR researched: <tr> <th scope="row" style="padding-right:0.5em;">Headquarters</th> <td class="label" style="line-height:1.35em;"><a href="/wiki/Bentonville,_Arkansas" title="Bentonville, Arkansas">Bentonville, Arkansas</a>, U.S.</td>
</tr> <tr> <tr> <th scope="row" style="padding-right:0.5em;"> <div style="padding:0.1em 0;line-height:1.2em;">Area served</div> </th> <td style="line-height:1.35em;">Worldwide</td> </tr>
Do you have any idea of how I could do that ? I tried a For Each TD in TH If TH.innertext = "Headquarters" Then Cells(x,y).value = TD.innertext ...but it didn't worked
Thanks a lot !!!! :-)
Sub testOptimize()
Dim IE As Object
Dim form As Variant
Dim button As Variant
Dim LR As Integer
Dim var As String
On Error Resume Next
'LR = Cells(Rows.Count, 1).End(xlUp).Row
For x = 2 To 6
If Range("E" & x).Value <> vbNullString Then
var = ActiveSheet.Cells(x, 7).Value
Set IE = CreateObject("internetexplorer.application")
IE.Visible = False
With IE
.Visible = False
.navigate "https://en.wikipedia.org/wiki/" & var
While Not .readyState = READYSTATE_COMPLETE
Wend
End With
'Wait some to time for loading the page
While IE.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
'wait for page to load
While IE.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i
DoEvents
IE.Quit
Set IE = Nothing
End If
Next x
End Sub
I am close to having the solution I think... :-) :
Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
MsgBox (i)
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i