PDA

View Full Version : Need Help - VBA HTML Wikipedia data



Pedrolito
03-29-2017, 06:19 AM
Hi guys,
I am new and need some help on how to parse the wikipedia infobox table (this small grey table that you see each time in the right of the wiki result page).
Exemple with the Wiki Walmart page:
I need to get the "Headquarters" and "Area served" values of the table without telling a specific item position but just with the header title (Headquarters...). Indeed, depending of the company page the position of the item in the table change each time (dynamic table).



HTML of the TH, TD, TR researched: <tr> <th scope="row" style="padding-right:0.5em;">Headquarters</th> <td class="label" style="line-height:1.35em;"><a href="/wiki/Bentonville,_Arkansas" title="Bentonville, Arkansas">Bentonville, Arkansas</a>, U.S.</td>
</tr> <tr> <tr> <th scope="row" style="padding-right:0.5em;"> <div style="padding:0.1em 0;line-height:1.2em;">Area served</div> </th> <td style="line-height:1.35em;">Worldwide</td> </tr>


Do you have any idea of how I could do that ? I tried a For Each TD in TH If TH.innertext = "Headquarters" Then Cells(x,y).value = TD.innertext ...but it didn't worked
Thanks a lot !!!! :-)


Sub testOptimize()
Dim IE As Object
Dim form As Variant
Dim button As Variant
Dim LR As Integer
Dim var As String

On Error Resume Next


'LR = Cells(Rows.Count, 1).End(xlUp).Row

For x = 2 To 6
If Range("E" & x).Value <> vbNullString Then


var = ActiveSheet.Cells(x, 7).Value


Set IE = CreateObject("internetexplorer.application")

IE.Visible = False

With IE

.Visible = False

.navigate "https://en.wikipedia.org/wiki/" & var

While Not .readyState = READYSTATE_COMPLETE

Wend

End With

'Wait some to time for loading the page

While IE.Busy

DoEvents

Wend

Application.Wait (Now + TimeValue("0:00:02"))

'wait for page to load

While IE.Busy

DoEvents

Wend

Application.Wait (Now + TimeValue("0:00:02"))


Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i


DoEvents
IE.Quit
Set IE = Nothing

End If
Next x


End Sub



I am close to having the solution I think... :-) :


Set elementONE = IE.Document.getElementsByTagName("TH")
For i = 0 To 20
elementTWO = elementONE.Item(i).innerText
If elementTWO = "Headquarters" Then
MsgBox (i)
Range("H1").Offset(x, 8) = IE.Document.getElementsByTagName("TD").Item(i).innerText
Exit For
End If
Next i