PDA

View Full Version : VBA MACRO get element as ID not working-Data Scrapping



pedrosc
04-30-2020, 03:59 AM
Dear all,

I am trying to create a sub that when I have a list of links, access these links and removes a field in the html code and puts in in excel. For some reason I can't understand this is giving an error message on the line I put for get elementID.

The error is "Object variable or with block variable not set".

PS: the way I structured this macro you have to select the cells with links otherwise it won't work. They should be put in column B


Sub webiteration()



Dim nr As Integer
Dim i As Integer
Dim c As Integer
Dim IE As Object
Dim doc As HTMLDocument
Dim myurl As String
Dim sponsor As String




nr = Selection.Rows.Count
c = 1
Set IE = CreateObject("InternetExplorer.Application")


For i = 2 To nr + 1
If ActiveCell <> 0 Then
myurl = ThisWorkbook.Sheets("Sheet1").Range("B" & i).Value

IE.Navigate myurl
IE.Visible = True

sponsor = doc.getElementById("sponsor").innerText
ThisWorkbook.Sheets("Sheet1").Range("C" & i).Value = sponsor

End If
ActiveCell.Offset(1, 0).Select

Next i
End Sub





Examples of links I want to use are here


https://ClinicalTrials.gov/show/NCT04134676

https://ClinicalTrials.gov/show/NCT04235296

Bob Phillips
04-30-2020, 04:31 AM
I am no expert on web scraping, but if you change


sponsor = doc.getElementById("sponsor").innerText

to


sponsor = IE.document.getElementById("sponsor").innerText

you should get more joy

Bob Phillips
04-30-2020, 04:40 AM
Here is better code, doesn't need the urk=ls in column B to be selected, and selects nothing itself


Sub webiteration()
Dim IE As Object
Dim doc As HTMLDocument
Dim myurl As String
Dim sponsor As String
Dim nr As Long
Dim i As Long

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

With ThisWorkbook.Sheets("Sheet2")

nr = .Cells(.Rows.Count, "B").End(xlUp).Row

For i = 2 To nr

If .Cells(i, "B").Value <> vbNullString Then

IE.Navigate .Cells(i, "B").Value
Set doc = IE.document
sponsor = doc.getElementById("sponsor").innerText
.Cells(i, "C").Value = sponsor
End If
Next i
End With

IE.Quit
Set IE = Nothing
End Sub

pedrosc
04-30-2020, 04:52 AM
I didn't. Where do you get this error?

Bob Phillips
04-30-2020, 05:10 AM
Sorry, it looks like I edited your post rather than adding anew.

I have added a loop to check the page is loaded., Please try this and let me know if and where it fails.


Sub webiteration()
Dim IE As Object
Dim doc As HTMLDocument
Dim ready As Boolean
Dim myurl As String
Dim sponsor As String
Dim nr As Long
Dim i As Long

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

With ThisWorkbook.Sheets("Sheet2")

nr = .Cells(.Rows.Count, "B").End(xlUp).Row

For i = 2 To nr

If .Cells(i, "B").Value <> vbNullString Then

IE.Navigate .Cells(i, "B").Value
ready = False
Do

If IE.readyState <> 4 Then

DoEvents
Else

ready = True
End If
Loop Until ready
Set doc = IE.document
sponsor = doc.getElementById("sponsor").innerText
.Cells(i, "C").Value = sponsor
End If
Next i
End With

IE.Quit
Set IE = Nothing
End Sub

pedrosc
04-30-2020, 06:19 AM
It worked now thank you! I guess it is the reference of the sheet2 while I was actually using sheet 1

I just wanted to really understand why my code did not work :(

Bob Phillips
04-30-2020, 06:54 AM
Sorry, that was my fault, I tested on a workbook with Sheet2.

Your code wasn't working because you referenced a variable doc that you didn't set anywhere, I added in IR.document as the main change, but I did a lot of code tidying as well, then I added the wait.