PDA

View Full Version : ie.navigate to follow links from A1 to A2000



nychay
12-14-2016, 05:26 AM
Hi,

I have low programming skills and cannot solve this one myself.

I have found VBA code online to extract email from website, but it only follows website written in the VBA code.
How is it possible to use ie.navigate to follow all links from A1 to A1000 and in column B to paste corresponding email.

Here is the code I found.
P.S. I got it from quite a popular resource so you might have came across it already. I do not claim the code is mine.


Sub scrapeHyperlinksWebsite()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim Link As Object
Dim ElementCol As Object
Dim erow As Long
Application.ScreenUpdating = False
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "WEBSITE"
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = “Loading website…”
DoEvents
Loop
Set html = ie.document
‘Range(“A2”) = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName(“a”)
‘Set ElementCol = html.getElementsByTagName(“mailto:”)
For Each Link In ElementCol
If InStr(Link, “mailto:”) Then
erow = Worksheets(“Sheet1”).Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1) = Right(Link, Len(Link) – InStr(Link, “:”))
Cells(erow, 1).Columns.AutoFit
End If
Next
Set ie = Nothing
Application.StatusBar = “”
Application.ScreenUpdating = True
End Sub


Thank you for attention.
If you manage to know the right answer it would be very appreciated if you could paste it here or give me some hints.

Regards,
Alex

p45cal
12-14-2016, 06:43 AM
Something like the following, but I haven't paid attention to how you derive what goes into the cells, nor have I tested it:
Sub scrapeHyperlinksWebsite()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim Link As Object
Dim ElementCol As Object
Dim erow As Long
Application.ScreenUpdating = False
Set ie = New InternetExplorer
ie.Visible = False
For Each cll In Range("A2:A2000").cells '<<adjust
ie.navigate cll.Hyperlinks(1).Address
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Loading website…"
DoEvents
Loop
Set html = ie.document
'Range("A2") = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName("a")
'Set ElementCol = html.getElementsByTagName("mailto:")
colm = 1
For Each Link In ElementCol
If InStr(Link, "mailto:") Then

'only one of these next 2 lines:
cll.Offset(, colm).Value = Link
cll.Offset(, colm).Value = Right(Link, Len(Link) - InStr(Link, ":"))

colm = colm + 1
End If
Next Link
Next cll
Set ie = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub

nychay
12-14-2016, 07:20 AM
Hi p45cal

Thanks for your reply.
It appears that your edited code works but only for like 5 links.

When I put in more i gives me error and highlights the following line.

ie.navigate cll.Hyperlinks(1).Address

Any ideas?

p45cal
12-14-2016, 07:48 AM
When it stops there, look for a window pane in the Visual Basic Editor (VBE) titled 'Immediate'. If it's not visible, bring it up with Ctrl+G on the keyboard. While the yellow highlight is still highlighting where the code stopped, type in the Immediate pane:
?cll.Hyperlinks(1).Address
(yes, with the preceding question mark) and press Enter
It should show you the hyperlink address it's trying to navigate to. Is it a valid address? (Can you copy/paste it into a browser and it goes OK?). If you see nothing at all, does the cell actually have a hyperlink in (it might be just a text string in the cell and not a hyperlink)?
You can show on the sheet which cell it's currently looking at with the command:
cll.select
and Enter. This will select that cell on the sheet.
(All the above while the yellow highlight is still showing, by the way.)

nychay
12-14-2016, 08:04 AM
Thanks p45cal.

That worked great.

Regards,
Alex

p45cal
12-14-2016, 08:30 AM
So what was wrong?

nychay
12-15-2016, 12:59 AM
It was that hyperlinks in excel were not hyperlinks in the usual meaning of it but rather a command string = hyperlink(Cell).

Once I typed in the hyperlink itself... no problems.

nychay
12-15-2016, 05:15 AM
Unfortunately it looks like this code cannot comprehend too many operations. First 100 links is ok after I open the project.
Then an error appears - Permission denied with the following string highlighted:

If InStr(Link, "mailto:") Then

Could it be that my computer is too slow for processing all 2000 links ?

p45cal
12-15-2016, 05:28 AM
Post the actual link it falls over at and post your current code.
You can do a little debugging yourself: find which cell it is it falls over at, let's say it's cell A333
then change the line in the macro temporarily from:
For Each cll In Range("A2:A2000").cells '<<adjust
to:
For Each cll In Range("A333:A2000").cells '<<adjust
and try again.
You can also step through code one line at a time with F8 on the keyboard

nychay
12-15-2016, 05:38 AM
I've been doing exactly that.
At the moment I am just going through the links with step of 50.

For Each cll In Range("A1:50").cells '<<adjust

and etc...

If it gives the error

If InStr(Link, "mailto:") Then

Then I just repeat it for same cell range and every time it breaks at a different link. Not the one it stopped at previously.

So I am not too sure if pasting a particular links will help.

From now I have observed that this error comes up completely random. Cannot trace similarities.

p45cal
12-15-2016, 06:30 AM
OK, well post your current code at least and then maybe one or two sample hyperlinks so that I can test what's happening and perhaps try another route, at the moment I'd be working in the dark. Unless of course, you're scraping the web when you're not supposed to be.

nychay
12-16-2016, 12:28 AM
Sub scrapeHyperlinksWebsite() Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim Link As Object
Dim ElementCol As Object
Dim erow As Long
Application.ScreenUpdating = False
Set ie = New InternetExplorer
ie.Visible = False
For Each cll In Range("B1:B50").Cells '<<adjust
ie.navigate cll.Hyperlinks(1).Address
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Loading website…"
DoEvents
Loop
Set html = ie.document
'Range("A2") = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName("a")
'Set ElementCol = html.getElementsByTagName("mailto:")
colm = 1
For Each Link In ElementCol
If InStr(Link, "mailto:") Then

'only one of these next 2 lines:
cll.Offset(, colm).Value = Link
cll.Offset(, colm).Value = Right(Link, Len(Link) - InStr(Link, ":"))

colm = colm + 1
End If
Next Link
Next cll
Set ie = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub



This would be the code.

I am sending URLs to your private chat just for security.

p45cal
12-16-2016, 12:45 PM
Not only Permission denied, but also skipping over links altogether.
I tried all sorts of things, including MSXML2.60, but I only got "We notice you are using an outdated version of Internet Explorer. This version is not supported by *******."
I tried including a .busy element to the code (still there) but that didn't work, so now I've been lazy and just added a 1 second delay which seems to allow the page to load. You may need to tweak that time as your web bandwidth may be better than mine. Too short a delay and it will not come up with a result - exactly the same as if there is no address to return, unfortunately.
You may also have run into resource problems, because each time you run the macro, you'd create a new instance of ineternet explorer, but the previous instance hadn't been shut down (and was still invisible), so you'd end up with many invisible instances of Internet Explorer. I've added a .Quit line to stop that happenng.

So the code will run a lot slower now, but hopefully you'll get your values.

Sub scrapeHyperlinksWebsite()
Dim IE As InternetExplorer
Dim HTML As HTMLDocument
Dim Link As Object
Dim ElementCol As Object
Dim erow As Long

Application.ScreenUpdating = False
Set IE = New InternetExplorer
IE.Visible = False
myCount = 0
For Each cll In Range("B1:B14").Cells '<<adjust
'cll.Select
myCount = myCount + 1
IE.navigate cll.Hyperlinks(1).Address
Application.StatusBar = "Loading website " & myCount & "…"
Do Until IE.Busy = False: DoEvents: Loop
Do Until IE.readyState = 4: DoEvents: Loop
Application.Wait DateAdd("s", 1, Now)
Set HTML = IE.document
Set ElementCol = HTML.getElementsByTagName("a")
colm = 1
For Each Link In ElementCol
If InStr(Link, "mailto:") Then
'only one of these next 2 lines:
'cll.Offset(, colm).Value = Link
cll.Offset(, colm).Value = Mid(Link, InStr(Link, ":") + 1) 'Right(Link, Len(Link) - InStr(Link, ":"))
colm = colm + 1
End If
Next Link
Next cll
IE.Quit
Set IE = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub