Consulting

Results 1 to 13 of 13

Thread: ie.navigate to follow links from A1 to A2000

  1. #1
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location

    ie.navigate to follow links from A1 to A2000

    Hi,

    I have low programming skills and cannot solve this one myself.

    I have found VBA code online to extract email from website, but it only follows website written in the VBA code.
    How is it possible to use ie.navigate to follow all links from A1 to A1000 and in column B to paste corresponding email.

    Here is the code I found.
    P.S. I got it from quite a popular resource so you might have came across it already. I do not claim the code is mine.

    Sub scrapeHyperlinksWebsite()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim Link As Object
    Dim ElementCol As Object
    Dim erow As Long
    Application.ScreenUpdating = False
    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "WEBSITE"
    Do While ie.readyState <> READYSTATE_COMPLETE
    Application.StatusBar = “Loading website…”
    DoEvents
    Loop
    Set html = ie.document
    ‘Range(“A2”) = html.DocumentElement.innerHTML
    Set ElementCol = html.getElementsByTagName(“a”)
    ‘Set ElementCol = html.getElementsByTagName(“mailto:”)
    For Each Link In ElementCol
    If InStr(Link, “mailto:”) Then
    erow = Worksheets(“Sheet1”).Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
    Cells(erow, 1).Value = Link
    Cells(erow, 1) = Right(Link, Len(Link) – InStr(Link, “:”))
    Cells(erow, 1).Columns.AutoFit
    End If
    Next
    Set ie = Nothing
    Application.StatusBar = “”
    Application.ScreenUpdating = True
    End Sub

    Thank you for attention.
    If you manage to know the right answer it would be very appreciated if you could paste it here or give me some hints.

    Regards,
    Alex
    Last edited by Aussiebear; 12-15-2016 at 09:45 AM. Reason: Added code tags

  2. #2
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    Something like the following, but I haven't paid attention to how you derive what goes into the cells, nor have I tested it:
    Sub scrapeHyperlinksWebsite()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim Link As Object
    Dim ElementCol As Object
    Dim erow As Long
    Application.ScreenUpdating = False
    Set ie = New InternetExplorer
    ie.Visible = False
    For Each cll In Range("A2:A2000").cells  '<<adjust
      ie.navigate cll.Hyperlinks(1).Address
      Do While ie.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Loading website…"
        DoEvents
      Loop
      Set html = ie.document
      'Range("A2") = html.DocumentElement.innerHTML
      Set ElementCol = html.getElementsByTagName("a")
      'Set ElementCol = html.getElementsByTagName("mailto:")
      colm = 1
      For Each Link In ElementCol
        If InStr(Link, "mailto:") Then
    
          'only one of these next 2 lines:
          cll.Offset(, colm).Value = Link
          cll.Offset(, colm).Value = Right(Link, Len(Link) - InStr(Link, ":"))
    
          colm = colm + 1
        End If
      Next Link
    Next cll
    Set ie = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
    End Sub
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  3. #3
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    Hi p45cal

    Thanks for your reply.
    It appears that your edited code works but only for like 5 links.

    When I put in more i gives me error and highlights the following line.

    ie.navigate cll.Hyperlinks(1).Address

    Any ideas?

  4. #4
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    When it stops there, look for a window pane in the Visual Basic Editor (VBE) titled 'Immediate'. If it's not visible, bring it up with Ctrl+G on the keyboard. While the yellow highlight is still highlighting where the code stopped, type in the Immediate pane:
    ?cll.Hyperlinks(1).Address
    (yes, with the preceding question mark) and press Enter
    It should show you the hyperlink address it's trying to navigate to. Is it a valid address? (Can you copy/paste it into a browser and it goes OK?). If you see nothing at all, does the cell actually have a hyperlink in (it might be just a text string in the cell and not a hyperlink)?
    You can show on the sheet which cell it's currently looking at with the command:
    cll.select
    and Enter. This will select that cell on the sheet.
    (All the above while the yellow highlight is still showing, by the way.)
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  5. #5
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    Thanks p45cal.

    That worked great.

    Regards,
    Alex

  6. #6
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    So what was wrong?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  7. #7
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    It was that hyperlinks in excel were not hyperlinks in the usual meaning of it but rather a command string = hyperlink(Cell).

    Once I typed in the hyperlink itself... no problems.

  8. #8
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    Unfortunately it looks like this code cannot comprehend too many operations. First 100 links is ok after I open the project.
    Then an error appears - Permission denied with the following string highlighted:

    If InStr(Link, "mailto:") Then

    Could it be that my computer is too slow for processing all 2000 links ?

  9. #9
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    Post the actual link it falls over at and post your current code.
    You can do a little debugging yourself: find which cell it is it falls over at, let's say it's cell A333
    then change the line in the macro temporarily from:
    For Each cll In Range("A2:A2000").cells '<<adjust
    to:
    For Each cll In Range("A333:A2000").cells '<<adjust
    and try again.
    You can also step through code one line at a time with F8 on the keyboard
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  10. #10
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    I've been doing exactly that.
    At the moment I am just going through the links with step of 50.

    For Each cll In Range("A1:50").cells '<<adjust

    and etc...

    If it gives the error

    If InStr(Link, "mailto:") Then

    Then I just repeat it for same cell range and every time it breaks at a different link. Not the one it stopped at previously.

    So I am not too sure if pasting a particular links will help.

    From now I have observed that this error comes up completely random. Cannot trace similarities.

  11. #11
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    OK, well post your current code at least and then maybe one or two sample hyperlinks so that I can test what's happening and perhaps try another route, at the moment I'd be working in the dark. Unless of course, you're scraping the web when you're not supposed to be.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  12. #12
    VBAX Regular
    Joined
    Dec 2016
    Posts
    7
    Location
    Sub scrapeHyperlinksWebsite()    Dim ie As InternetExplorer
        Dim html As HTMLDocument
        Dim Link As Object
        Dim ElementCol As Object
        Dim erow As Long
        Application.ScreenUpdating = False
        Set ie = New InternetExplorer
        ie.Visible = False
        For Each cll In Range("B1:B50").Cells '<<adjust
            ie.navigate cll.Hyperlinks(1).Address
            Do While ie.readyState <> READYSTATE_COMPLETE
                Application.StatusBar = "Loading website…"
                DoEvents
            Loop
            Set html = ie.document
             'Range("A2") = html.DocumentElement.innerHTML
            Set ElementCol = html.getElementsByTagName("a")
             'Set ElementCol = html.getElementsByTagName("mailto:")
            colm = 1
            For Each Link In ElementCol
                If InStr(Link, "mailto:") Then
                     
                     'only one of these next 2 lines:
                    cll.Offset(, colm).Value = Link
                    cll.Offset(, colm).Value = Right(Link, Len(Link) - InStr(Link, ":"))
                     
                    colm = colm + 1
                End If
            Next Link
        Next cll
        Set ie = Nothing
        Application.StatusBar = ""
        Application.ScreenUpdating = True
    End Sub
    This would be the code.

    I am sending URLs to your private chat just for security.
    Last edited by nychay; 12-16-2016 at 05:00 AM.

  13. #13
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,844
    Not only Permission denied, but also skipping over links altogether.
    I tried all sorts of things, including MSXML2.60, but I only got "We notice you are using an outdated version of Internet Explorer. This version is not supported by *******."
    I tried including a .busy element to the code (still there) but that didn't work, so now I've been lazy and just added a 1 second delay which seems to allow the page to load. You may need to tweak that time as your web bandwidth may be better than mine. Too short a delay and it will not come up with a result - exactly the same as if there is no address to return, unfortunately.
    You may also have run into resource problems, because each time you run the macro, you'd create a new instance of ineternet explorer, but the previous instance hadn't been shut down (and was still invisible), so you'd end up with many invisible instances of Internet Explorer. I've added a .Quit line to stop that happenng.

    So the code will run a lot slower now, but hopefully you'll get your values.
    Sub scrapeHyperlinksWebsite()
    Dim IE As InternetExplorer
    Dim HTML As HTMLDocument
    Dim Link As Object
    Dim ElementCol As Object
    Dim erow As Long
    
    Application.ScreenUpdating = False
    Set IE = New InternetExplorer
    IE.Visible = False
    myCount = 0
    For Each cll In Range("B1:B14").Cells  '<<adjust
      'cll.Select
      myCount = myCount + 1
      IE.navigate cll.Hyperlinks(1).Address
      Application.StatusBar = "Loading website " & myCount & "…"
      Do Until IE.Busy = False: DoEvents: Loop
      Do Until IE.readyState = 4: DoEvents: Loop
      Application.Wait DateAdd("s", 1, Now)
      Set HTML = IE.document
      Set ElementCol = HTML.getElementsByTagName("a")
      colm = 1
      For Each Link In ElementCol
        If InStr(Link, "mailto:") Then
          'only one of these next 2 lines:
          'cll.Offset(, colm).Value = Link
          cll.Offset(, colm).Value = Mid(Link, InStr(Link, ":") + 1)  'Right(Link, Len(Link) - InStr(Link, ":"))
          colm = colm + 1
        End If
      Next Link
    Next cll
    IE.Quit
    Set IE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
    End Sub
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •