
Thread: download company URL's & company names across multiple pages from web (Web scraping)

  1. #1

    Hi,

    I want to download all company names from the web. I wrote some code, but one line is not working. Please go through the code and correct it.

    Note: my intention is to scrape specific URLs from the web page. If a URL contains "http://money.rediff.com/companies/", I want to download it. I used the InStr function for this, but it is not working. Please help.

    Sub downloadallcompanynames()
        Dim IE As New SHDocVw.InternetExplorer
        Dim num As Integer
        Dim lrow As Long
        Dim str As String
        Dim alllinks As mshtml.IHTMLElementCollection
        ThisWorkbook.Sheets("Sheet1").Activate
        ActiveSheet.Cells(2, 1).Select
        Set IE = New SHDocVw.InternetExplorer
        IE.Visible = True
        IE.navigate "http://money.rediff.com/companies"
        Do While IE.ReadyState <> READYSTATE_COMPLETE
        Loop
        Set htmldoc = IE.Document
        Set HTMLAs = htmldoc.getElementsByTagName("a")
        For Each HTMLA In HTMLAs
            num = InStr(HTMLA.innerText, "http://money.rediff.com/companies")  'this line of code is not working
            If num > 0 Then
                Debug.Print HTMLA.getAttribute("href")
                ActiveCell.Value = HTMLA.getAttribute("href")
                ActiveCell.Offset(1, 0).Select
            End If
        Next HTMLA
    End Sub
    Please also add code to download all the URLs from multiple pages of the above website; I have not added that part.

    Thank you in advance.

  2. #2
    You are applying InStr to the link's innerText, which holds the visible text of the link rather than its URL, so the search never finds a match.
    I put the href into a string variable and searched that instead; with that change it works.


    Sub downloadallcompanynames()
        Dim IE As SHDocVw.InternetExplorer
        Dim htmldoc As MSHTML.HTMLDocument
        Dim HTMLAs As MSHTML.IHTMLElementCollection
        Dim HTMLA As Object
        Dim num As Long
        Dim hrefstr As String

        ThisWorkbook.Sheets("Sheet1").Activate
        ActiveSheet.Cells(2, 1).Select
        Set IE = New SHDocVw.InternetExplorer
        IE.Visible = True
        IE.navigate "http://money.rediff.com/companies"
        ' Wait for the page to load; DoEvents keeps Excel responsive meanwhile
        Do While IE.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set htmldoc = IE.Document
        Set HTMLAs = htmldoc.getElementsByTagName("a")
        For Each HTMLA In HTMLAs
            ' Search the link target (href), not the visible link text (innerText)
            hrefstr = HTMLA.href
            num = InStr(hrefstr, "http://money.rediff.com/companies")
            If num > 0 Then
                Debug.Print HTMLA.getAttribute("href")
                ActiveCell.Value = HTMLA.getAttribute("href")
                ActiveCell.Offset(1, 0).Select
            End If
        Next HTMLA
    End Sub
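
    To see why searching innerText finds nothing, it may help to print both properties for a single link. The following is only a sketch: it builds a one-link page in memory with a late-bound "htmlfile" (MSHTML) document instead of opening Internet Explorer, and the company name and URL in it are made up for illustration.

    ```vba
    Sub CompareInnerTextAndHref()
        Const PREFIX As String = "http://money.rediff.com/companies"
        Dim doc As Object
        Dim link As Object

        ' Late-bound MSHTML document; the link below is a hypothetical example,
        ' not a real company page.
        Set doc = CreateObject("htmlfile")
        doc.body.innerHTML = "<a href=""" & PREFIX & "/example/12345678"">Example Company Ltd</a>"

        For Each link In doc.getElementsByTagName("a")
            ' innerText is the visible link text, so the prefix is not in it
            Debug.Print "innerText: "; link.innerText; " -> InStr = "; InStr(link.innerText, PREFIX)
            ' href is the link target, which is where the prefix actually lives
            Debug.Print "href:      "; link.href; " -> InStr = "; InStr(link.href, PREFIX)
        Next link
    End Sub
    ```

    The innerText search should report no match, because the visible text is just the company name, while the href search should find the prefix at the start of the URL.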

  3. #3
    Hi,

    The code below works except for one small problem: there are 200 URLs on the first page, but I got only 199 of them (all except the first). Why was the first URL not added?

    (first URL - http://money.rediff.com/companies/20...s-Ltd/15110088)

    Sub downloadallcompanynames()
        Dim IE As SHDocVw.InternetExplorer
        Dim HTMLDoc As Object
        Dim HTMLAs As Object
        Dim HTMLA As Object
        Dim num As Long
        Dim hrefstr As String

        ThisWorkbook.Sheets("Sheet1").Activate
        ActiveSheet.Cells(2, 1).Select
        Set IE = New SHDocVw.InternetExplorer
        IE.Visible = True
        IE.Navigate "http://money.rediff.com/companies/"
        ' Wait for the page to load; DoEvents keeps Excel responsive meanwhile
        Do While IE.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set HTMLDoc = IE.Document
        Set HTMLAs = HTMLDoc.getElementsByTagName("a")
        For Each HTMLA In HTMLAs
            hrefstr = HTMLA.href
            num = InStr(hrefstr, "http://money.rediff.com/companies/")
            ' Keep only company pages, whose URLs end in an 8-digit code
            If num > 0 And IsNumeric(Right(hrefstr, 8)) Then
                Debug.Print HTMLA.getAttribute("href")
                ActiveCell.Value = HTMLA.getAttribute("href")
                ActiveCell.Offset(1, 0).Select
            End If
        Next HTMLA
    End Sub
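
    The filter in the loop above (prefix match plus a numeric ending) could also live in a small function of its own. This is a sketch rather than a drop-in replacement: IsCompanyUrl is a hypothetical name, and it is stricter than the original test because it requires the prefix at the very start of the URL and checks the whole last path segment instead of the last 8 characters.

    ```vba
    ' Hypothetical helper: True when href starts with the company prefix and
    ' its last path segment consists of digits only.
    Function IsCompanyUrl(ByVal href As String) As Boolean
        Const PREFIX As String = "http://money.rediff.com/companies/"
        Dim parts() As String
        Dim lastPart As String

        ' The prefix must sit at position 1 (case-insensitive comparison)
        If InStr(1, href, PREFIX, vbTextCompare) <> 1 Then Exit Function

        parts = Split(href, "/")
        lastPart = parts(UBound(parts))

        ' Like is used instead of IsNumeric, which also accepts text such as "1e5"
        IsCompanyUrl = (Len(lastPart) > 0) And Not (lastPart Like "*[!0-9]*")
    End Function
    ```

    Inside the loop, the condition would then read If IsCompanyUrl(hrefstr) Then.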

  4. #4
    When I ran your modified code, I got 200 URLs in A2 to A201, and the one you queried arrived in cell A2, so I can't see any problem.
    Last edited by offthelip; 03-15-2017 at 03:53 PM. Reason: I hadn't realised you modified the code

  5. #5
    When I run the code, I also get 200 URLs (the specific URLs) in Sheet1, from A2 to A201. However, I could not see the first URL in the Immediate window, which is why I mentioned it. Why is the first URL missing from the Immediate window?

  6. #6
    I put a breakpoint straight after the Debug.Print statement and the URL comes up fine. If you are getting what you want on the spreadsheet, why worry about what is happening in the Immediate window? I find it can be difficult to follow what is happening there, so I tend to use breakpoints and the Watch window to debug things.

  7. #7
    Thank you so much for your input. I have just realized that the Immediate window shows only 199 rows plus one blank row when the Debug.Print statements run. That may be why I could not see the first URL.
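
    Since the Immediate window appears to keep only about the last 200 lines, earlier Debug.Print output scrolls away on longer runs. One way around this, sketched below, is to collect the URLs in a Collection and write them to a text file. WriteUrlsToFile is a hypothetical name, and the file path in the usage note is just an example.

    ```vba
    ' Hypothetical helper: writes each item of urls to filePath, one per line,
    ' overwriting the file if it already exists.
    Sub WriteUrlsToFile(ByVal filePath As String, ByVal urls As Collection)
        Dim fileNum As Integer
        Dim url As Variant

        fileNum = FreeFile
        Open filePath For Output As #fileNum
        For Each url In urls
            Print #fileNum, url
        Next url
        Close #fileNum
    End Sub
    ```

    In the scraping loop, each match would be added with urls.Add hrefstr, and after the loop a call such as WriteUrlsToFile Environ("TEMP") & "\company_urls.txt", urls would save the full list.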

  8. #8
    SamT (Moderator)
    Three-year-old thread closed.