Thread: Not able to extract text from a webpage

  1. #1

    Not able to extract text from a webpage

    Hi,

    I am using the following VBA code to extract text from a webpage.

    I am interested in the text inside SPAN elements. Because the page also contains unwanted SPAN elements, I am trying to tell them apart by their font, but I get run-time error 438 at hinput.font = "verdana".
    What property can I use to differentiate the SPAN elements?

    [vba]
    Set hdoc = IE.document
    Set hColl = hdoc.getElementsByTagName("SPAN")
    For Each hinput In hColl
        If hinput.font = "verdana" Then
            MsgBox hinput.innertext
    Next
    [/vba]

    HTML source code
    <span style='font:normal 12px verdana;color:#000000;'>Dad:Ess Baar exam me paas ho<br>ya fail BIKE zarur dilaunga.<br>Son: _
    Kaunsi bike?<br>Dad: Pass he to “APACHE” college<br>jane ke liye.<br>Fail hue to “RAJDOOT” dood<br>bechne ke liye...</span>

    Thanks,

    MG.
    Last edited by Aussiebear; 11-07-2010 at 10:03 PM. Reason: Add VBA Tags & Fit to Page

  2. #2
    mdmackillop (Administrator, VBAX Grand Master; joined May 2004; Scotland; 14,489 posts)
    Have a look at this thread. I don't think you can use the font property as such, but search for it in the page code and use the result to extract the data.
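
    One way to read the "search the page code" suggestion is sketched below: take the page HTML as a string and pull out whatever sits between two markers. This is only an illustration, not the code from the linked thread. TextBetween and its parameter names are hypothetical, and the markers you pass must match the HTML as the browser actually returns it, which may differ from the original source.

    [vba]
    ' Hypothetical helper: returns the text between startMarker and endMarker,
    ' or an empty string if either marker is not found.
    Function TextBetween(ByVal source As String, _
                         ByVal startMarker As String, _
                         ByVal endMarker As String) As String
        Dim startPos As Long
        Dim endPos As Long

        ' Case-insensitive search for the opening marker
        startPos = InStr(1, source, startMarker, vbTextCompare)
        If startPos = 0 Then Exit Function

        ' Move past the opening marker, then look for the closing one
        startPos = startPos + Len(startMarker)
        endPos = InStr(startPos, source, endMarker, vbTextCompare)
        If endPos = 0 Then Exit Function

        TextBetween = Mid$(source, startPos, endPos - startPos)
    End Function
    [/vba]

    For example, TextBetween(hdoc.body.innerHTML, "verdana", "</span>") would return everything from just after the first "verdana" up to the next closing SPAN tag, including the rest of the opening tag and any <br> tags, so some cleanup is still needed.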

  3. #3
    mdmackillop,

    Thanks. The approach in the thread you suggested is a bit different, and slow.
    What I am trying to do is parse the DOM tree, go directly to a specific node, and get its innertext, which is much more precise and quicker (I think).
    As you can see from the HTML code I posted, the SPAN node that I want has three properties: style, font, and color. I was wondering if there is a way to use those properties to separate it from the remaining SPAN nodes, which have different style, font, and color properties.

  4. #4
    Is your variable hInput declared (Dim'd) as an HTMLSpanElement?

    If not, try explicitly declaring it as that type.


    Then try something like:

    For Each hInput In hColl
        If InStr(1, hInput.Style.Font, "verdana", vbTextCompare) > 0 Then
        'OR
        'If InStr(1, hInput.Style.fontFamily, "verdana", vbTextCompare) > 0 Then
            MsgBox hInput.innerText
        End If
    Next
    You were also missing an End If...
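
    For completeness, here is a minimal late-bound sketch of the same idea as a whole procedure. It assumes hdoc is the IE.document object from the first post. ShowSpansByStyle and styleFragment are made-up names for illustration. It tests the element's full inline style text (Style.cssText) instead of a single style property, so it can match on either the font or the color. Note that the browser may rewrite the style text (for example, different case or spacing), so a short fragment such as "verdana" is safer than the whole attribute.

    [vba]
    ' Hypothetical helper: shows the text of every SPAN whose inline style
    ' contains styleFragment (case-insensitive).
    Sub ShowSpansByStyle(ByVal hdoc As Object, ByVal styleFragment As String)
        Dim hColl As Object
        Dim hInput As Object

        Set hColl = hdoc.getElementsByTagName("SPAN")

        For Each hInput In hColl
            ' cssText holds the element's whole inline style as one string
            If InStr(1, hInput.Style.cssText, styleFragment, vbTextCompare) > 0 Then
                MsgBox hInput.innerText
            End If
        Next hInput
    End Sub
    [/vba]

    Call it as ShowSpansByStyle IE.document, "verdana". Because the variables are declared As Object, no reference to the Microsoft HTML Object Library is needed.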
