Not able to extract text from a webpage

11-07-2010, 11:25 AM

I am using the following VBA code to extract text from a webpage (http://www.frendz4m.com/p/forum/showthreads.php?forumID=13&subforumID=0&ID=2066722)

I am interested in the text inside the SPAN nodes. As there are some unwanted SPAN nodes, I am trying to differentiate them by font, but I get run-time error 438 (object doesn't support this property or method) at hinput.font = "verdana".
What property can I use to differentiate the SPAN nodes?

Set hdoc = IE.document
Set hColl = hdoc.getElementsByTagName("SPAN")
For Each hinput In hColl
    If hinput.font = "verdana" Then
        MsgBox hinput.innertext

HTML source code
<span style='font:normal 12px verdana;color:#000000;'>Dad:Ess Baar exam me paas ho<br>ya fail BIKE zarur dilaunga.<br>Son: _
Kaunsi bike?<br>Dad: Pass he to “APACHE” college<br>jane ke liye.<br>Fail hue to “RAJDOOT” dood<br>bechne ke liye...</span>



11-07-2010, 03:21 PM
Have a look at this thread (http://www.vbaexpress.com/forum/showthread.php?t=34781). I don't think you can use the font property as such, but you can search for the font name in the page source and use the match position to extract the data.

11-07-2010, 06:42 PM

Thanks. The approach in the thread you suggested is a bit different, and slow.
What I am trying to do is parse the DOM tree, go directly to a specific node, and get its innertext, which is much more precise and quicker (I think).
As you can see from the HTML code I posted, the SPAN node that I want has three properties: style, font, and color. I was wondering if there is a way to use those properties to separate it from the remaining SPAN nodes, which have different style, font, and color properties.

Shred Dude
11-14-2010, 10:59 PM
Is your variable hInput dim'd as an HTMLSpanElement?

If not, try explicitly setting it to that type.

Then try something like:

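For Each hInput In hColl
    ' Case-insensitive match on the font shorthand of the inline style
    If InStr(1, hInput.Style.Font, "verdana", vbTextCompare) > 0 Then
    'If InStr(1, hInput.Style.fontFamily, "verdana", vbTextCompare) > 0 Then
        MsgBox hInput.innerText
    End If
Next hInput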

You were also missing an End If...
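
For anyone who wants to try the idea without opening the forum page, here is a minimal sketch. It assumes a late-bound "htmlfile" document can stand in for IE.document, and it uses two made-up SPAN nodes modelled on the HTML posted above. HasInlineFont and ShowVerdanaSpans are hypothetical names, not part of any library. The sketch reads Style.fontFamily, on the assumption that IE fills it in from the font shorthand of the inline style.

```vba
Option Explicit

' Hypothetical helper: True when the element's inline style names the font.
' vbTextCompare makes the match case-insensitive.
Function HasInlineFont(ByVal el As Object, ByVal fontName As String) As Boolean
    HasInlineFont = InStr(1, el.Style.fontFamily, fontName, vbTextCompare) > 0
End Function

Sub ShowVerdanaSpans()
    Dim hDoc As Object, hColl As Object, hInput As Object

    ' Assumption: a late-bound "htmlfile" document replaces IE.document here
    Set hDoc = CreateObject("htmlfile")

    ' Sample markup (made up for this sketch): one unwanted and one wanted SPAN
    hDoc.body.innerHTML = _
        "<span style='font:normal 12px arial;'>unwanted</span>" & _
        "<span style='font:normal 12px verdana;color:#000000;'>wanted</span>"

    Set hColl = hDoc.getElementsByTagName("SPAN")
    For Each hInput In hColl
        ' Only show the SPAN nodes whose inline style uses verdana
        If HasInlineFont(hInput, "verdana") Then
            MsgBox hInput.innerText
        End If
    Next hInput
End Sub
```

With a live page, the same loop should work after replacing the "htmlfile" lines with Set hDoc = IE.document, as in the original code. If the wanted nodes also need to be told apart by colour, Style.Color could be checked in the same way.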