Extract specific data from a website



Cael
06-16-2014, 09:27 AM
Hi, I am trying to extract an integer from a website's source code...

This is a snippet of the website's HTML source.



<div id="anchor1" class="g-category-box clearfix"><h4 class="ui-box-title">A-C</h4> <dl class="listLeft">
<dt><a href="somewebsite">Agriculture</a><em>(1839653)</em></dt>
<dd>
<a href="someWebsite">Plant Extract</a> |


How can I get the integer value "1839653", maybe using VBA, and put it into a specified cell in my Excel sheet? FYI, the website is: alibaba.com/Products

Your help is greatly appreciated. Thanks :)

ashleyuk1984
06-16-2014, 10:28 AM
You'll probably need the following References enabled (in the VBA editor, Tools > References)...

Microsoft HTML Object Library
Microsoft Internet Controls


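Sub ArticleNumber()
    Dim IE As Object
    Dim ieDoc As HTMLDocument

    ' Start Internet Explorer and load the page
    Set IE = CreateObject("InternetExplorer.Application")

    IE.Visible = True
    IE.Navigate "http://www.alibaba.com/Products"

    ' Wait until the page has finished loading (ReadyState 4 = complete)
    Do While IE.ReadyState <> 4 Or IE.Busy = True
        DoEvents
    Loop

    Set ieDoc = IE.document

    ' Write the text of the first <em> element on the page to A1
    Range("A1").Value = ieDoc.getElementsByTagName("em")(0).innerText

End Sub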

Cael
06-16-2014, 06:02 PM
Thanks for the reply!

I tried the code and it works, but how can I eliminate the negative sign that appears in front of the integer?
(Example: the data retrieved is -1839653)

Thank you :)

ashleyuk1984
06-17-2014, 05:25 AM
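The text comes back as "(1839653)", and Excel reads a number in brackets as a negative value. Removing the brackets before writing to the cell removes the negative sign.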


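Sub ArticleNumber()
    Dim IE As Object
    Dim ieDoc As HTMLDocument
    Dim ExtractNumber As String

    ' Start Internet Explorer and load the page
    Set IE = CreateObject("InternetExplorer.Application")

    IE.Visible = True
    IE.Navigate "http://www.alibaba.com/Products"

    ' Wait until the page has finished loading (ReadyState 4 = complete)
    Do While IE.ReadyState <> 4 Or IE.Busy = True
        DoEvents
    Loop

    Set ieDoc = IE.document

    ' Take the text of the first <em> element, e.g. "(1839653)",
    ' and strip the brackets so Excel does not treat it as negative
    ExtractNumber = ieDoc.getElementsByTagName("em")(0).innerText
    ExtractNumber = Replace(ExtractNumber, "(", "")
    ExtractNumber = Replace(ExtractNumber, ")", "")

    Range("A1").Value = ExtractNumber

End Sub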
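
A slightly more general sketch of the same idea: rather than removing specific characters, keep only the digits, which would also cope with stray spaces or thousands separators. DigitsOnly is a hypothetical helper name, not part of the code above.

```vba
' Hypothetical helper (not from the original post):
' returns only the digit characters found in s.
Function DigitsOnly(ByVal s As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        ' Keep the character only if it is 0-9
        If ch Like "[0-9]" Then result = result & ch
    Next i

    DigitsOnly = result
End Function
```

With this in the same module, the three ExtractNumber lines could be replaced by ExtractNumber = DigitsOnly(ieDoc.getElementsByTagName("em")(0).innerText). For the input "(1839653)" the helper returns "1839653". It returns an empty string when the text contains no digits, so check for that before converting with CLng.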

Cael
06-17-2014, 05:53 AM
Thanks :)

What if I have HTML that doesn't have a clear tag name?

Example:

<div style="width:100%;"><div class=CatDiv ></span>
<br>
<span class=CatLevel1><a onclick="Javascript:ShowMeu('21');">Arts, Antiques & Collectibles</a>&nbsp;(10953)</span>
<BR>
<span id=Cat21 style="display:none;">

How can I get the integer value "10953" from this? Thx.
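
One possible approach, offered only as a sketch: loop over the span elements, pick the one with class CatLevel1 whose text contains the category name, and take whatever sits inside the last pair of brackets. It assumes the page has already been loaded into a document object the same way as in the routines above (the page URL is not given in the thread). CountForCategory and its parameters are hypothetical names, and the real page may be structured differently from the snippet.

```vba
' Sketch only. Assumes doc is an already-loaded document (e.g. IE.document)
' and that the count is the last "(...)" group inside a span with
' class CatLevel1. Returns "" if nothing matches.
Function CountForCategory(ByVal doc As Object, ByVal categoryName As String) As String
    Dim el As Object
    Dim txt As String
    Dim p1 As Long
    Dim p2 As Long

    For Each el In doc.getElementsByTagName("span")
        If el.className = "CatLevel1" Then
            txt = el.innerText
            ' Only look at the span for the requested category
            If InStr(1, txt, categoryName, vbTextCompare) > 0 Then
                p1 = InStrRev(txt, "(")
                p2 = InStrRev(txt, ")")
                If p1 > 0 And p2 > p1 Then
                    ' Text between the last "(" and the last ")"
                    CountForCategory = Mid$(txt, p1 + 1, p2 - p1 - 1)
                    Exit Function
                End If
            End If
        End If
    Next el
End Function
```

Usage would look like Range("A1").Value = CountForCategory(ieDoc, "Arts, Antiques"). Because only the text between the brackets is returned, the negative-sign problem from earlier does not come up.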