Regex expression to retrieve data from HTML



Cael
06-17-2014, 07:41 PM
Hi, I am trying to extract an integer from a website's source code...

This is a snippet of the website's HTML:


<div style="width:100%;"><div class=CatDiv ></span><br><span class=CatLevel1><a onclick="Javascript:ShowMeu('21');">Arts, Antiques & Collectibles</a>&nbsp;(4556)
</span><BR><span id=Cat21 style="display:none;"><span class=CatLevel2>
<a href="lelong.com.my/arts-antiques-and-collectibles/">View All</a>&nbsp;(10946)</span><br><span class=CatLevel2>

How can I get the integer "10946", perhaps with a regular expression, without also retrieving the integer "4556"? I notice that every integer I need is preceded by a "View All" link inside an element with the class "CatLevel2". Other elements use the same class, and I don't want their numbers, so maybe I can use "View All" as the pattern and capture the integer after it. Can anyone show me how? Thx.
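A minimal VBA sketch of that idea, assuming the page's raw HTML is already in a string (for example from IE.document.body.innerHTML) and that the markup looks like the snippet above. The function name ViewAllCounts is hypothetical. Anchoring the pattern on the literal text View All</a> is what excludes 4556, because that number follows a category link rather than a "View All" link. How innerHTML serialises tags and &nbsp; can vary between browsers, so check the pattern against the string you actually get.

```vb
' Sketch: return every count that follows a "View All" link in raw HTML.
' Assumes markup like: View All</a>&nbsp;(10946)
' Late binding (CreateObject) so no reference to
' "Microsoft VBScript Regular Expressions 5.5" is required.
Function ViewAllCounts(ByVal html As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True          ' return every match, not just the first
        .IgnoreCase = True      ' tolerates </A> as well as </a>
        ' "View All", closing </a>, any spaces or &nbsp;, then "(digits)"
        .Pattern = "View All</a>(?:\s|&nbsp;)*\((\d+)\)"
    End With

    For Each m In re.Execute(html)
        result.Add CLng(m.SubMatches(0))   ' first capture group = the digits
    Next m

    Set ViewAllCounts = result
End Function
```

Usage, for the first count only: Range("K4").Value = ViewAllCounts(IE.document.body.innerHTML)(1), assuming at least one match exists.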

Cael
06-17-2014, 09:47 PM
I've tried it, but it's not really working... Can anyone look into this? The macro is meant to capture the number after "View All" and put it into a cell in an Excel spreadsheet.

VB


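Sub MudahNumber()
    ' Requires references (Tools > References) to
    ' "Microsoft VBScript Regular Expressions 5.5" and "Microsoft Internet Controls"
    Dim RegexURL As RegExp
    Dim RegexMatch As MatchCollection
    Dim IE As InternetExplorer
    Dim strPageContent As String

    Set RegexURL = New RegExp
    Set IE = New InternetExplorer

    With RegexURL
        .MultiLine = True
        .IgnoreCase = True
        .Global = False   ' only the first match is returned
        ' [0-9]+ rather than [0-9]* so an empty capture cannot match
        .Pattern = "View All[^(]*?\(([0-9]+)"
    End With

    With IE
        .Visible = True
        .navigate "lelong.com.my/Auc/List/BrowseAll.asp"
    End With

    ' Wait for the page to load; DoEvents stops the loop from freezing Excel
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    strPageContent = IE.document.body.innerText

    If RegexURL.Test(strPageContent) Then
        Set RegexMatch = RegexURL.Execute(strPageContent)
        Range("K4").Value = RegexMatch(0).SubMatches(0)
    End If
End Sub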


Thank you :)
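A sketch of the same approach on innerText, but with .Global = True so every "View All" count is collected instead of only the first. Assumptions: late binding (CreateObject) so no library references are needed, the literal 4 standing in for READYSTATE_COMPLETE, and results written down column K starting at K4; the names ViewAllCountsFromText and MudahAllNumbers are hypothetical. In innerText the &nbsp; becomes a non-breaking space (Chr(160)), which the [^(]*? part of the original pattern skips over.

```vb
' Sketch: collect every "View All (n)" count from page text, not just the first.
' Hypothetical names; late binding avoids needing Tools > References.
Function ViewAllCountsFromText(ByVal pageText As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True      ' return all matches instead of stopping at the first
        .IgnoreCase = True
        ' [^(]*? skips the non-breaking space innerText leaves after "View All"
        .Pattern = "View All[^(]*?\((\d+)\)"
    End With

    For Each m In re.Execute(pageText)
        result.Add CLng(m.SubMatches(0))
    Next m
    Set ViewAllCountsFromText = result
End Function

Sub MudahAllNumbers()
    Dim IE As Object, counts As Collection, v As Variant, r As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False
    IE.navigate "lelong.com.my/Auc/List/BrowseAll.asp"
    ' 4 = READYSTATE_COMPLETE; DoEvents lets IE keep loading while we wait
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    Set counts = ViewAllCountsFromText(IE.document.body.innerText)
    IE.Quit

    r = 4                   ' assumed output start: K4, then K5, K6, ...
    For Each v In counts
        Cells(r, "K").Value = v
        r = r + 1
    Next v
End Sub
```

Splitting the regex into its own function means it can be checked against a pasted sample of page text without opening Internet Explorer at all.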

snb
06-18-2014, 03:04 AM
crossposted at: http://www.ozgrid.com/forum/showthread.php?t=189059&p=716711#post716711

Aussiebear
06-18-2014, 06:36 AM
Cael, if you really feel the need to cross-post, please post the link so that those of us who might wish to follow your thread and contribute are not repeating assistance you may have received elsewhere. Serial cross-posters tend to get ignored, and trust me, you don't want that sort of reputation.

snb
06-18-2014, 12:04 PM
answered at: http://www.ozgrid.com/forum/showthread.php?t=189059&p=716760#post716760