PDA

View Full Version : Solved: Help with VBA & html spidering/downloading!



efz
03-03-2010, 02:55 PM
Hello i would like your help with something that i have been searching for lots of time! I think its quite hard but you know better cause i am a complete newbie on vba!

Ok I have a list of nearly 5.000 ISBN that i want to match each line with a specific "text" found in a website. But lets take it from the beggining:

Given i got in A1

the url of the website that have this syntax:

http://bookshop.blackwell.co.uk/jsp/id/ISBN
http://bookshop.blackwell.co.uk/jsp/id/ISBN
http://bookshop.blackwell.co.uk/jsp/id/ISBN
http://bookshop.blackwell.co.uk/jsp/id/ISBN
http://bookshop.blackwell.co.uk/jsp/id/ISBN
http://bookshop.blackwell.co.uk/jsp/id/ISBN
ETC.

etc. and in B1 i want to take this information for each line:

for example for an ISBN: 9780385504225

I GOT IN A1 THE URL (NOTE: URL ISNT STORED AS HYPERLINK)

http://bookshop.blackwell.co.uk/jsp/id/9780385504225

AND IN CELL B1 I WANT THIS TEXT:
US title not currently available

Is there a way to spider this information through VBA and if itsnt possible
could i maybe use vba to download all these html files and then strip them down with another program?
thanks in advance

to show you exactly what i want to do I have attached an img

Jan Karel Pieterse
03-04-2010, 12:47 AM
You might be helped using a parameter web-query:
www.jkp-ads.com/articles/webquery.asp (http://www.jkp-ads.com/articles/webquery.asp)

See attached...

efz
03-13-2010, 04:04 AM
bump! anyone can help a little bit more please?!

ZVI
03-13-2010, 06:41 AM
Hi,
Try this code:


Sub GetISBN()

Const PubStatusTag = "pubstatus="
Dim Rng As Range, x As Range, oHttp As Object, txt$, i&, j&
Set Rng = Range("A2", Cells(Rows.Count, 1).End(xlUp))

On Error Resume Next
Set oHttp = CreateObject("MSXML2.XMLHTTP")
If Err <> 0 Then Set oHttp = CreateObject("MSXML.XMLHTTPRequest")
If oHttp Is Nothing Then MsgBox "MSXML2.XMLHTTP not found", 16, "Error": Exit Sub
On Error GoTo 0

With oHttp
For Each x In Rng
.Open "GET", x.Value, False
.Send
txt = .responseText
i = InStr(1, txt, PubStatusTag, 1)
If i = 0 Then
x.Offset(, 1).Value = "PubStatusTag not found"
Else
i = i + Len(PubStatusTag)
j = InStr(i, txt, "&", 0)
x.Offset(, 1) = Replace(Mid(txt, i, j - i), "+", " ")
End If
Next
End With
Set oHttp = Nothing

End Sub
Regards,
Vladimir

efz
03-14-2010, 04:11 AM
omg thank you so much! this works great

mdmackillop
03-14-2010, 04:14 AM
Hi Vladimir,
Can you create a KB Item for this?
Regards
MD