Hello,
I have about 100 webpages that I would like to download as text, and then parse out the 21st table from. I have the code set up to access all the websites as web queries but do not know how to download each of them as a text file.
Thanks.
Do you want the text of the webpages or the HTML?
Cordially,
Aaron
Keep Our Board Clean!
- Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
- Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.
The text. I only need certain text and data from the tables.
But it's ok if I get the table or the text. I really just need the 21st table from the 100 pages in ANY form. thx.
Last edited by Oorang; 07-09-2008 at 10:08 AM. Reason: Merged concurrent posts by same user.
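I'd be interested to see your web query code, but here is a very quick-and-dirty way. It reads URLs from column A of the active sheet and writes each page's text into column B: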
[VBA]Public Sub GetWebText()
    'MAKE SURE YOU SET A REFERENCE TO:
    'shdocvw.dll
    'mshtml.tlb
    Dim objIE As SHDocVw.InternetExplorer
    Dim ieDoc As MSHTML.HTMLDocument
    Dim ws As Excel.Worksheet
    Dim strURL As String
    Dim lngRow As Long
    Dim strText As String
    Set ws = ActiveSheet
    ws.Cells.WrapText = False
    'Create Internet Explorer Object
    Set objIE = New SHDocVw.InternetExplorer
    Do
        'URLs are read from column A, starting at row 1
        lngRow = lngRow + 1
        strURL = ws.Cells(lngRow, 1).Value
        'Stop at the first empty cell
        If Not CBool(LenB(strURL)) Then
            Exit Do
        End If
        'Navigate the URL
        objIE.Navigate strURL
        'Wait for page to load (DoEvents keeps Excel responsive while waiting)
        Do Until objIE.ReadyState = READYSTATE_COMPLETE
            DoEvents
        Loop
        'Get document object
        Set ieDoc = Nothing
        Do While ieDoc Is Nothing
            Set ieDoc = objIE.Document
            DoEvents
        Loop
        'Retry until the body text can be read without an error
        strText = vbNullString
        On Error Resume Next
        Do
            Err.Clear
            strText = ieDoc.body.innerText
            DoEvents
        Loop While Err.Number
        On Error GoTo 0
        'Page text goes in column B, next to its URL
        ws.Cells(lngRow, 2).Value = strText
    Loop
    objIE.Quit
End Sub
[/VBA]
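The routine above grabs the whole page text into a cell. If only the 21st table is wanted, saved as a text file, something along these lines might work. This is a minimal sketch, not tested against your pages: GetTableText and WriteTextFile are names made up for illustration, tables are counted in document order (so nested tables count too), and the output folder in the usage line is a placeholder.

[VBA]'Returns the innerText of the Nth <table> (1-based) in a loaded document,
'or vbNullString if the page has fewer tables than that.
Public Function GetTableText(ByVal doc As Object, ByVal lngTableNumber As Long) As String
    Dim tables As Object
    Set tables = doc.getElementsByTagName("table")
    'The collection is zero-based, so table 21 is Item(20)
    If lngTableNumber >= 1 And lngTableNumber <= tables.Length Then
        GetTableText = tables.Item(lngTableNumber - 1).innerText
    End If
End Function

'Writes strText to strPath, replacing any existing file.
'Print # writes ANSI text, so characters outside the ANSI code page are lost.
Public Sub WriteTextFile(ByVal strPath As String, ByVal strText As String)
    Dim intFile As Integer
    intFile = FreeFile
    Open strPath For Output As #intFile
    'Trailing semicolon suppresses the extra line break
    Print #intFile, strText;
    Close #intFile
End Sub
[/VBA]

Inside the loop, once ieDoc is set, a call such as WriteTextFile "C:\Temp\page" & lngRow & ".txt", GetTableText(ieDoc, 21) would write one file per URL (the C:\Temp folder is an assumption and must already exist).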
Cordially,
Aaron
Thanks. I currently can't set a reference to shdocvw.dll, but I will try the code above when I can and get back to you on it. I have another question.
I am on a website where I input a deal number, and the address of the resulting page looks like this:
http://zizizizizi.com/zizizizizi/cust/qcksearch/qcksearch_search_result.asp?searchident=qcksearch&startkey=0&search=2&searchquery=03l61fak5&redir_url=/zizizizizi/cust/qcksearch/qcksearch%5Fsearch%5Fresult.asp&bhcp=1
where the 03761fab5 after searchquery= is the deal number. Once I get to the page after the query, I click on a different link and the resulting address in the address bar is:
http://zizizizizi.com/zizizizizi/cust/qcksearch/qckSearch_search_result.asp?n_id=400038356&searchQuery=03761fab5&search=2
The 400038356 is the site's deal id which I need in order to quickly query many tables. Is there a way to extract the address in the address bar as a query so I can parse out the 9 digits in ...search_result.asp?n_id=400038356&searchQuery...?
Why can't you use shdocvw? Are you running non-windows?
Please post your web query code, and I'll see if I can hit on an alternative.
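On the deal id question: if an Internet Explorer object is driving the page, its LocationURL property returns the current address, and the n_id value can be pulled out with plain string functions. Below is a minimal sketch. GetQueryValue is a hypothetical helper name, and it assumes the id always appears as an n_id=... pair in the query string, as in your second address. If setting the reference is the only obstacle, declaring the variables As Object and using CreateObject("InternetExplorer.Application") avoids the need for it.

[VBA]'Returns the value of one query-string parameter from a URL,
'or vbNullString if the parameter is absent.
'The parameter name is matched case-insensitively.
Public Function GetQueryValue(ByVal strURL As String, ByVal strName As String) As String
    Dim lngPos As Long
    Dim lngEq As Long
    Dim strQuery As String
    Dim varPairs As Variant
    Dim varPair As Variant

    'Everything after the first "?" is the query string
    lngPos = InStr(1, strURL, "?")
    If lngPos = 0 Then Exit Function
    strQuery = Mid$(strURL, lngPos + 1)

    'Drop any #fragment at the end
    lngPos = InStr(1, strQuery, "#")
    If lngPos > 0 Then strQuery = Left$(strQuery, lngPos - 1)

    'Check each name=value pair in turn
    varPairs = Split(strQuery, "&")
    For Each varPair In varPairs
        lngEq = InStr(1, varPair, "=")
        If lngEq > 0 Then
            If StrComp(Left$(varPair, lngEq - 1), strName, vbTextCompare) = 0 Then
                GetQueryValue = Mid$(varPair, lngEq + 1)
                Exit Function
            End If
        End If
    Next varPair
End Function
[/VBA]

With the browser sitting on the second page, GetQueryValue(objIE.LocationURL, "n_id") should give the nine digits, where objIE is whatever variable holds the Internet Explorer object.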
Cordially,
Aaron