Excel VBA code reads wrong innerHTML code



hunsnowboard
08-17-2015, 02:05 AM


Hi everyone!
I am completely new to web scraping and have only a little previous VBA knowledge. I am trying to build a scraper that opens a site, runs a search, and then scrapes the details of the search results. What annoys me is that the scraper can run the search with the given parameters, but after the search is done and the results page has loaded, the innerHTML I read from VBA is NOT the source code of the new page. So I cannot extract any information, because my VBA code does not see the actual HTML of the results page. Why is that happening? What is the source code that my VBA extracts?
Thank you very much in advance for your help!



Public Sub my_scraper()

    ' Each variable needs its own "As String"; otherwise my_data1 is a Variant
    Dim my_data1 As String, my_data2 As String
    Dim my_Coll As String

    ' Search keyword in A1, location in B1
    my_data1 = ActiveSheet.Cells(1, 1).Value
    my_data2 = ActiveSheet.Cells(1, 2).Value

    my_Coll = profession_hu_scraper(my_data1, my_data2)

    Cells(2, 2).Value = my_Coll

End Sub


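' Early binding: needs references to "Microsoft Internet Controls"
' (InternetExplorer, READYSTATE_COMPLETE) and "Microsoft HTML Object Library" (HTMLDocument)
Public Function profession_hu_scraper(ByVal my_data1 As String, ByVal my_data2 As String) As String

    Dim objIE As InternetExplorer
    Dim html As HTMLDocument
    Dim my_classes As Object        ' elements with class "p2_button_inner"
    Dim my_class As Object          ' one element from that collection
    Dim i As Long                   ' counts clicked buttons
    Dim Link As Object
    Dim ElementCol As Object
    Dim erow As Long
    'Dim all_inp_el As Object

    'Application.ScreenUpdating = False

    Set objIE = CreateObject("InternetExplorer.Application")

    With objIE
        .Visible = True
        .Navigate "https://www.profession.hu/"

        ' Wait for the start page to finish loading
        Do While .ReadyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading website..."
            DoEvents
        Loop

        ' Step 1: save the start page's HTML (a cell holds at most 32,767 characters)
        Set html = .Document
        Range("A16") = html.DocumentElement.innerHTML

        ' Step 2: fill in the keyword and location fields
        .Document.getElementById("header_keyword").Value = my_data1
        .Document.getElementById("header_location").Value = my_data2

        ' Click the button whose value is "Keresés" ("Search")
        Set my_classes = .Document.getElementsByClassName("p2_button_inner")
        For Each my_class In my_classes
            If my_class.getAttribute("value") = "Keresés" Then
                Range("C4") = "Clicked"
                my_class.Click
                i = i + 1
            End If
        Next my_class

        ' Step 3: intended to wait for the results page, then save its HTML
        Do While .ReadyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading website..."
            DoEvents
        Loop

        Set html = .Document
        Range("B16") = html.DocumentElement.innerHTML

    End With

    ' IE is not closed here (no .Quit), so the window stays open
    Set objIE = Nothing
    Application.StatusBar = "Finished"

    'Application.StatusBar = ""
    ' Note: profession_hu_scraper is never assigned, so it returns "" to my_scraper
End Function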
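A likely cause, offered as a hedged suggestion rather than a confirmed diagnosis: right after `my_class.Click`, Internet Explorer may not have started the new navigation yet, so `.ReadyState` can still report `READYSTATE_COMPLETE` for the old page. The second wait loop then exits immediately and `.Document` still points at the start page. A common workaround is to also test `.Busy`, and, more reliably, to wait until an element that only exists on the results page shows up. The sketch below is untested against profession.hu; `WaitForIE` and `WaitForClass` are hypothetical helper names, and the class name passed to `WaitForClass` has to be taken from the real results page, which is not shown in this thread.

```vba
' Hypothetical helpers (not part of the original code).
' Requires the "Microsoft Internet Controls" reference, like the original code.

' Waits until IE reports it is neither busy nor loading, or gives up after
' timeoutSeconds. Returns True if the page finished loading in time.
Public Function WaitForIE(ByVal ie As InternetExplorer, _
                          Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer

    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
        ' Timer resets at midnight; this simple check ignores that case.
        If Timer - startTime > timeoutSeconds Then
            WaitForIE = False
            Exit Function
        End If
    Loop

    WaitForIE = True
End Function

' Polls until at least one element with the given class name exists in the
' current document, or gives up after timeoutSeconds.
Public Function WaitForClass(ByVal ie As InternetExplorer, ByVal className As String, _
                             Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    Dim found As Long
    startTime = Timer

    Do
        DoEvents
        found = 0
        ' The document can be briefly unavailable mid-navigation; ignore errors here.
        On Error Resume Next
        found = ie.Document.getElementsByClassName(className).Length
        On Error GoTo 0

        If found > 0 Then
            WaitForClass = True
            Exit Function
        End If
    Loop While Timer - startTime <= timeoutSeconds

    WaitForClass = False
End Function
```

In `profession_hu_scraper`, the second `Do While` loop could be replaced with `If Not WaitForIE(objIE) Then Exit Function`, followed by `WaitForClass objIE, "results_class_here"` (a placeholder class name) before `Set html = .Document`. Because `.Busy` may not turn True the instant the click happens, some people also add a short pause such as `Application.Wait Now + TimeSerial(0, 0, 1)` right after the click.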



So basically my problem is the following:
1. The scraper goes to profession.hu, loads the innerHTML and writes it to cell A16. (I checked the result and this works properly, so no problem here.)
2. It then writes data into two input fields and runs the search. (After the search, a new page with the search results is displayed.)
3. After the new page has fully loaded, the scraper reads the innerHTML of the new page again and writes it to cell B16. (This is my problem: the innerHTML is not correct. I checked it, and it is not what it should be.)

Thanks again for your help in advance!

p45cal
08-18-2015, 02:17 AM
cross post: http://stackoverflow.com/questions/32009799/excel-vba-code-reads-wrong-innerhtml-code