PDA

View Full Version : Scraping product prices from E-commerce using Excel VBA / cooking handling



Mojies
11-22-2018, 10:56 AM
Hi all,

I've written a code that actually works but is really slow and can be improved to make it less error sensitive and faster.

It's a code that uses IE automation in Excel VBA and extracts store prices from a supermarket website. The thing is that across the different stores (in different cities) the prices are different for the same products. To extract the different 'price lines' for the products the codes does/ must do the following:

1- visit the website of the supermarket and select the first store / city. The selected store is saved in a browser cookie.
2- the next code part visits the specified product (in total, finally it will be a fixed list around 200 products, using a loop). (The prices are extracted in a worksheet, not includes in this code yet)
3- next a second city/supermarket is selected and stored in the browser cookie
4- the same code (#2, see above) is called and the prices for the same 200 products is scraped from the store, and saved.


When using IE automation the specific store/city is remembered when visiting the product pages. I found that when using a different method like XMLHTTP request the code is much faster but the problem then is that the product page is not linked to the store that was saved in a browser cookie. Any suggestions how to use the XMLHHTP code (provided at the bottom) or a different method together with a cookie in which the correct store is saved while scraping the product prices?


Below the code, it's now using only one store and only one sample product to illustrate.

Selecting the Store using IE automation and saving it in browser cookie (using IE automation)

This code will set the value of the browser cookie named/stored at "www.jumbo.com > cookies > HomeStore" to the value "YC8KYx4XB88AAAFIDcIYwKxJ"
If there is a cleaner/faster way to only set/change this value, it would would help already for this part:)



Sub SetStore()

' this code will set the value of the cookie named "www.jumbo.com > cookies > HomeStore" to the value "YC8KYx4XB88AAAFIDcIYwKxJ"
' when using sample store 2 (Brielle) it should be set to "OG8KYx4XP4wAAAFIlsEYwKxK"
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLSearchbox As MSHTML.IHTMLElement
Dim HTMLSearchboxes As MSHTML.IHTMLElementCollection
Dim HTMLButton As MSHTML.IHTMLElement
Dim HTMLButtons As MSHTML.IHTMLElementCollection
Dim HTMLSearchButton As MSHTML.IHTMLElement
Dim HTMLSearchButtons As MSHTML.IHTMLElementCollection
Dim HTMLStoreID As MSHTML.IHTMLElement
Dim HTMLStoreIDs As MSHTML.IHTMLElementCollection
Dim HTMLSaveStore As MSHTML.IHTMLElement
Dim HTMLSaveStores As MSHTML.IHTMLElementCollection

'set on False to hide IE screen
IE.Visible = True

'navigate to url with limited content
IE.navigate "https://www.jumbo.com/content/algemene-voorwaarden/"

Do While IE.readyState <> READYSTATE_COMPLETE
Loop
Set HTMLDoc = IE.document
Set HTMLButtons = HTMLDoc.getElementsByTagName("button")

For Each HTMLButton In HTMLButtons

If HTMLButton.getAttribute("data-jum-action") = "openHomeStoreFinder" Then
HTMLButton.Click
Exit For
End If
Next HTMLButton

Application.Wait Now + #12:00:02 AM#

Set HTMLSearchboxes = HTMLDoc.getElementsByTagName("input")
For Each HTMLSearchbox In HTMLSearchboxes

If HTMLSearchbox.getAttribute("id") = "searchTerm__DkKYx4XylsAAAFJktpb2Guy" Then

'input field store name/location to show search results
' Sample 1 "Oosterhout Zuiderhout"
' Sample 2 "Brielle"
' change value below to search correct store (also change corresponding value later in this code)
HTMLSearchbox.Value = "Oosterhout Zuiderhout"

Application.Wait Now + #12:00:03 AM#
HTMLSearchbox.Click
Exit For
End If

Next HTMLSearchbox
Set HTMLSearchButtons = HTMLDoc.getElementsByTagName("button")

For Each HTMLSearchButton In HTMLSearchButtons

If HTMLSearchButton.getAttribute("data-jum-filter") = "search" Then
HTMLSearchButton.Click
Exit For
End If

Next HTMLSearchButton

Application.Wait Now + #12:00:05 AM#

Set HTMLStoreIDs = HTMLDoc.getElementsByTagName("li")

For Each HTMLStoreID In HTMLStoreIDs
' Sample 1 oosterhout = YC8KYx4XB88AAAFIDcIYwKxJ
' Sample 2 brielle = OG8KYx4XP4wAAAFIlsEYwKxK
'change value below to switch store
If HTMLStoreID.getAttribute("data-jum-store-id") = "YC8KYx4XB88AAAFIDcIYwKxJ" Then

HTMLStoreID.Click
Application.Wait Now + #12:00:03 AM#
Exit For
End If

Next HTMLStoreID

Set HTMLSaveStores = HTMLDoc.getElementsByTagName("button")

For Each HTMLSaveStore In HTMLSaveStores

If HTMLSaveStore.getAttribute("data-jum-action") = "saveHomeStore" Then
HTMLSaveStore.Click
Exit For
End If


Next HTMLSaveStore

'IE.Quit

End Sub



Extracting the product price (using IE automation)

The following part will extract the price (via debug.print in Immediate window) for one sample product (Bastogne Koeken).
When using the first store (Oosterhout Zuiderhout) the price is 1,35 for this product.
When using the second store (Brielle) the price is 1,44 (prices might change over time, but should be different between the two stores).
When no store is selected (no HomeStore cookie value is set) the price is 1,44


Sub GetJumboPriceIE()


Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim JumInputs As MSHTML.IHTMLElementCollection
Dim JumInput As MSHTML.IHTMLElement
Dim JumPrice As MSHTML.IHTMLElement
Dim JumboPrice As Double
Dim Price_In_Cents_Tag As String


Dim SKU_tag As String, SKU_url As String


SKU_tag = "173140KST"
SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"


IE.Visible = False
IE.navigate SKU_url


Do While IE.readyState <> READYSTATE_COMPLETE
Loop


Set HTMLDoc = IE.document


IE.Quit

Set JumInputs = HTMLDoc.getElementsByTagName("input")


Price_In_Cents_Tag = "PriceInCents_" & SKU_tag


Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)


JumboPrice = JumPrice.getAttribute("value") / 100
Debug.Print JumboPrice


End Sub



As said, the code is working but is slow due to the IE automation it's taking very long when using it for 200 products and 5 different stores.

Below a code for the price extraction part that is much faster (using XML HTTP method) but doesn't work because i need to set the correct store in the cookie first and i cannot figure it how i do this. Any help is welcome!


Extracting the price using XML HTTP method
The following code uses XML HTTP request to get the price. No store is selected or used from a cookie, so 1,44 is the extracted price. I cannot get the 1,35 from the Oosterhout Zuiderhout store when using the XML HTTP method. >> any suggestions?


Sub GetJumboPriceXML()


Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument


Dim JumInputs As MSHTML.IHTMLElementCollection
Dim JumInput As MSHTML.IHTMLElement
Dim JumPrice As MSHTML.IHTMLElement
Dim JumboPrice As Double
Dim Price_In_Cents_Tag As String


Dim SKU_tag As String, SKU_url As String


SKU_tag = "173140KST"
SKU_url = "https://www.jumbo.com/lu-bastogne-koeken-original-260g/173140KST/"


XMLReq.Open "GET", SKU_url, False
XMLReq.send


If XMLReq.Status <> 200 Then


MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If


HTMLDoc.body.innerHTML = XMLReq.responseText


Set JumInputs = HTMLDoc.getElementsByTagName("input")

Price_In_Cents_Tag = "PriceInCents_" & SKU_tag

Set JumPrice = HTMLDoc.getElementById(Price_In_Cents_Tag)

JumboPrice = JumPrice.getAttribute("value") / 100
Debug.Print JumboPrice


End Sub



Thanks so much for any help, please let me know if i can clarify any further.