PDA

View Full Version : Copy data from website source code to spreadsheet



vrCommander
10-18-2019, 04:04 AM
Hi everybody

I'm trying to find a way to copy data from a websites source code and paste it into excel.
The data looks like this:



<p>



<span class="highlight">Mitarbeiter:</span> 120 in Deutschland<br/>



<span style="display: "><span class="highlight">Umsatzklasse:</span> 10 - 50 Mio. Euro<br/></span>



<span style="display: none"><span class="highlight">Filialen:</span> <br/></span>



<span style="display: inline"><span class="highlight">Gegründet:</span> 1925</span>



</p




The value I'm looking for here would be "10 - 50 Mio. Euro".

This is the code I've written so far:



Sub Sample() Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.Navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
Do While .Busy And .readyState <> 4: DoEvents: Loop
Application.Wait Now + TimeValue("0:00:06")
Sheets("Dummy").Range("A1").Value = .document.body.outerHTML
.Quit
End With
End Sub

My approach here was to first copy the entire HTML source code to a spreadsheet and then continue with the InStr function.
However the solution above just gives me an excerpt from the full source code, which does not contain the data I'm looking for.

Does anyone know what exactly I'm doing wrong here?
Is there maybe a better approach, maybe to get the data I'm looking for directly from the website without the step inbetween?

Thank you guys in advance for any interest in my problem:hi::hi:

Leith Ross
10-18-2019, 11:05 AM
Hello vrCommander,

Welcome!

Are you wanting to extract only certain text from the web page and copy that text to a worksheet? I ask because your title suggests you want to copy some HTML code to the clipboard and the paste the rendered HTML to the worksheet.

aladdin
07-09-2020, 04:13 AM
Hi everyone, I didn't want to create a post with the same and common questions. I think "get data from web source" is quite a big area for questions on this forum :)

My question is how to get a text from an HTML element? I found several solutions on other sites, but none of them worked for me :( VBA shows different errors: 13, 4xx, etc. - depending on my attempts to change some elements.

I found the solution that might work vrCommander, as it's for scraping table data from a website.



Sub WikiTable()
Dim ieObj As InternetExplorer
Dim htmlEle As IHTMLElement
Dim i As Integer


i = 1


Set ieObj = New InternetExplorer
ieObj.Visible = True

'please change "wikipedia domain URL org" in next line to the wiki domain url, because this forum thinks I pasted too many URLs in my post
ieObj.navigate "wikipedia domain URL org/wiki/List_of_countries_and_dependencies_by_population"


Application.Wait Now + TimeValue("00:00:10")


For Each htmlEle In ieObj.document.getElementsByClassName("wikitable")(0).getElementsByTagName("tr")
With ActiveSheet
.Range("A" & i).Value = htmlEle.Children(0).textContent
.Range("B" & i).Value = htmlEle.Children(1).textContent
.Range("C" & i).Value = htmlEle.Children(2).textContent
.Range("D" & i).Value = htmlEle.Children(3).textContent
.Range("E" & i).Value = htmlEle.Children(4).textContent
End With


i = i + 1


Next htmlEle


End Sub




The next solution I found for a non-table data, but it doesn't work too :(



Public Sub LinkedinName()
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim Elements As Object
Dim Element As Object


With IE
'Show and navigate to page
.Visible = True
'please change "LINKEDIN domain URL com" in next line to the wiki domain url, because this forum thinks I pasted too many URLs in my post
.navigate ("LINKEDIN domain URL com/in/jeffwilsonbizzdesign/")


'Wait until page loads
Do While .Busy And .readyState <> 10
DoEvents
Loop


'Create a collection of Option Tags, which are part of pa_packages
Set Elements = .document.getElementById("top-card")(0).getElementsByTagName("h1")


'Show the element's properties
For Each Element In Elements
Debug.Print "The InnerText is: " & Element.innerText
Debug.Print "The Value is: " & Element.Value
Next
End With
End Sub


I played and changed elements of the code, was trying to solve issues, but stuck :banghead: I want to add a name "Jeff Wilson" from the
<h1 class="top-card-layout__title">Jeff Wilson</h1> line. And then add "Web Developer and Social Media Implementation Engineer at BizzDesign" from
<h2 class="top-card-layout__headline">Web Developer and Social Media Implementation Engineer at BizzDesign</h2> and then add "Brisbane, Australia" from the
<span class="top-card__subline-item">Brisbane, Australia</span> line. All these texts should be pasted in one cell.

Can you please help me figure it out?