PDA

View Full Version : Automating IE to get data from a website



safetybelt
01-24-2012, 04:59 PM
Hey guys, so I've found a few threads similar to my problem and used what I saw to try to overcome my issue, but it has been to no avail.

Here is the relevant code:

Dim ie As Object, tempStr As String, gamerTag As String, vReadyState As Integer

Set ie = CreateObject("InternetExplorer.Application")

gamerTag = Sheets("Query").Range("B1").Value

ie.Visible = 1
ie.Navigate "-removed-" & gamerTag

vReadyState = 0
Do Until vReadyState = 4
DoEvents
vReadyState = ie.ReadyState
Loop


tempStr = ie.document.body.all.Item("profileFrame").document.body.all.Item("slots_container").innertext

Sheets("Query").Range("D1").Value = tempStr
Set ie = Nothing


The problem I'm having is that it is correctly finding the profileFrame (which is an iframe) that I am looking for, but then it is not finding the correct HTML within that iframe. When I view the source of the fully loaded page using the Chrome Element Viewer, I'm seeing everything as I would expect. IE, however, is not able to access the iframe information for whatever reason.

I've tried using ie.document.body.all.Item("profileFrame").document.body.innertext and .innerhtml, but they are pulling up the wrong HTML (hence why it's unable to find the "slots_container" item).

The HTML is throwing me through a loop, it appears to be opening itself within the iframe, but that doesn't seem to make any sense to me :\ I think what is happening is some behind the scenes stuff (which I know next to nothing about) that is causing the site to load a different page within the iframe, even though it appears to be loading itself. Like I mentioned, the Google Chrome Element Viewer has the source exactly how I would expect it to be... but "ie.document.body.innerhtml" is giving me something completely different than what Chrome is giving me (and also different than the source manually viewed in IE, but that is to be expected with all the divs and iframes).

If anyone has any ideas, please let me know; I've been pulling my hair out for awhile over this...

This is on Windows 7 using Excel 2003 if that matters. I welcome anyone who can offer any suggestions, thank you!

I cannot post this without removing the link from the relevant VBA as well as a link to the source site. I have removed them both... Hopefully someone will know what I'm talking about without having the actual link!

Crocus Crow
01-24-2012, 05:28 PM
Without knowing the URL I can't really help. Try posting it by disguising it with spaces or underscores and describe how you've disguised it so I know what the correct URL is.

For easier parsing and understanding of HTML in VBA, declare variables as data types from the Microsoft HTML Object Library (set a reference in Tools - References in the VB editor), rather than using the Object type. As a starter, HTMLDocument is the IE.document object.

Also, break down code such as this into several stages rather than doing everything in one line.


tempStr = ie.document.body.all.Item("profileFrame").document.body.all.Item("slots_container").innertext

safetybelt
01-24-2012, 05:32 PM
I thought about disguising it, but I did not want to get in trouble for doing so :x

Here it is, you should be able to copy/paste it normally:
us.playstation.com/publictrophy/index.htm?onlinename=ace_anubis

Edit:
Thanks for the advice about the data types instead of a generic object, I'll try that out when I get a chance (unfortunately, I'm working on some other stuff so I can't mess around with this project too much). I'll mess around with breaking down the code a bit as well.

Crocus Crow
01-25-2012, 08:37 AM
To get the HTML inside the iframe, extract the src attribute from the <iframe > tag and navigate to the URL formed by that src and the base URL (us.playstation.com).

For the code to compile, you need to set a reference to the MS Internet Controls and HTML Object Library, as noted in the comment in the code. Do this via Tools - References in the VBA project.


Public Sub IE_Automation()

'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library

Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement

baseURL = "http://us.playstation.com"

Set IE = New InternetExplorer
With IE
.Visible = True

'Navigate to the main page

.Navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

'Get the profileFrame iframe and navigate to it

Set profileFrame = .Document.getElementById("profileFrame")
.Navigate baseURL & profileFrame.src
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

Set HTMLdoc = .Document
End With

'Display all the text in the profileFrame iframe

MsgBox HTMLdoc.body.innerText

'Display just the text in the slots_container div

Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText

End SubThe code just displays the text within the Iframe and Div. I'll leave it to you to properly parse the HTML in the way you want.

safetybelt
01-25-2012, 10:15 AM
This is totally perfect, thank you so much! I was going nuts with that one, you are a life saver <3

Edit: How can I flag this one as solved, I don't see it in the Thread Tools? :x

VABBronx
02-06-2013, 01:46 PM
How would you handle this if the source code had a redirect on it that would not let you get to the iframe? Is there another workaround?

if (window == window.top) {
// Redirect to parent page if opened up by itself
top.location.replace("../ExpressSearch.aspx" + location.search);
}