PDA

View Full Version : SELECT CASE web data to spreadsheet



jhize
12-01-2006, 08:12 AM
Hi all,

First off, I am in serious need of a solution and I posted on several forums to find that solution. I will post those links in a followup. They can be disregarded.
What follows is a thorough explanation of what I am looking for.

I have included an example workbook to illustrate the way I need this web data formatted after “import”.

I have eliminated the possibility of using a querytables.add as it opens a new instance of IE. I have eliminated the use of using an innertext statement as in CASE 3, it combines the data into a single string and cannot be separated via parsing (due to employee names, titles, and addresses differing in length (Example: with innertext I would get Smith, John RJSMIT1+1…R is his middle initial, JSMITH is his ID) . Get External Data is not an option as I am attempting to do this through VBA.

Using InnerHTML statement returns exactly what I need, but all lumped into a single cell. I need the HTML tags removed and each element in its own cell.

Here is the code in its infancy stage:


Private Sub CommandButton1_Click()

Application.ScreenUpdating = False

Dim IeApp As InternetExplorer
Dim sURL As String
Dim IeDoc As Object
Dim I As Long
Dim nm As String
Dim searchperson As Variant
searchperson = Sheets("sheet2").Range("c10").Value
Set IeApp = New InternetExplorer
IeApp.Visible = True 'will be false

sURL = "********" 'url taken out for security purposes
IeApp.navigate sURL

Do
Loop Until IeApp.readyState = READYSTATE_COMPLETE

Set IeDoc = IeApp.document
IeDoc.all.f_simple.Value = (Sheets("sheet2").Range("c10").Value)
IeDoc.all.submit.Click

' Now auto navigating to search return page -------------/////
Application.Wait (Now + TimeValue("0:00:02"))
Do
Loop Until IeApp.readyState = READYSTATE_COMPLETE
' search results ------------------------////


' This works but lumps everything into "A1" with html tags
Set mytable = IeApp.document.getElementById("maindetail")
Sheets("sheet1").Range("a1").Value = IeApp.document.getElementById("maindetail").innerHTML

'Sheets("sheet1").Range("a1").Value = IeApp.document.getElementById("list").innerText

' This section is designed to breakapart the innerhtml and place each element into its own cell
' I have not been able to get it to work
Dim d As Object
Set d = IeApp.document
Dim e As Object ' the elements of the document
Dim t As Object ' the table required
Dim r As Object ' the rows of the table
Dim c As Object ' the cells of the rows.
'Dim I As Long - duplicate declaration from above
Dim J As Long
Sheets("sheet1").Activate
For Each e In d.all
If e.nodeName = "maindetail" Then
J = J + 1
End If
If J = n Then
Set t = e

tabno = tabno + 1
nextrow = nextrow + 1
Set rng = Range("A" & nextrow)

' This where the Code goes into break mode
For Each r In t.Rows
For Each c In r.Cells
rng.Value = c.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next c
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next r

Exit For
End If

Next e
End Sub



Additionally, the relevant source code from the target website is provided. I am not sure what value can be captured to determine CASE SELECT.

Case 1
No Matching Entries
Will return a msgbox saying so
<!-- Content Goes Below Here -->
<P class="errormessage">No Results Found<P>

Case 2
Single employee return.


<!—Static Content Above Here>
<!-- Content Goes Below Here -->
<TABLE>
<TR>
<TD>
<TABLE><TR><TD>
<TABLE id="maindetail">
<TR><TH COLSPAN="2">Last, First Ini</TH></TR>
<TR><TD>
<FORM name="detailrecord">
<TABLE>
<TR><TD>Security Officer</TD></TR>
<TR><TD>Security Company</TD></TR>
<TR><TD> Company Address </TD></TR>
<TR><TD> Company City, State, zip </TD></TR>
<TR><TD> United States</TD></TR>
</TABLE>
</FORM>
</TD><TD>
<TABLE> <TR><TD>Office:</TD><TD>+1 123 456 7890</TD></TR>
<TR><TD>Cell:</TD><TD></TD></TR>
<TR><TD>Pager:</TD><TD></TD></TR>
<TR><TD>Fax:</TD><TD></TD></TR>
</TABLE>
</TD></TR>
</TABLE>
AND
<TABLE id="misc"> <TR>
<TD>
<TABLE>
<TR><TD>CUID:</TD><TD>FirstInitialLastname</TD></TR>
<TR><TD>Employee Type:</TD><TD>Contractor</TD></TR>
<TR><TD>Employee Number:</TD><TD>503595c</TD></TR>
<TR><TD>M-Net ID:</TD><TD>503595</TD></TR>
<TR><TD>RC:</TD><TD>I04920000</TD></TR>
<TR><TD><A HREF="/emd/faq.html#7" TARGET="_blank">Role</A>:</TD><TD>QC-Service & Support</TD></TR>
</TABLE>
AND
<FORM name="mgmt">
<TABLE class="mgmt">
<TR><TH colspan=3>MY COMPANY Management Chain</TH></TR>
<TR><TD><IMG SRC="/emd/images/bul_YELLOW.gif" WIDTH="12" HEIGHT="12" TITLE="YELLOW"> <A HREF="/emd/search.html?f_mnetid=419285">CEO Name</A> </TD><TD class="mail_checkbox"><INPUT type="checkbox" name="mgmt" value="CEO.guy@myCompany.com"</TD<TD class="mail_checkbox">
This continues with several more managers – all I need is the last one (Direct Supervisor)
<INPUT type="checkbox" name="mgmt" value="Direct.Supervisor@MyCompany.com"</TD><TD class="side_phone">+1 123 456 7890</TD></TR>
</TABLE>
</FORM>

Case 3
In the event that several employees have the same name, the search returns page returns those multiple listings. The idea here is to return a msgbox with the listing so that the user can select one of them. The CUID is unique and will be used to return a CASE 2 search result.


<!—Static Content Above Here>
<!-- Content Goes Below Here -->
<TABLE><TR>
<TD>Search found 4 entries.</TD> ?-------Need to capture the number
<TD>&nbsp;</TD>
<TD><DIV id="dwim">
<form name="simplesearch" method="POST" action="/emd/search.html">
<input type="text" name="f_simple" value="" size="20" maxlength="40" accesskey="S">
<input type="SUBMIT" value="Simple Search" name="Submit">
<SPAN CLASS="smalltext"><A HREF="/emd/index.html?searchtype=advanced" accesskey="Z">Advanced Search</A></SPAN>
</form>
</DIV>
</TD>
</TR></TABLE>
<TABLE id="list">
<TR>
<TH>Name</TH>
<TH>CUID</TH>
<TH>Work Phone</TH>
<TH>E-mail</TH>
<TH>City, State</TH>
<TH>Employee Type</TH>
</TR>
<TR class="EVEN">
<TD><A HREF="/emd/search.html?f_mnetid=075306">Smith, John C</A></TD>
<TD>JCSMIT1</TD>
<TD>+1 555 1212 1234</TD>
<TD><A HREF="mailto:John.Smith4@mycompany.com">John.Smith4@mycompany.com</A></TD>
<TD>Denver, CO</TD>
<TD>Occupational</TD>
</TR>
<TR class="ODD">
<TD><A HREF="/emd/search.html?f_mnetid=413149">Smith, John G</A></TD>
<TD>JGSMIT2</TD>
<TD>+1 555 1212 1235</TD>
<TD><A HREF="mailto:John.Smith@ mycompany.com ">John.Smith@ mycompany.com </A></TD>
<TD>Denver, CO</TD>
<TD>Management</TD>
</TR>
<TR class="EVEN">
<TD><A HREF="/emd/search.html?f_mnetid=456829">Smith, John P</A></TD>
<TD>JXSMIT7</TD>
<TD>+1 555 1212 1236</TD>
<TD><A HREF="mailto:John.Smith5@ mycompany.com ">John.Smith5@ mycompany.com </A></TD>
<TD>Provo, UT</TD>
<TD>Occupational</TD>
</TR>
<TR class="ODD">
<TD><A HREF="/emd/search.html?f_mnetid=515786">Smith, John W</A></TD>
<TD>JWSMIT2</TD>
<TD>+1 555 1212 1235</TD>
<TD><A HREF="mailto:John.Smith2@ mycompany.com ">John.Smith2@ mycompany.com </A></TD>
<TD>Glendale, AZ</TD>
<TD>Contractor</TD>
</TR>
</TABLE>


I will post example.xls momentarily.

jhize
12-01-2006, 08:14 AM
The links to disregard are:

http://www.ozgrid.com/forum/showthread.php?t=60994&highlight=breakapart (http://www.ozgrid.com/forum/showthread.php?t=60994&highlight=breakapart)
and
http://www.vbaexpress.com/forum/showthread.php?t=10394&highlight=breakapart (http://www.vbaexpress.com/forum/showthread.php?t=10394&highlight=breakapart)

Norie
12-01-2006, 10:25 AM
You aren't using the code I posted on OzGrid correctly.

You need to tell it which table to return from the document and you don't appear to be doing that anywhere.

Why did you change TABLE to maindetail?

It's the number needed for the table not the name, and that should go in n.

jhize
12-01-2006, 10:27 AM
Thank You! I read that as "insert your table name here". I'll give it a crack.

AND THANK YOU FOR YOUR QUICK RESPONSE!

Norie
12-01-2006, 10:34 AM
The code is actually meant to be a seperate sub which you pass the document and table name to.

Code could probably be written to work with names and since you've already set a reference to the table here.

Set mytable = IeApp.document.getElementById("maindetail")

You could probably just replace t with mytable here.

For Each r In t.Rows
For Each c In r.Cells
rng.Value = c.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next c
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next r
By the way the error will be because the code will be looking at the first element of the documemnt, whatever that maybe.

And that's not likely to be the table.

jhize
12-01-2006, 10:45 AM
Thanks. I'm trying to keep it in same module.

I am no longer getting break mode at that point. But I'm not sure if I'm doing this correctly:


Dim d As Object
Set d = IeApp.document
n = d.getElementById("maindetail")

Norie
12-01-2006, 11:00 AM
Did you not see my suggestion of replacing t with mytable?

PS you still aren't using the code properly.

n is the number (index) of the required table in the document.

jhize
12-01-2006, 11:07 AM
No. I fixed it as you suggested. Im making progress. Last test broke apart the first element but left the others alone.


'Sub GetOneTable(d, n)
Sheets("sheet1").Activate
Dim d As Object
Set d = IeApp.document
n = 4
Dim e As Object ' the elements of the document
Dim t As Object ' the table required
Dim r As Object ' the rows of the table
Dim c As Object ' the cells of the rows.
'Dim I As Long
Dim J As Long

For Each e In d.all
If e.nodeName = "TABLE" Then
J = J + 1
End If
If J = n Then
Set t = e

tabno = tabno + 1
nextrow = nextrow + 1
Set rng = Range("A" & nextrow)
For Each r In mytable.Rows
For Each c In r.Cells
rng.Value = c.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next c
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next r
Exit For
End If

Next e

Norie
12-01-2006, 11:19 AM
The only part of my code you should need now would be the loop.


nextrow = nextrow + 1
Set rng = Range("A" & nextrow)
For Each r In mytable.Rows
For Each c In r.Cells
rng.Value = c.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next c
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next r

jhize
12-01-2006, 11:31 AM
MY LORD!!

Your code is working perfectly. NOW my problem is identifying the table index number (n=). It seems that theres many, MANY more tables on this page then the source code reveals.:think:

jhize
12-01-2006, 11:33 AM
So far, Tables 14,15,16,17,& 18 return the info but do not breakapart.:dunno

jhize
12-01-2006, 11:37 AM
Wow. I dont understand BUT n=19 returned the name im searching with in a1 with all the rest in a2. n=20 retruned all data except for search term and YES they were brokeapart. What do you think is causing that??!

Norie
12-01-2006, 11:45 AM
I don't know.:)

It's almost impossible to help with this sort of thing without the URL in question.

jhize
12-01-2006, 11:51 AM
I know. It's like the blind leading the blind. BUT You have been a great help so far. I don't see anyone else jumping in and you've takled it head on. I'm impressed considering you can't find this info anywhere that I know of.

Does the source code of the webpage help in any way?

jhize
12-01-2006, 12:39 PM
The major reason I have left all of this code in one module is because I could not initially pass the active webpage to the d value.

Now, however, I may need to to call this routine for each piece of data

How would you go about passing the value of d?

By the way mytable was throughing the code off.

Here it is in near working order.



Application.ScreenUpdating = False
Dim IeApp As InternetExplorer
Dim sURL As String
Dim IeDoc As Object
Dim I As Long
Dim nm As String
Dim searchperson As Variant
searchperson = Sheets("sheet2").Range("c10").Value
Set IeApp = New InternetExplorer
IeApp.Visible = False 'will be false
sURL = "my company url (http://emd.uswc.uswest.com/emd/index.html)"
IeApp.navigate sURL
Do
Loop Until IeApp.readyState = READYSTATE_COMPLETE

Set IeDoc = IeApp.document
IeDoc.all.f_simple.Value = (Sheets("sheet2").Range("c10").Value)
IeDoc.all.submit.Click

' Now auto navigating to search return page -------------/////
Application.Wait (Now + TimeValue("0:00:02"))
Do
Loop Until IeApp.readyState = READYSTATE_COMPLETE
' search results ------------------------////
Sheets("sheet1").Activate
Dim d As Object
Set d = IeApp.document
n = 20
Dim e As Object ' the elements of the document
Dim t As Object ' the table required
Dim r As Object ' the rows of the table
Dim c As Object ' the cells of the rows.
'Dim I As Long
Dim J As Long

For Each e In d.all
If e.nodeName = "TABLE" Then
J = J + 1
End If
If J = n Then
Set t = e

tabno = tabno + 1
nextrow = nextrow + 1
Set rng = Range("A" & nextrow)
For Each r In t.Rows
For Each c In r.Cells
rng.Value = c.innerText
Set rng = rng.Offset(, 1)
I = I + 1
Next c
nextrow = nextrow + 1
Set rng = rng.Offset(1, -I)
I = 0
Next r
Exit For
End If

Next e
Unload Me
IeApp.Quit: Set IeApp = Nothing


Because of the many tables (values of n), this is not working exactly as planned.

stanl
12-02-2006, 11:24 AM
Because of the many tables (values of n), this is not working exactly as planned.

Have you given any thought to using either (1) HTTP Transfer and XML DOM (2) Jet 4.0 with "HTML IMport" ISAM - your tables seem seem well-formed. .02 Stan

jhize
12-02-2006, 01:29 PM
I would be interesting in learning more. I will start searching for this. If you can assist I would greatly appreciate it.

jhize
12-02-2006, 02:05 PM
Alright. I have settled for now on what I've got for CASE 2. The code above imports the identified table and puts it into cell A1. The big difference here is that is preserves some of the formatting which allows me to breakapart the strings with worksheet formulas. This will work.

Moving on....

IF, the "searchperson" were say, John Smith (we have 4 of them in our company) the return will list these individuals as described in examples.xls. I would want to return that list in a similiar fashion to what we have so far, breakaprt the strings, display them in msgbox. Once the user selects which John Smith they need, I would isolate the CUID and search with that as the "searchperson".

More importantly. How would I go about determining if the search result is for a single individual, multiple individuals, or no matches. I've considered an "IF" statement or "SELECT CASE" to see if a certain named table exists (ie if the search result is a listing then Table"maindetail" is not present but Table"List" is - for no match neither or present)

I think I'm on the right track here.