PDA

View Full Version : extracting from web



anilg0001
05-02-2013, 10:39 PM
i need to extract some below mentioned datas of each doctors

from the following link

suggestadoctor.c o m/doctors_list_al_310_alabama_urolog

y.htm

if it is difficult i can extract the urls of each doctors that is available sheet1

suggestadoctor.c o m/doctor_974_leen_hammontree.h t m

suggestadoctor.c o m/doctor_53685_peternicholas_kolettis.h t m

uggestadoctor.c o m/doctor_56397_thomasedwin_moody.h t m

there are more than 1000 links

from these links, i need the following fields in sheet 2
Dr. Name
Doctor's Hospital
Doctor's Office
Phone Number
Fax Number
Gender
Graduation Year
Location
any vba macro or function works
need help

snb
05-03-2013, 01:26 AM
If you are in need of help I would suggest a doctor closer to your home address.

Kenneth Hobs
05-03-2013, 06:03 AM
Welcome to the forum!

The solution requires more time than I have right now. I suggest that you search the web for examples using the Microsoft Internet Explorer (MSIE) Object. Search this forum for examples as well.

This will get you started on the first part:
Sub MSIE()
Dim lnk As Object, ie As Object, doc As Object, i As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate "http://suggestadoctor.com/doctors_list_al_310_alabama_urology.htm"
Do Until .readyState = 4: DoEvents: Loop
Set doc = ie.document

For Each lnk In doc.Links
i = i + 1
Range("A" & i) = lnk.innerText
Range("B" & i) = lnk
Next lnk
End With
End Sub

anilg0001
05-05-2013, 09:07 PM
Thanks Mr. Kenneth Hobs

But here i get only name and their link address of all doctors but actually i need the following details in excel of each doctors that is my problem

Dr. Name
Doctor's Hospital
Doctor's Office
Phone Number
Fax Number
Gender
Graduation Year
Location

also i need to study to extract from vba which way u can suggest me because i am working in a bpo company so i need to extract various details of various urls.
how can i study it deeply? any suggested links?

please help

Kenneth Hobs
05-06-2013, 11:45 AM
I completed the first step for you. The other step is more complicated. Of course you need the URL from step one to proceed.

As I said, to read up on what you can do, search this forum and others for
CreateObject("InternetExplorer.Application")

anilg0001
05-06-2013, 09:08 PM
The first steps is easy i know
the second step is more complecated i know so that for getting the answer i put the question here.

sassora
05-08-2013, 12:11 AM
This should help, you'll need to tidy the output.

Sub getDocDetails()
Dim xml As Object, html As Object, aTag As Object, tdTag As Object, tdTags As Object
Dim cnt As Long, str As String

Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
Set html = CreateObject("htmlfile")

With xml
.Open "GET", "http://www.suggestadoctor.com/doctors_list_al_310_alabama_urology.htm", False
.Send
html.body.innerhtml = .responseText
End With

For Each aTag In html.Links
If Left(aTag, 13) = "about:doctor_" Then
cnt = cnt + 1
Sheets("Sheet1").Range("A" & cnt) = aTag.innertext
Sheets("Sheet1").Range("B" & cnt) = "http://www.suggestadoctor.com/" & Replace(aTag, "about:", "")
End If
Next aTag

For cnt = 1 To Sheets("Sheet1").Range("B" & Rows.Count).End(xlUp).Row
With xml
.Open "GET", Sheets("Sheet1").Range("B" & cnt), False
.Send
html.body.innerhtml = .responseText
End With

Set tdTags = html.getElementsByTagName("TD")

str = vbNull
For Each tdTag In tdTags
If (tdTag.classname = "tcat" Or tdTag.classname = "leftcol_tbody" Or _
tdTag.classname = "tcat_rightcol" Or tdTag.classname = "rightcol_tbody") Then
str = str & tdTag.innertext & Chr(13)
End If
Next tdTag
Sheets("Sheet2").Range("A" & cnt) = str
Next cnt

End Sub

snb
05-08-2013, 03:51 AM
Basically (you will have to adapt it)

Sub M_snb()
With New WinHttpRequest
.Open "GET", "http://www.suggestadoctor.com/doctors_list_al_310_alabama_urology.htm", False
.Send

For Each it In Filter(Split(Replace(Replace(.ResponseText, "<a", "~<a"), "</a>", "~"), "~"), "'doctor_")
c01=""
.Open "Get", "http://www.suggestadoctor.com/" & Split(it, "'")(1), False
.Send
st = Split(.ResponseText, vbLf)
For j = 1 To 7
c01 = c01 & vbLf & Filter(st, Choose(j, "City :", "ZIP :", "Phone #", "Fax #", "Gender", "Medical School", "Graduation Year"))(0)
Next
Next
End With
End Sub

anilg0001
05-08-2013, 04:30 AM
Thanks Mr sassora.
i dont get the answer because when i am trying to run ur programme i get the
all things in a single cell example

sheet2
A1=

"1 Doctor's Name
Lee N HAMMONTREE MD
[Are you Lee N HAMMONTREE MD? Register now and customize this page!]
Location
Alabama
(City : Birmingham)
(ZIP : 35294)
[More Suggested Doctors From Alabama]
Specialty
Urology

[More Suggested Urology Doctors ..]

[Other Urology Doctors From Alabama..]
Patient Recommendations


1
(0 complaints)

[Suggest This Doctor Now!]
Messages in Doctors Inbox
1
[Send a Message to This Doctor]
Search in Patient Reviews

[Advanced Search]

Lee N HAMMONTREE MD's Special Expertises :
Urology
Recommend Lee N HAMMONTREE MD

Major Activity


Doctor's Hospital
-

Doctor's Office
Office
Office Fot 1105 1530 3rd Ave S Birmingham 35294

Phone Number
Fax # :
(205) 9398939

Gender
Male

Medical School
Univ Of Tn, Memphis, Coll Of Med, Memphis Tn 38163

Graduation Year
1988
(25 years of experience)

Languages


Videos

Memories

Search Doctors within Tags
"


how can i extract from here?
thanks for ur valuable assistance

anilg0001
05-08-2013, 04:34 AM
Thanks Mr.snb
i got an error while running the code which u provide
the error is
User -defined type not defined

yellowcolor shows at here
Sub M_snb()
With New WinHttpRequest

anilg0001
05-08-2013, 09:50 PM
Thanks Mr.snb
i corrected the error User -defined type not defined by setting Microsoft WinHTTP Services but now
i got an error at For statement

variable not defined

For Each it In Filter

sassora
05-08-2013, 10:52 PM
Remove the option explicit statement

anilg0001
05-09-2013, 12:22 AM
Thanks Mr.sassora
'variable not defined' is corrected by declaring the variable
dim it, st, c01, j


but current error is

Run-Time Error '9'
Subscript our of range

snb
05-09-2013, 01:57 AM
Use F8 to go step by step.
You will find that the format of the webpage isn't that consistent, so you will have to create a method to overcome those irregularities.
For instance: you will get an error using 'Phone' and no result using 'Medical School'.
I provided the basics to retieve the information. The finetuning is up to you.

Since you didn't indicate which information you want to retrieve.....

anilg0001
05-09-2013, 09:03 PM
Glad to Thanks for your assistance Mr.snb
i am trying to retriving the following fields

Dr. Name
Doctor's Hospital
Doctor's Office
Phone Number
Fax Number
Gender
Graduation Year
Location

anilg0001
05-09-2013, 09:16 PM
i change the code trying to extract city and zip only (for testing)
For j = 1 To 2
c01 = c01 & vbLf & Filter(st, Choose(j, "City :", "ZIP :"))(0)
Next

for that i did not get any error msgs also i dont get any results? why?
how can i get the data of c01 in excel?


thanks for providing a new knowledge about f8 key use

snb
05-10-2013, 02:54 AM
If you don't know how to write the value in a variable into an Excel sheet you'd better start your first VBA course before proceeding with this code.

anilg0001
05-10-2013, 09:56 PM
Thanks Mr.snb
i have some knowledge in Visual Basic 6.0 which i studied in my pree - degree class
that coding i used in excel for knowing how to extract data from web pagesource to reduce my working time.
means i am not a expert. if u can pls help.
if any thing i told mistaken i am saying sorry.

sassora
05-11-2013, 01:28 AM
I think the point here is that extracting the fields you are looking for into specific cells is not easy due to the way the webpages have been coded.

Becoming more familiar with earning Excel VBA will allow you to proceed in assigning the desired text to individual cells. Depending on your use for this information, it may good to just take the output I've given you and tidy it up yourself (with or without VBA).

anilg0001
05-11-2013, 04:20 AM
can u answer for only two fields?



i change the code trying to extract city and zip only (for testing)
For j = 1 To 2
c01 = c01 & vbLf & Filter(st, Choose(j, "City :", "ZIP :"))(0)
Next

for that i did not get any error msgs also i dont get any results? why?
how can i get the data of c01 in excel?

no one helping me
vbaexpress, excelforum, mr.excel etc.,

harshil042
01-30-2014, 07:39 AM
Hi all,

I am using VBA to navigate through the website - this is the coding

Sub newportcarreg()

Dim lot As String

Set objie = CreateObject("internetExplorer.application")

With objie
.Visible = False
.navigate (not able to post the link but the code to the button is at the bottom - also you could type in newport auction in gogle and you will get the link)

Do While .busy Or _
.readystate <> 4
DoEvents
Loop

Do Until ActiveCell.Value = ""
lot = ActiveCell.Value
Sheet1.Range("a1").End(xlDown).Offset(1, 0).Value = .document.getElementById(lot).innertext
ActiveCell.Offset(1, 0).Select
Loop

End With
'
End Sub


With this code I am able to go to the website and pull the information from the 1st page. But I am unable to navigate to page 2 and pull the data from page 2 because the button on the website is not exactly a button that I can use a click command in VBA. Its a href tag with java scritp - <a href="javascript:__doPostBack('GridView1','Page$2')">2</a>

Can any1 help me and let me how I can navigate to page 2 through VBA?


Thank you
Harshil