Solved: This is probably a reach of question..ASP



grichey
05-13-2008, 09:18 AM
This website: http://mentalhealth.samhsa.gov/databases/facility-search.aspx? allows you to search by state.

I am trying to get a comprehensive list for the entire country.

This website: http://findtreatment.samhsa.gov/list_search.htm, which is similar but covers substance abuse instead of mental health, lets you download the entire list in one big pull. For the mental health search above, the only option seems to be copying and pasting hundreds, possibly a thousand, 10-result pages.

Is there any ASP lingo that would force Internet Explorer to just give me the entire database? I tried using % % instead of AL and Alabama (ex.: http://mentalhealth.samhsa.gov/databases/facility-search.aspx?state=AL), but that didn't work.
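Rather than a wildcard, one option is to loop over every value the `state` parameter accepts and request each state in turn. A minimal Python sketch, assuming the page only needs the two-letter `state` code seen in the example URL and that the 50 states plus DC cover what the site offers (whether it also lists territories such as PR is not confirmed here):

```python
from urllib.parse import urlencode

# Search page from the thread; only the `state` parameter is confirmed
# by the example URL (…facility-search.aspx?state=AL).
BASE_URL = "http://mentalhealth.samhsa.gov/databases/facility-search.aspx"

# USPS two-letter codes for the 50 states plus DC (assumed to match the site).
STATE_CODES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
    "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
    "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
    "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
]


def state_search_urls(base_url=BASE_URL, codes=STATE_CODES):
    """Return one search URL per state code, properly query-encoded."""
    return [f"{base_url}?{urlencode({'state': code})}" for code in codes]


if __name__ == "__main__":
    for url in state_search_urls()[:3]:
        print(url)
```

This only builds the 51 first-page URLs; paging through each state's results is a separate step.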

Ideas appreciated.

Oorang
05-13-2008, 09:48 AM
Hi G,
You want to be careful with the approach you tried. Depending on who you ask, and what data was revealed, that could be construed as an injection attack. That said, given that all the data you are retrieving is already public anyway, I suggest you set a reference to SHDocVw (Microsoft Internet Controls) and MSHTML (Microsoft HTML Object Library) and automate the browser to hit all the states in order, cycle through the result pages, pull the data as you go, and then parse it into a flat file format. This can be fairly laborious, though, since you have to learn the site's code, etc.
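The state-by-state loop described above could look something like this in Python instead of VBA with SHDocVw/MSHTML. It is a sketch under several assumptions: results sit in ordinary HTML `<td>` cells, each results page can be requested with a GET parameter called `page` (a hypothetical name; ASP.NET pages often page through results with form postbacks instead, which this does not handle), and an empty or repeated page marks the end. `fetch` is whatever function you use to download a URL as text.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlencode


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by its <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells (e.g. <th> headers)
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return the <td> text of each table row in `html`."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_state(fetch, base_url, state, max_pages=500):
    """Walk one state's result pages and tag each row with the state code."""
    rows = []
    previous = None
    for page in range(1, max_pages + 1):
        # `page` is a hypothetical parameter name; check the real site.
        url = f"{base_url}?{urlencode({'state': state, 'page': page})}"
        page_rows = parse_rows(fetch(url))
        # Stop on an empty page, or if the site ignores `page` and repeats itself.
        if not page_rows or page_rows == previous:
            break
        previous = page_rows
        rows.extend([state] + row for row in page_rows)
    return rows


def to_csv(rows):
    """Flatten the collected rows into CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Keeping `fetch` separate from the parsing makes it easy to add throttling, or to test the parser on saved HTML without touching the live site.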
If you need help writing your own code, post back to this thread, but I doubt anyone is going to want to do the whole thing for you ;)

grichey
05-13-2008, 09:55 AM
Hmm, thanks for the info. Is there anywhere you could point me if I wanted to figure out how to do this? I'm definitely not an expert, but I do have some knowledge, though admittedly most of it is in VBA of late, plus some C++.

I guess one question is how long it will take me to figure out how to do what you've suggested vs. how long it will take to copy and paste (and wait on loading time) for roughly 1500 pages of 10 matches each...

Oorang
05-13-2008, 12:28 PM
That is the perfect question to ask yourself before embarking on a screen scrape. If you have never built a screen scraper before, figure a little longer. For me to build a good one, one that outputs typed data I know is reliable, I'd figure on a couple of hours: 2 if things go smoothly, 4 if they don't. Plus actual pull time.
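The trade-off can be put in rough numbers. A minimal sketch; the 1500 pages come from the thread, but the 30 seconds per page by hand and the 2-second gap between scraper requests are assumptions to replace with your own timings, not measurements of this site:

```python
SECONDS_PER_HOUR = 3600


def manual_hours(pages, seconds_per_page):
    """Time to open, copy and paste every results page by hand."""
    return pages * seconds_per_page / SECONDS_PER_HOUR


def scraper_hours(build_hours, pages, seconds_per_request):
    """Time to write the scraper plus a throttled pull of every page."""
    return build_hours + pages * seconds_per_request / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Assumed: 30 s per page by hand, a 4-hour build, one request every 2 s.
    print(f"manual:  {manual_hours(1500, 30):.1f} h")
    print(f"scraper: {scraper_hours(4, 1500, 2):.1f} h")
```

Under those assumptions the manual route is about 12.5 hours of hands-on work, while the scraper is about 4.8 hours, most of the pull running unattended.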
A final consideration when screen scraping is to read the site's terms and conditions (if there are any) and make sure scraping isn't a violation. You should also consider putting a throttle on it so you don't hit them with too much load too fast. A polite scraper will also read the site's robots.txt file and comply with it. This site has one here: http://mentalhealth.samhsa.gov/robots.txt
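Both courtesies can be sketched with Python's standard library: `urllib.robotparser` checks each URL against robots.txt rules and reports any `Crawl-delay`, and a small throttle enforces a minimum gap between requests. The 2-second default interval and the `*` user agent are assumptions, not values taken from this site's robots.txt.

```python
import time
from urllib.robotparser import RobotFileParser


def load_rules(robots_lines):
    """Build a robots.txt rule set from already-downloaded lines.

    For a live run you could instead call set_url(".../robots.txt") and read().
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return rules


def choose_interval(rules, agent="*", default=2.0):
    """Honor a Crawl-delay directive if robots.txt has one, else use `default`."""
    delay = rules.crawl_delay(agent)
    return float(delay) if delay is not None else default


class Throttle:
    """Guarantee at least `min_interval` seconds between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def polite_fetch(url, fetch, rules, throttle, agent="*"):
    """Fetch `url` only if robots.txt allows it, pausing between requests."""
    if not rules.can_fetch(agent, url):
        return None
    throttle.wait()
    return fetch(url)
```

Injecting the clock and sleep functions into `Throttle` is only there so the timing logic can be checked without actually waiting.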

Web scraping is a very common task, and VBA isn't always the fastest or best tool for it. It's doable, but seeing as it is a government site/service, you might just email them and ask if they can provide a full file. You'd be surprised how often that works.

grichey
05-13-2008, 01:18 PM
Thanks for the info. And yeah, the 'just asking' is already in progress. Never know how long that may take, though.