PDA

View Full Version : Please help me find out the data link of the website



lalalada
07-25-2012, 08:33 PM
I've been trying to get data from the website
h ttp://bsr.twse.com.tw/bshtm/
You can enter the code(e.g. 3008) to get related transaction information
I used to use QueryTables.Add with URL= h ttp://bsr.twse.com.tw/bshtm/bsContent.aspx?StartNumber=2498&FocusIndex=All_100
to download the daily data
However, I find it requires the exact page number to get the the information this morning!:doh:
ex. for 3008. there are 20 pages, then the URL would be
h ttp://bsr.twse.com.tw/bshtm/bsContent.aspx?StartNumber=2498&FocusIndex=All_20
Would anyone please help me find a way to determine the page automatically? I'd like to download more than 800 stocks.
Or, would it be possible to download a csv file?
I found the following in the website's source code:
// window.open("bshtm/"+ HiddenField_spDate +"/"+ document.getElementById("hidTASKNO").value+"/"+ document.getElementById("hidTASKNO").value+".csv");

window.open("bsContent.aspx?StartNumber=" + document.getElementById("hidTASKNO").value + "&download=csv");
Thank you very much!
My code:
Sub get_brokerage_listed()
On Error Resume Next
Application.DisplayAlerts = False
Application.ScreenUpdating = False
Dim URL As String
For x = 2 To Range("F65536").End(xlUp).Row
code = CStr(Sheets("DataSource").Cells(x, 6))
sheetname = code + "_broker"
If Sheets(sheetname).exist = True Then
Sheets(sheetname).Delete
End If
Worksheets.Add
ActiveSheet.name = sheetname
webURL = "URL; h ttp://bsr.twse.com.tw/bshtm/bsContent.aspx?StartNumber=" & CStr(code) & "&FocusIndex=All_100"

With ActiveSheet.QueryTables.Add(Connection:=webURL, Destination:=Range("A1"))
.PreserveFormatting = True
.RefreshStyle = xlInsertDeleteCells
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "4,""table2"""
.Refresh BackgroundQuery:=False
End With
Next

Application.ScreenUpdating = True
Application.DisplayAlerts = True
End Sub
p.s. 'cause I'm a newcoming, I put a space in all http

Kenneth Hobs
07-26-2012, 05:07 AM
Welcome to the forum!

Rather than "+" use "&" to concatenate string values. Rather than F65536 use ("F" & Rows.Count).

If you can attach a short example workbook or tell use what the value of one code is, we could test it.

There is some value for hidTaskNo that you probably fill. If I knew that, I could show you how to download the csv file. Getting the url to download is the key part.

lalalada
07-26-2012, 06:28 PM
Welcome to the forum!

Rather than "+" use "&" to concatenate string values. Rather than F65536 use ("F" & Rows.Count).

If you can attach a short example workbook or tell use what the value of one code is, we could test it.

There is some value for hidTaskNo that you probably fill. If I knew that, I could show you how to download the csv file. Getting the url to download is the key part.

Sorry for my recklessness.

I've attached the whole .xls file, and the target code is in the module "get_brokerage_all_listed", if possible, please kindly give some comments on my other codes:)

I actually found out the path for downloading csv file yesterday, unfortunately, the csv file was not complete.
The link is "bsr.twse.com.tw/bshtm/bsContent.aspx?StartNumber=2498&download=csv" (2498 is the code), as shown in module " Update_database" in green. It would be very wonderful if we can fix the csv file, I'm wondering if it's possible that the website changes the csv file when being downloaded?

However, I now focus on reading the source code of the webpage that I can get the parameter needed. I also find out the paragraph I need:<span id="sp_ListCount">51</span>, where "51" stands for the total pages of the data. With this, I would be able to use querytables. The problem is that I don't know how to do the function.

By the way, how using "&" rather than "+" can help improve the process?
Thank you for suggestions!:)

Kenneth Hobs
07-26-2012, 07:50 PM
I could show you a few different ways to get source code. We could probably use the MSIE object methods to get that value though. The problem with the source code method is that you will need to parse the text or xml text. Here is example of the source code method.
Sub tDividendDate()
[a1] = DividendDate
End Sub

'late binding
Public Function DividendDate() As Date
Dim Request As Object, s As String, p As Long, sLen As Integer

On Error Resume Next
Set Request = CreateObject("WinHttp.WinHttpRequest.5.1")
If Request Is Nothing Then
Set Request = CreateObject("WinHttp.WinHttpRequest.5")
End If

Request.Open "GET", "http://finance.yahoo.com/q/ks?s=TOT", False
Request.Send

s = Request.ResponseText
p = InStr(s, "Dividend Date")
Debug.Print s, p
p = InStr(p, s, "yfnc_tabledata1") + 17
DividendDate = CDate(Mid(s, p, InStr(p, s, "<") - p))
End Function
For the concatenation operator, Microsoft explains it better than me. http://msdn.microsoft.com/en-us/library/te2585xw%28v=VS.80%29.aspx

lalalada
07-26-2012, 11:13 PM
Thank you:P!
I tried with this:
Sub listed_page()
code = 3008
URL = "h ttp://bsr.twse.com.tw/bshtm/bsMenu.aspx"
varBody = "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKMTczNDk4NzY0Mg9kFgICAQ9kFgwCBQ8WAh4JaW5uZXJodG1sBQoyM DEyLzA3LzI2ZAIGDxYCHwAFCDIwMTIwNzI2ZAIIDw8WBh4JRm9udF9Cb2xkZx4EXyFTQgKEEB4J Rm9yZUNvbG9yCj1kZAIKD2QWAgIBDw9kFgIeB09uQ2xpY2sFHGphdmFzY3JpcHQ6YnV0Q2xlYXJ fQ2xpY2soKTtkAgwPDxYGHwFoHwIChBAfAwpHZGQCDg8PFgIeB1Zpc2libGVoZGRk9qMOy%2FJF PEoPIK9L2xMSBAAAAAA%3D&__EVENTVALIDATION=%2FwEWCAKwgZyiAwLjpuXcAwKN4Ij0CwLB5ZfoCQLjk6TKBwKY8en5CwL dkpmPAQL6n7vzC%2BmPnPpPZA51gp%2FYQ%2Fl34QgAAAAA&HiddenField_spDate=&HiddenField_page=PAGE_BS&txtTASKNO=" & CStr(code) & "&hidTASKNO=&btnOK=%E6%9F%A5%E8%A9%A2"
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
With WinHttpReq
.Open "POST", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.Send (varBody)
Set oStream = CreateObject("ADODB.Stream")
oStream.Open
oStream.Type = 1
oStream.Write WinHttpReq.ResponseBody
oStream.SaveToFile ("C:\AA\temp\" & CStr(code) & ".csv")
oStream.Close
End With
Application.Workbooks.Open ("C:\AA\temp\" & CStr(code) & ".csv")
Cells.Find(What:="sp_list").Activate
Page = CStr(Mid(ActiveCell, 126, 2))
Workbooks(CStr(code) & ".csv").Close
Kill ("C:\AA\temp\" & CStr(code) & ".csv")
MsgBox Page
On Error GoTo 0
End Sub

Also
I tried your method:
Sub listed_page()
code = 3008
URL = "h ttp://bsr.twse.com.tw/bshtm/bsMenu.aspx"
varBody = "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKMTczNDk4NzY0Mg9kFgICAQ9kFgwCBQ8WAh4JaW5uZXJodG1sBQoyM DEyLzA3LzI2ZAIGDxYCHwAFCDIwMTIwNzI2ZAIIDw8WBh4JRm9udF9Cb2xkZx4EXyFTQgKEEB4J Rm9yZUNvbG9yCj1kZAIKD2QWAgIBDw9kFgIeB09uQ2xpY2sFHGphdmFzY3JpcHQ6YnV0Q2xlYXJ fQ2xpY2soKTtkAgwPDxYGHwFoHwIChBAfAwpHZGQCDg8PFgIeB1Zpc2libGVoZGRk9qMOy%2FJF PEoPIK9L2xMSBAAAAAA%3D&__EVENTVALIDATION=%2FwEWCAKwgZyiAwLjpuXcAwKN4Ij0CwLB5ZfoCQLjk6TKBwKY8en5CwL dkpmPAQL6n7vzC%2BmPnPpPZA51gp%2FYQ%2Fl34QgAAAAA&HiddenField_spDate=&HiddenField_page=PAGE_BS&txtTASKNO=" & CStr(code) & "&hidTASKNO=&btnOK=%E6%9F%A5%E8%A9%A2"
Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
With WinHttpReq
.Open "POST", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.Send (varBody)

End With
webcode = WinHttpReq.responseText
target = InStr(webcode, "sp_list")
msgbox target
On Error GoTo 0
End Sub

I think yours will be faster
But I dunno why it doesn't work
target = InStr(webcode, "sp_list") will show zero
Would you please explain it to me?

I'm curious how to WebBrowser, would it be faster?