Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 23

Thread: Download multi page table to Excel.

  1. #1
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location

    Download multi page table to Excel.

    Not my area of expertise, so would someone be kind enough to write me some code to copy the table in 1 or more parts shown here and paste it into Excel. Each days report would be pasted below the previous one.
    Thanks
    Malcolm
    http://ted.europa.eu/Exec?DataFlow=S...e=1&toconf=yes

    Edit:
    I've just come across a time-out problem. Maybe a bit more thought needed!
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  2. #2
    Knowledge Base Approver
    The King of Overkill! VBAX Master
    Joined
    Jul 2004
    Location
    Rochester, NY
    Posts
    1,727
    Location
    Still need help here?

  3. #3
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Matt,
    Long time no see!
    No urgency on this. I'll repost once I resolve the query parameters
    Thanks
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  4. #4
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Quote Originally Posted by mvidas
    Still need help here?
    Could you give a desired navigation? Which language to select for display; which section to scrape tables from, etc...

    This way I can see if HTTP transfer and local parsing of tables is a viable solution (it would avoid any timeout issues with the site). Stan

  5. #5
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan,
    I thought I had closed this! but as you kindly ask:

    Site Address - http://ted.europa.eu/
    Select EN (8th item)
    Country - GB
    Business Opportunities - Contracts
    Download resultant table and subsequent pages

    There's no rush on this, but thanks for any help.
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  6. #6
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    You can avoid the timeout with an HTTP Post (I used the freeware xStandard http.dll since it includes Tidy to ensure well-formed html is returned. I was able to make post but the response file (attached)was not what I expected. I could navigate directly with the link below, but it comes up in French, not English [quelle merde]. However, you should be able to open the htm file I attached and navigate to the correct page. At that point you can run web queries for the tables you need, or write script code (I have a DHTML analyzer I compiled which can assist in this)


    http://ted.europa.eu/Exec?Template=T...LL&StatLang=EN

  7. #7
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Well, if this was an HTTPS site I would expect issues, but the design is either idiotic or brilliant. I've done something similar [web-scraping the yellow pages for address and phone info for all schools, churches, public offices by county/city in North Carolina]. The main point was handling tables with multiple pages.

    I used xPath, gathered everything as a database table, then generated Excel workbooks as needed. Don't know what you expected for an answer, but I can probably help you through contracts[under business opportunities] for GB..... Stan

  8. #8
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan
    Web Query just seems to return some page layout data in French.
    I tried the Help example while I had the page open, but still no luck.

    [VBA]Sub EUTest()
    Set shFirstQtr = Workbooks(1).Worksheets(1)
    Set qtQtrResults = shFirstQtr.QueryTables _
    .Add(Connection:="URL;http://ted.europa.eu/Exec?DataFlow=ShowPage.dfl&TableName=Ted_EN&Template=TED/N_result_list.htm&Page=1&toconf=yes", _
    Destination:=shFirstQtr.Cells(1, 1))
    With qtQtrResults
    .WebFormatting = xlNone
    .WebSelectionType = xlSpecifiedTables
    .WebTables = "1,2"
    .Refresh
    End With
    End Sub
    [/VBA]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  9. #9
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    The url you used gets me nothing. Maybe I missed the point, but is the attached a table of interest; because if so it is multi-paged and I think requires building the data not just performing a web query. Are you needing the data in Excel formatted the same as the HTML? Stan

  10. #10
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan
    According to FAQ there is some sort of IP identification which may prevent use of my connection from another PC. I hoped that Excel could get the data from the tables while I had the page open.
    It is the table which is of interest. I would like, if it's in excel to carry out a search to determnine items of interest. I'm not too worried about formatting etc. I can handle that if I can get the data.
    I would stress that this is of a low priority, but it will save our secretary a bit of copying and pasting on a daily basis!
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  11. #11
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    How about:
    1. Navigate to Browse by Location (NUTS)
    2. re-navigate to United Kingdom [all entries]
    3. Parse multiple pages into recordset
    4. Either copy recordset directly into Excel and use as basis for Pivot Table, or store/archive in Access and perform Jet 4.0 SELECT...INTO queries to create Excel sheets as needed.
    Stan [sans login/pw]

  12. #12
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    That sound workable. I'd prefer Excel for simpler manipulation and subsequent processing.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  13. #13
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Well (I explained more in private mail) I can get to here, viz. converting table data to csv as a memvar. So concatenating multiple pages, saving the memvar as a csv file and loading into excel is doable. Stan

    P.S. meant to use | as delimiter, instaed of ,

  14. #14
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Thanks Stan
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  15. #15
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Quote Originally Posted by mdmackillop
    Thanks Stan
    maybe...maybe not... I have attached a | separated file which parsed all contracts for UK [324 as of last count]. I see why they call it (NUTS) Anyway, if this is something you can work with in Excel, I think my script will work and avoid the Timeouts. I originally used HTTP transfer thinking the URL constructed a sessionid, but it seems [I tested 8 times in a row] that the IE Object will suffice.

    1 caveat: I used Winbatch, since the task required parsing text, I'm not that jazzed w/the filesystem object, all the Dim's and Set's involved - however, the IE part is 99% compatible with VBA, so you can use your own parsing functions. If the csv works, I can set out the steps in 99% compatible pseudo-code. Stan

  16. #16
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Good thing I ran it again (as the site moved from week 6 to week 7, and that threw a curve at final parsing). I've attached week 7's data as csv and Excel - notice what happens to some dates, which I assume is an easy fix for you.

    If any of this looks doable for you, I'll go over the web-scraping 101 code, try not to bore you with things you already know... otherwise I'll fade away from this thread for fear of being annoying.

    Stan

  17. #17
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan,
    I managed a quick look at what you did and played around with excel formatting. I can readily extract from the csv for 90%. The other 10% are multiline descriptions where the second line in effect deletes the following item. (see item 14 of your first results file). I'll try and look at your last post tonight.
    No fear of your being annoying; I'm just a bit short of time just now.
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  18. #18
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Quote Originally Posted by mdmackillop
    I'm just a bit short of time just now.
    Malcolm
    Well, like you said...low priority... the code below is the best I can do for brute force processing of your data. Since few have responded to this thread, I'm thinking there is no easy web-query or 10 lines of code. Usually, with web-scraping I like to run my scripts independent so I did my best to modify my 17 lines of script code as a sub. I'm sure it can be optimized, but I'd like to think at least I did the DHTML investigations to obtain the correct references. As in my last post, I get a file that imports to Excel [as I attached ion that post] Stan

    Sub GetTed()
      'Reference MS Internet Controls
      Dim IE As InternetExplorer
      Set IE = New InternetExplorer
      'first naviagte to home page
       IE.Visible = True
       IE.Navigate "http://ted.europa.eu/"
     
      'Wait until page is loaded. not recommended, probably need to
      'exit after 30 seconds, I'd write it as a separate function
      Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
      'set up a var go which assumes failure for all navigations
      go=0
      'make sure it's in English - or choose whatever language by
      'specifying part of the innertext
      For i= 0 To IE.Document.GetElementsByTagName("A").length-1
         s= IE.Document.GetElementsByTagName("A").Item(i)
         If s.GetAttribute("innerText")<>"" AND Instr(s.InnerText,"Official Journal")  Then
        s.click()
        go=1
          Exit For
       End If
      Next
     
      If go=0 Then 
          goto end
      End If
      Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
      'Now move to NUTS
      go=0
      For i= 0 To IE.Document.GetElementsByTagName("A").length-1
         s= IE.Document.GetElementsByTagName("A").Item(i)
         If s.GetAttribute("innerText")<>"" AND Instr(s.InnerText,"(NUTS)")  Then
            s.click()
          go=1
          Exit For
         End If
      Next
      End If
      If go=0 Then 
          goto end
      End If
      Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
      go=0
      'this is the tricky part, the next page has links to tables for all countries
      'but does not include an innertext to test with, I noticed that GB suddenly becomes
      'UK - which is the last in the list and [hoprefully] always -10 from the last
      'anchor
      n = oIE.Document.GetElementsByTagName("A").length-10
      If n>0 Then 
         oIE.Document.GetElementsByTagName("A").Item(n).click
         go=1
      Else
         goto end
      End If
     'bring First Table
      Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
      go=0
      cTable = "Ref,Details,Country,Start,End" & vbcrlf
      If IE.Document.Body.GetElementsByTagName("TABLE").Item(39)
         cTable = cTable & gettable(IE.Document.Body.GetElementsByTagName("TABLE").Item(39).InnerText)
       go=1
       x=2
       'loop through subsequent tables if they exist
       While go=1
          n=oIE.Document.GetElementsByTagName("A").length-1
            For i= 0 To n
            If oIE.Document.GetElementsByTagName("A").Item(i) Then
               s= oIE.Document.GetElementsByTagName("A").Item(i)
             If s.GetAttribute("innerText")<>"" AND s.InnerText=x Then
                  s.click()
            Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
            cTable = cTable & gettable(IE.Document.Body.GetElementsByTagName("TABLE").Item(39).InnerText)
          x=x+1
               Else
            go=0
               End if
            End if
       Loop
      End If
      'at this point you should have a csv memvar which you can outpout to a file or
      'position back into a worksheet... your choice
    :end
    Set IE = Nothing
    End Sub
    Public Function gettable(c AS String)
    cTxt = cTxt=Mid(c,Instr(c,"Deadline",8))
    cTxt=Replace(cTxt,",",";")
    cTxt=Replace(cTxt,"  ",",")
    cTxt=Replace(cTxt,",,",",")
    cTxt=Replace(cTxt,", ",",")
    Return cTxt
    End Function

  19. #19
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan
    Thanks for the code.
    I've had to make a few amandments to get it to run in Excel. I've some glitches to work out with the additional pages. There's also a problem with Permissions. I can step through the code, but get permission denied errors when trying to run it. Debugging and continuing the code works OK, so maybe I just need to build in a Wait.
    [VBA]
    Sub GetTed()
    'Reference MS Internet Controls
    Dim IE As InternetExplorer
    Dim s As Object
    Set IE = New InternetExplorer
    'first naviagte to home page
    IE.Visible = True
    Application.StatusBar = "Opening Internet Explorer..."
    IE.Navigate "http://ted.europa.eu/"

    'Wait until page is loaded. not recommended, probably need to
    'exit after 30 seconds, I'd write it as a separate function
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    'set up a var go which assumes failure for all navigations
    go = 0

    Application.StatusBar = "Opening English version..."
    'make sure it's in English - or choose whatever language by
    'specifying part of the innertext
    For i = 0 To IE.Document.GetElementsByTagName("A").Length - 1
    Set s = IE.Document.GetElementsByTagName("A").Item(i)
    txt = s.GetAttribute("innerText")
    If txt <> "" Then
    If InStr(1, txt, "Official Journal") > 0 Then
    'MsgBox txt
    s.Click
    go = 1
    Exit For
    End If
    End If
    Next
    Set s = Nothing

    Application.StatusBar = "Going to Location (NUTS)..."
    If go = 0 Then
    GoTo Endd
    End If
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    'Now move to NUTS
    go = 0
    For i = 0 To IE.Document.GetElementsByTagName("A").Length - 1
    Set s = IE.Document.GetElementsByTagName("A").Item(i)
    txt = IE.Document.GetElementsByTagName("A").Item(i).GetAttribute("innerText")
    If txt <> "" Then
    If InStr(1, txt, "(NUTS)") > 0 Then
    'MsgBox txt
    s.Click
    go = 1
    Exit For
    End If
    End If
    Next
    Set s = Nothing
    If go = 0 Then
    GoTo Endd
    End If
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    go = 0
    'this is the tricky part, the next page has links to tables for all countries
    'but does not include an innertext to test with, I noticed that GB suddenly becomes
    'UK - which is the last in the list and [hopefully] always -10 from the last
    'anchor

    Application.StatusBar = "Going to UK page..."
    n = IE.Document.GetElementsByTagName("A").Length - 10
    If n > 0 Then
    IE.Document.GetElementsByTagName("A").Item(n).Click
    go = 1
    Else
    GoTo Endd
    End If
    'bring First Table
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    go = 0

    Application.StatusBar = "Create table from first page..."
    ctable = "Ref,Details,Country,Start,End" & vbCrLf
    'If IE.Document.Body.GetElementsByTagName("TABLE").Item(39)
    ctable = ctable & IE.Document.Body.GetElementsByTagName("TABLE").Item(39).innertext
    Debug.Print ctable
    ctable = ctable & IE.Document.Body.GetElementsByTagName("TABLE").Item(39).innertext
    go = 1
    x = 2
    'loop through subsequent tables if they exist

    Application.StatusBar = "Getting subsequent tables..."
    While go = 1
    n = IE.Document.GetElementsByTagName("A").Length - 1
    For i = 0 To n
    'If IE.Document.GetElementsByTagName("A").Item(i) Then
    Set s = IE.Document.GetElementsByTagName("A").Item(i)
    If IE.Document.GetElementsByTagName("A").Item(i).GetAttribute("innerText") _
    <> "" And IE.Document.GetElementsByTagName("A").Item(i).GetAttribute("innerText") = x Then
    MsgBox s 's.click()
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    ctable = ctable & GetTable(IE.Document.Body.GetElementsByTagName("TABLE").Item(39).innertext)
    x = x + 1
    Else
    go = 0
    End If
    'End If
    Next
    Wend
    'at this point you should have a csv memvar which you can outpout to a file or
    'position back into a worksheet... your choice
    Endd:
    Application.StatusBar = ""
    Set IE = Nothing
    End Sub
    Public Function GetTable(c As String)
    cTxt = Mid(c, InStr(c, "Deadline", 8))
    cTxt = Replace(cTxt, ",", ";")
    cTxt = Replace(cTxt, " ", ",")
    cTxt = Replace(cTxt, ",,", ",")
    cTxt = Replace(cTxt, ", ", ",")
    GetTable cTxt
    End Function
    [/VBA]

    I've also tried accessing Contracts (which is the bit required) from alternative controls. Still a problem with Permissions, but I'm getting there with the subsequent pages. I'm looking to remove some of the looping once I identify the proper controls.

    [VBA]
    Sub GetTed2()
    'Reference MS Internet Controls
    Dim IE As InternetExplorer, s As Object
    'first naviagte to home page
    Set IE = New InternetExplorer
    IE.Visible = False
    Application.StatusBar = "Opening Interner Explorer..."
    IE.Navigate "http://ted.europa.eu/Exec?DataFlow=h...GB&StatLang=EN"

    Application.StatusBar = "Opening 'Contract Notice'..."
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    tmp = IE.Document.GetElementsByTagName("A").Length - 1
    For i = 0 To tmp 'IE.Document.GetElementsByTagName("A").Length - 1
    Set s = IE.Document.GetElementsByTagName("A").Item(i)
    'Select Contract list
    If InStr(1, s, "Generatequery('GB', '3')") > 0 Then
    s.Click
    Exit For
    End If
    Next

    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop

    Application.StatusBar = "Getting first table..."
    'Get first table
    ctable = IE.Document.Body.GetElementsByTagName("TABLE").Item(38).innertext
    'Split table into rows
    tmp = Split(ctable, Chr(13))
    Rw = 0

    Application.StatusBar = "Split table and paste into rows..."
    For Each t In tmp
    Rw = Rw + 1
    Sheets(1).Cells(Rw, 1) = t
    Next

    Application.StatusBar = "Get next page..."
    tmp = IE.Document.GetElementsByTagName("A").Length - 1
    k = 2
    For i = 0 To tmp
    Set s = IE.Document.GetElementsByTagName("A").Item(i)
    If InStr(1, s, "gotopageres('" & k & "')") > 0 Then
    s.Click
    k = k + 1

    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    Application.StatusBar = "Getting next table..."
    ctable = IE.Document.Body.GetElementsByTagName("TABLE").Item(38).innertext
    tmp = Split(ctable, Chr(13))
    Rw = Cells(Rows.Count, 1).End(xlUp).Row
    Application.StatusBar = "Splitting table into rows..."
    For Each t In tmp
    Rw = Rw + 1
    Sheets(1).Cells(Rw, 1) = t
    Next
    End If
    Next
    Set IE = Nothing
    Application.StatusBar = ""
    Exit Sub
    End Sub
    [/VBA]

    Thanks very much for the assistance, this is all new to me!
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  20. #20
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    I'm sure once you get this ironed out, future web-scrapes will be a breeze. I appreciate your patience with the way I kludged my original script. I have a feeling a careful use of execscript() can short-circuit some of the page-2-page navigation and looping. I have a DHTML analyzer exe I'm in the process of upgrading. You select the objects you want interrogated for a particular page, and it breaks them down into an output text file. Stan

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •