Consulting

Results 1 to 12 of 12

Thread: Save web page source as text file

  1. #1
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location

    Save web page source as text file

    With IE7 I can go Page/View Source to get the raw data from which I can extract desired data. Can some kind person create some code which would allow me to automate this for the active web page, saving to a specified location? Overwriting previous file doesn't matter.
    Any differences for IE8?
    Regards
    MD
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  2. #2
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location
    Doing it for the active web page is a little tricky if you didn't create the instance of IE yourself. It that something that can be worked around?
    If so then either of these work with IE8. If not post back and we'll work on it

    [vba]Option Explicit
    Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

    Public Sub Example1()
    DownloadFile "http://www.google.com", "c:\Test\goog.txt"
    Shell "notepad.exe ""c:\Test\goog.txt""", vbMaximizedFocus
    End Sub

    Public Sub Example2()
    'Requires reference to Microsoft Internet Controls
    Dim ie As SHDocVw.InternetExplorer
    'Microsoft HTML Object Library
    Dim doc As MSHTML.HTMLDocument
    Set ie = New SHDocVw.InternetExplorer
    ie.Navigate "www.google.com"
    Do Until ie.ReadyState = READYSTATE_COMPLETE
    DoEvents
    Loop
    Do While doc Is Nothing
    Set doc = ie.Document
    Loop
    MsgBox doc.body.innerHTML
    ie.Quit
    End Sub

    Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    'Thanks Mentalis
    Dim lngRetVal As Long
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then DownloadFile = True
    End Function
    [/vba]
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  3. #3
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Aaron,
    I'll need to wait till I'm at work to test this.
    A bit of background. Our surveyors log in to a web site and fill in an on-line form. Some of this data, we wish to store in our own database for analysis purposes. I now have the rest in place to extract the info from a text file and export it, so this automation is the icing.
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  4. #4
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location
    Is it a site your area has control over, or are we making due without altering the site?
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  5. #5
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    I'm not making any changes to the site. Data entry will be complete before this is run. Probably we will have someone reopening previously submitted data and reading the text file.
    I've attached the Excel part for info.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  6. #6
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Aaron,
    I'm having limited success here. Although I previously managed to save the text version manually, I now seem only to see a few lines. It appears as though the page is in two frames, and I need the second one.
    I've been using Example 1 in my attempts.
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  7. #7
    Knowledge Base Approver VBAX Master Oorang's Avatar
    Joined
    Jan 2007
    Posts
    1,135
    Location

    Scope Creeper!!!!!

    This ought to do it

    I'm sure there is room for refinement, but I tried to code in the spirit of "just get something that works".

    Edit: I just couldn't let it rest. Reworked it to use the MSHTML library. Should resolve most of the string parsing issues. It's worth noting that depending on how much of the HTML you actually want, it could be beneficial to just just parse what it in the doc object and stick it in Excel rather than downloading the file. You can use the CreateDocumentFromURL on the web URL, but I wanted to work within the framework of your scenario.

    [vba]Option Explicit

    Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

    Public Sub ImportSourceCode()
    'Code written by Aaron Bush 06/10/2009 - Free for public use.
    'Credit: It's nice but whatever
    'Microsoft HTML Object Library for this to work.
    'You will want to set a reference to C:\Windows\system32\mshtml.tlb
    Const strURL_c As String = _
    "http://www.w3schools.com/TAGS/tryit.asp?filename=tryhtml_iframe"
    Const strSaveFolder_c As String = "C:\Test\HTMLDump\"
    Dim strLocalPath As String
    Dim strURLs() As String
    Dim lngIndx As Long
    Dim lngUprBnd As Long
    ClearFolder strSaveFolder_c
    ReDim strURLs(lngIndx) As String
    strURLs(lngIndx) = strURL_c
    Do
    strLocalPath = DownloadFile(strURLs(lngIndx), strSaveFolder_c)
    lngUprBnd = AppendArray(strURLs, GetIFrameURLS("file:///" & _
    strLocalPath, strURLs(lngIndx)))
    lngIndx = lngIndx + 1&
    Loop Until lngIndx > lngUprBnd
    ImportAll strSaveFolder_c, Sheet1
    ClearFolder strSaveFolder_c
    End Sub

    Private Sub ClearFolder(ByVal folder As String)
    Dim strPath As String
    folder = FixFolderName(folder)
    strPath = Dir(folder)
    Do While LenB(strPath)
    Kill folder & strPath
    strPath = Dir
    Loop
    End Sub

    Private Sub ImportAll(ByVal folder As String, ByVal ws As Excel.Worksheet)
    Dim strPath As String
    ws.UsedRange.Delete
    folder = FixFolderName(folder)
    strPath = Dir(folder)
    Do While LenB(strPath)
    With ws.QueryTables.Add("TEXT;" & folder & strPath, _
    ws.UsedRange.SpecialCells(xlCellTypeLastCell).Offset(1, 0))
    .TextFileParseType = xlFixedWidth
    .TextFileColumnDataTypes = Array(1)
    .TextFileFixedColumnWidths = Array(1024)
    .Refresh False
    End With
    strPath = Dir
    Loop
    End Sub

    Private Function DownloadFile(ByVal URL As String, localFolder As String) As _
    String
    'Thanks Mentalis
    Dim lngRetVal As Long
    Dim strLocalFile As String
    strLocalFile = FixFolderName(localFolder) & GetGUID & ".html"
    lngRetVal = URLDownloadToFileA(0, URL, strLocalFile, 0, 0)
    If lngRetVal Then
    Err.Raise Err.LastDllError, , "Download failed."
    End If
    DownloadFile = strLocalFile
    End Function

    Private Function FixFolderName(ByVal folder As String) As String
    Dim strRtnVal As String
    If AscW(Right$(folder, 1&)) = 92& Then
    strRtnVal = folder
    Else
    strRtnVal = folder & "\"
    End If
    FixFolderName = strRtnVal
    End Function

    Private Function GetGUID() As String
    GetGUID = Mid$(CreateObject("Scriptlet.TypeLib").GUID, 2&, 36&)
    End Function

    Private Function AppendArray(ByRef appendTo() As String, ByRef appendFrom() As _
    String) As Long
    Const lngEmpty_c As Long = -1&
    Dim lngUprBndTo As Long
    Dim lngUprBndFrom As Long
    Dim lngUprBndNew As Long
    Dim lngIndxTo As Long
    Dim lngIndxFrom As Long
    lngUprBndTo = SafeUBound(appendTo)
    lngUprBndFrom = SafeUBound(appendFrom)
    If lngUprBndFrom <> lngEmpty_c Then
    lngUprBndNew = lngUprBndTo + lngUprBndFrom + 1&
    ReDim Preserve appendTo(lngUprBndNew)
    For lngIndxTo = lngUprBndTo + 1 To lngUprBndNew
    appendTo(lngIndxTo) = appendFrom(lngIndxFrom)
    lngIndxFrom = lngIndxFrom + 1&
    Next
    Else
    lngUprBndNew = lngUprBndTo
    End If
    AppendArray = lngUprBndNew
    End Function

    Private Function SafeUBound(ByRef value() As String) As Long
    Dim lngRtnVal As Long
    On Error Resume Next
    lngRtnVal = UBound(value)
    If Err Then
    lngRtnVal = -1
    End If
    SafeUBound = lngRtnVal
    End Function



    Private Function GetIFrameURLS(ByVal fileURL As String, ByVal remoteURL As _
    String) As String()
    'Note: Assumes fileURL is prefixed with "file:///".
    Dim docNew As New MSHTML.HTMLDocument '<-Must be dimmed as new to work.
    Dim doc As MSHTML.HTMLDocument
    Dim iFrame As MSHTML.HTMLIFrame
    Dim objElmnt As Object
    Dim lngIndx As Long
    Dim strRtnVal() As String
    Dim strBaseURL As String
    Dim strDomain As String
    If Left$(fileURL, 8&) <> "file:///" Then
    Err.Raise vbObjectError, , _
    "fileURL parameter must be prefixed with ""file:///""."
    End If
    strDomain = GetDomain(remoteURL)
    strBaseURL = Left$(remoteURL, InStrRev(remoteURL, "/"))
    Set doc = docNew.createDocumentFromUrl(fileURL, vbNullString)
    Do Until doc.readyState = "complete"
    DoEvents
    Loop
    For Each objElmnt In doc.all
    'Don't try to use the typeof operator here, it will yield false positives.
    If TypeName(objElmnt) = "HTMLIFrame" Then
    Set iFrame = objElmnt
    If LenB(iFrame.src) Then
    ReDim Preserve strRtnVal(lngIndx) As String
    If InStrB(iFrame.src, "//") = 0& Then
    If AscW(iFrame.src) = 47& Then
    strRtnVal(lngIndx) = strDomain & iFrame.src
    Else
    strRtnVal(lngIndx) = strBaseURL & iFrame.src
    End If
    Else
    strRtnVal(lngIndx) = iFrame.src
    End If
    lngIndx = lngIndx + 1&
    End If
    End If
    Next
    GetIFrameURLS = strRtnVal
    End Function

    Private Function GetDomain(ByVal URL As String) As String
    Dim lngDPos As Long
    Dim strRtnVal As String
    lngDPos = InStr(URL, ":")
    If lngDPos Then
    Select Case LCase$(Left$(URL, lngDPos))
    Case "http:"
    lngDPos = InStr(8&, URL, "/")
    Case "https", "file"
    lngDPos = InStr(9&, URL, "/")
    Case Else
    Err.Raise vbObjectError, "Unrecognized URL scheme"
    End Select
    If lngDPos Then
    strRtnVal = Left$(URL, lngDPos - 1&)
    Else
    strRtnVal = URL
    End If
    Else
    Err.Raise vbObjectError, "Unrecognized URL scheme"
    End If
    GetDomain = strRtnVal
    End Function
    [/vba]
    Cordially,
    Aaron



    Keep Our Board Clean!
    • Please Mark your thread "Solved" if you get an acceptable response (under thread tools).
    • Enclose your code in VBA tags then it will be formatted as per the VBIDE to improve readability.

  8. #8
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Aaron,
    For whatever reason it is not getting to the Data level. This is a fairly secure site, and possibly the data is not exposed via a visible page nasme. Does that make sense? I could email you the mht file if that would help?
    I do get all the data in the MHT. Would that be an easier way to go? I've had a look for SaveAs MHT, but no success as yet! Whilst line-breaks mess up the data a bit, I'm sure I can extract what I need from that format.
    Regards
    Malcolm

    BTW, a minor bug in your code
    [VBA]

    Private Function GetDomain(ByVal URL As String) As String
    Dim lngDPos As Long
    Dim strRtnVal As String

    'My target site is 'https://link etc.
    lngDPos = InStr(URL, ":")
    If lngDPos Then
    Select Case LCase$(Left$(URL, lngDPos))
    Case "http:"
    lngDPos = InStr(8&, URL, "/")
    'Should these contain the colon?
    Case "https", "file"
    lngDPos = InStr(9&, URL, "/")
    Case Else
    Err.Raise vbObjectError, "Unrecognized URL scheme"
    End Select
    [/VBA]
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  9. #9
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    If you can find a way to navigate to the data level, I find Logparser the easiest way to stream to a file, especially because you can specify interesting WHERE clauses. For example, here is this thread. Stan

    [vba]
    cURL="http://www.vbaexpress.com/forum/showthread.php?t=27050"
    cFile="c:\temp\url.txt"
    cSQL = "SELECT text INTO " & cFile & " from " & cURL"
    oLog = CreateObject("MSUtil.LogQuery")
    oInput = CreateObject("MSUtil.LogQuery.TextLineInputFormat")
    oOut = CreateObject("MSUtil.LogQuery.NativeOutputFormat.1")
    oOut.Direct = 1
    oLog.ExecuteBatch(cSQL, oInput, oOut)
    oInput=Nothing
    oOut=Nothing
    oLog=Nothing
    [/vba]

  10. #10
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Stan,
    I'll have a look at this at work to test it on the secure site I'm working with
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  11. #11
    VBAX Master stanl's Avatar
    Joined
    Jan 2005
    Posts
    1,141
    Location
    Quote Originally Posted by mdmackillop
    Hi Stan,
    I'll have a look at this at work to test it on the secure site I'm working with
    Regards
    Malcolm
    I wasn't reallly sure from your post if you wanted to just get the source, or wanted to obtain what the users filled in forms as. Also, in IE8 (and probably IE7) 'Tabs' are a different issue and best programmed through InternetExplorer.Application.1

    Stan

  12. #12

    Another way?

    Just reading through this old post and was wondeirng if a simpler method would suffice? If you just want to capture the HTML Source of a known URL into a Text file, would something like this work for you?


    [VBA]Public Sub saveSource()
    Dim myFile As String
    Dim strURL As String
    Dim myText As String

    strURL = "http://www.google.com"

    myFile = Application.DefaultFilePath & "\" & _
    Replace(strURL, "http://", "") & " - SOURCE.txt"

    With CreateObject("Microsoft.XMLHTTP")
    .Open "GET", strURL, False
    .send
    myText = .responsetext
    End With

    With CreateObject("Scripting.FileSystemObject").CreateTextFile(myFile, True)
    .Write (myText)
    .Close
    End With

    End Sub[/VBA]

    It sounded like you'd know how to navigate to the correct webpage to retrieve the previously inputted form data. Wasn't sure if that source would reveal the Frame contents for you or not.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •