Consulting

Results 1 to 6 of 6

Thread: How to parse HTML file from VBA code into Excel ?

Hybrid View

Previous Post Previous Post   Next Post Next Post
  1. #1
    VBAX Regular
    Joined
    Dec 2009
    Posts
    16
    Location

    How to parse HTML file from VBA code into Excel ?

    Hi!

    I would like to do the folowing:

    - Open a HTML file from VBA code ( with a from )
    - Access <td> tags that have a specific attribute ( namely : class="font" width="220" )
    - copy the value of these table cells into Excel cells

    Is it possible ? If yes, then how ?

    Thanks in advance !

  2. #2
    Moderator VBAX Master austenr's Avatar
    Joined
    Sep 2004
    Location
    Maine
    Posts
    2,033
    Location
    You want to import an HTML file into Excel using an Access form is that it? Why use Access at all? You can filter it with Excel.

    Also you need to be more clear on your explaination.
    Peace of mind is found in some of the strangest places.

  3. #3
    VBAX Regular
    Joined
    Dec 2009
    Posts
    16
    Location
    Nope. I wanna access the table that is i want to handle it from the VBA code. That has nothing to do with the Access DB SW.

    So i would like to open the HTML file from code ( like you open a worksheet with Application.Dialogs(xlDialogOpen).Show ), and get the values of the previously mentioned table cells, and copy them into the table in the worksheet. Got it ?

    Its enough for me if i can open the file as a textfile, and then i can search in it, cause i know what tokens i need to search, but if i could open the file as HTML file, and search for the td tags with specified attributed its even better.

  4. #4
    Something like this should get you started. First you need to create the example HTML file used by my code, so go to the VBA Express Excel Help Forum (I'm not allowed to post the link!) in IE and save the page as C:\VBAExpress_Excel_Forum.html using File - Save As - Webpage, HTML only.

    'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
    
    Sub Extract_TD_text()
    
        Dim URL As String
        Dim IE As InternetExplorer
        Dim HTMLdoc As HTMLDocument
        Dim TDelements As IHTMLElementCollection
        Dim TDelement As HTMLTableCell
        Dim r As Long
        
        'Saved from www vbaexpress com/forum/forumdisplay.php?f=17
        URL = "file://C:\VBAExpress_Excel_Forum.html"
        
        Set IE = New InternetExplorer
        
        With IE
            .navigate URL
            .Visible = True
        
            'Wait for page to load
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
            Set HTMLdoc = .document
        End With
        
        Set TDelements = HTMLdoc.getElementsByTagName("TD")
        
        Sheet1.Cells.ClearContents
        
        r = 0
        For Each TDelement In TDelements
            'Look for required TD elements - this check is specific to VBA Express forum - modify as required
            If TDelement.className = "alt2" And TDelement.Align = "center" Then
                Sheet1.Range("A1").Offset(r, 0).Value = TDelement.innerText
                r = r + 1
            End If
        Next
                
    End Sub
    As noted in the code you need to modify it for your particular TD tags, i.e. TDelement.width = "220"

  5. #5

    Example Works Great

    The example you provided works fine but when I try it on an on a company html page (same steps) and edit the search criteria it doesn't list any results.

    Here is a part of the html.
    <td class="graytext" width="50%">DBA Address</td>


    <td width="50%" class="text">7 Dwarfs Ln</td>



    <tr>




    If I search for "graytext" or "test" and remove your second variable it won't find anything. Could there be another issue?

  6. #6
    Quote Originally Posted by tauterouff View Post
    The example you provided works fine but when I try it on an on a company html page (same steps) and edit the search criteria it doesn't list any results.

    Here is a part of the html.
    <td class="graytext" width="50%">DBA Address</td>


    <td width="50%" class="text">7 Dwarfs Ln</td>



    <tr>


    If I search for "graytext" or "test" and remove your second variable it won't find anything. Could there be another issue?




    I have the lines on the html page which are supposed to be in a table format but they are not and there is no separator between each column data except some undefined spaces as shown in below image, so how can i import it as a proper table .i.e. into separate columns in the excel file, also i have multiple such html files so i can import multiple files into one worksheet.
    Attached Images Attached Images

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •