Thread: Not able to get HTML text including formatting

  1. #1

    Not able to get HTML text including formatting

    Hi,

    I am trying to use the following VBA code to get specific text from a webpage.
    While I can retrieve the text, I cannot retrieve its formatting. The formatting used on the webpage is essential, e.g. which part of the word is bolded. How can I achieve that?


    [vba] ...
    IE.navigate "http://dictionary.reference.com/browse/principle"
    Do While IE.Busy: DoEvents: Loop

    Set htmlDoc = IE.document
    Set htmlColl = htmlDoc.getElementsByTagName("SPAN")

    For Each hinput In htmlColl

    If hinput.className = "pron" Then 'the pronunciation of the word

    ocell.Offset(0, 1).Value = hinput.innerText
    ...[/vba]

    Thanks,

    MG.

  2. #2
    After reviewing the HTML, I'd say you'll have to iterate through the pieces of the SPAN, grab each piece's format, and convert that to something Excel can understand.

    So after you find the Span with the pronunciation, break it down and loop it:

    [VBA] If hinput.className = "pron" Then 'the pronunciation of the word

    s= split(hinput,"</span>")
    redim data(0)
    for sp=lbound(s) to ubound(s)
    redim preserve data(sp)
    'build array of innertext pieces and their corresponding format
    data(sp)= s(sp).className & "|" & s(sp).innertext
    next sp

    'write the array data to the sheet
    'use split columns to separate into two pieces
    'build routine to format pieces accordingly
    'concatenate formatted pieces..

    'etc.

    [/VBA]
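
    The last two comment steps (format the pieces, then concatenate them) could be handled with Excel's Characters object, which formats part of a cell's text. This is only a minimal sketch: WriteFormattedPieces is a made-up helper name, each piece is assumed to be a "className|text" string as built above, and "boldface" is a placeholder class name, so check the page's actual HTML for the class that marks bold text.

    [vba]' Hypothetical helper: writes "className|text" pieces into one cell and
    ' bolds every piece whose class name equals boldClass.
    Public Sub WriteFormattedPieces(ByVal target As Range, ByRef pieces As Variant, _
                                    ByVal boldClass As String)
        Dim i As Long, startPos As Long
        Dim parts() As String
        Dim fullText As String

        ' First pass: concatenate the plain text of every piece.
        For i = LBound(pieces) To UBound(pieces)
            parts = Split(pieces(i), "|", 2)   ' limit of 2 keeps any "|" inside the text
            fullText = fullText & parts(1)
        Next i
        target.Value = fullText
        target.Font.Bold = False

        ' Second pass: bold the character range that belongs to each matching piece.
        startPos = 1
        For i = LBound(pieces) To UBound(pieces)
            parts = Split(pieces(i), "|", 2)
            If Len(parts(1)) > 0 Then
                If StrComp(parts(0), boldClass, vbTextCompare) = 0 Then
                    target.Characters(startPos, Len(parts(1))).Font.Bold = True
                End If
                startPos = startPos + Len(parts(1))
            End If
        Next i
    End Sub[/vba]

    An illustrative call with made-up data: WriteFormattedPieces ActiveCell, Array("|ab", "boldface|cd", "|ef"), "boldface" should leave "abcdef" in the cell with only "cd" bolded. The sketch assumes every piece contains a "|"; a piece without one would raise a subscript error.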

  3. #3
    Shred Dude,

    Thanks. I tried your suggestion, but I am getting an error at the red line in the following code. This is my complete code. I have never used the Split function before; however, I observed a strange thing in this subroutine.
    I couldn't use the 'Run To Cursor' command to go directly to the line If hinput.className = "pron". The program just terminates without any error.

    At the red line, the value of sp stays 0. That means no array is being created by the Split function.


    [vba]Public Sub Dictionary()
        Dim ocell As Range
        Dim IE As New SHDocVw.InternetExplorer
        Dim Ticker As String
        Dim htmlDoc As MSHTML.HTMLDocument
        Dim htmlInput As MSHTML.HTMLInputElement
        Dim htmlColl As MSHTML.IHTMLElementCollection
        Dim i, sp As Integer
        Dim s

        Set IE = CreateObject("InternetExplorer.Application")
        IE.Visible = 1

        For Each ocell In Selection

            IE.navigate "http://dictionary.reference.com/browse/" & ocell.Value
            Do While IE.Busy: DoEvents: Loop

            Set htmlDoc = IE.document
            Set htmlColl = htmlDoc.getElementsByTagName("SPAN")

            For Each hinput In htmlColl

                If hinput.className = "pron" Then

                    s = Split(hinput.outerHTML, "</span>") ' I tried just hinput, as well as innertext, outertext, and innerHTML here.

                    ReDim data(0)
                    For sp = LBound(s) To UBound(s)
                        ReDim Preserve data(sp)
                        'build array of innertext pieces and their corresponding format
                        data(sp) = s(sp).className & "|" & s(sp).innerText
                    Next sp

                    i = 0
                    For sp = LBound(data) To UBound(data)
                        ocell.Offset(0, i).Value = data(sp)
                        i = i + 1
                    Next sp

                    GoTo Loopback

                End If

            Next

    Loopback:
        Next

    End Sub[/vba]

  4. #4
    Try all caps for the delimiter in the Split function. Your HTML may not contain a lowercase "span".

    [VBA] s = split(hinput.outerHTML, "</SPAN>")[/VBA]
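
    If the case of the tags is uncertain, Split's optional compare argument may avoid having to guess. A small sketch with a made-up HTML string (the sub name is illustrative only):

    [vba]Sub SplitIgnoringCase()
        Dim parts() As String
        ' vbTextCompare makes the delimiter match regardless of case;
        ' -1 means "no limit on the number of pieces".
        parts = Split("<SPAN>a</SPAN><span>b</span>", "</span>", -1, vbTextCompare)
        Debug.Print UBound(parts)   ' prints 2: "<SPAN>a", "<span>b" and a trailing empty string
    End Sub[/vba]

    Either way the pieces are plain Strings, so they carry no className or innerText properties.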

  5. #5
    Just took a closer look. This isn't going to work: s(sp) is a String, not an object. You'll need to take your Span object with className "pron", get its nested Spans as another HTMLElementCollection, then iterate through that collection.


    The example of your HTML I saw had multiple Spans within the Span you were getting to.

    Then pull the classname property of each span etc.

    [VBA] For Each hinput In htmlColl

        If hinput.className = "pron" Then

            'Set is required because getElementsByTagName returns an object
            Set newColl = hinput.getElementsByTagName("SPAN")
            For Each s In newColl
                'examine the pieces, etc.
            Next s

        End If

    Next hinput
    [/VBA]
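
    One caveat with looping only over the nested Spans: any text that sits directly inside the "pron" Span, between the nested ones, would be skipped. A possible variation is to walk childNodes instead, which also returns the bare text nodes. This is a rough sketch, not tested against the actual page: CollectPieces is a made-up name, and pronSpan is assumed to be the element whose className is "pron".

    [vba]' Hypothetical helper: returns "className|text" strings for the direct
    ' children of pronSpan, in document order.
    Public Function CollectPieces(ByVal pronSpan As Object) As Variant
        Dim node As Object
        Dim pieces() As String
        Dim n As Long

        For Each node In pronSpan.childNodes
            Select Case node.nodeType
                Case 1   ' element node, e.g. a nested SPAN
                    ReDim Preserve pieces(n)
                    pieces(n) = node.className & "|" & node.innerText
                    n = n + 1
                Case 3   ' text node: bare text between the nested spans
                    ReDim Preserve pieces(n)
                    pieces(n) = "|" & node.nodeValue
                    n = n + 1
            End Select
        Next node

        If n = 0 Then
            CollectPieces = Array()   ' empty result when the span has no children
        Else
            CollectPieces = pieces
        End If
    End Function[/vba]

    Inside the If block it could be called as pieces = CollectPieces(hinput). Spans nested more than one level deep are flattened by innerText here, so formatting on those inner levels would be lost.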
