PDA

View Full Version : OpenAsTextStrem - only want a certain range



theta
02-09-2012, 03:57 AM
I have a large text file that I read into rng1. This is a large .htm file with alot of fluff that I don't need.

I only want the elements (table) to (/table) - how can I strip this element from the .htm file, and discard everything else.

Currently OpenAsTextStream into rng1

Using LIKE, or some other method?

mohanvijay
02-09-2012, 04:47 AM
could you post the sample file?

theta
02-09-2012, 04:53 AM
Need to get from <table> (inclusive) through the end of </table>



<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 12">
<link rel=File-List href="Book3_files/filelist.xml">

</head>

<body>
<!--[if !excel]>&nbsp;&nbsp;<![endif]-->
<!--The following information was generated by Microsoft Office Excel's Publish
as Web Page wizard.-->
<!--If the same item is republished from Excel, all information between the DIV
tags will be replaced.-->
<!----------------------------->
<!--START OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD -->
<!----------------------------->

<div id="Book3_4928" align=center x:publishsource="Excel">

<table border=0 cellpadding=0 cellspacing=0 width=320 style='border-collapse:
collapse;table-layout:fixed;width:240pt'>
<col width=64 span=5 style='width:48pt'>
<tr height=21 style='height:15.75pt'>
<td height=21 class=xl644928 width=64 style='height:15.75pt;width:48pt'>Name</td>
<td class=xl644928 width=64 style='width:48pt'>Student<span style='display:
none'> information</span></td>
<td class=xl644928 colspan=2 width=128 style='width:96pt'>Other Info</td>
<td class=xl154928 width=64 style='width:48pt'></td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Ron</td>
<td class=xl654928>Info about <span style='display:none'>Ron 1</span></td>
<td class=xl654928 colspan=3>Other Info about Ron 1</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Dave</td>
<td class=xl654928>Info about <span style='display:none'>Dave 1</span></td>
<td class=xl654928 colspan=3>Other Info about Dave 1</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Tom</td>
<td class=xl654928>Info about <span style='display:none'>Tom 1</span></td>
<td class=xl654928 colspan=3>Other Info about Tom 1</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Ron</td>
<td class=xl654928>Info about <span style='display:none'>Ron 2</span></td>
<td class=xl654928 colspan=3>Other Info about Ron 2</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Dave</td>
<td class=xl654928>Info about <span style='display:none'>Dave 2</span></td>
<td class=xl654928 colspan=3>Other Info about Dave 2</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Tom</td>
<td class=xl654928>Info about <span style='display:none'>Tom 2</span></td>
<td class=xl654928 colspan=3>Other Info about Tom 2</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Ron</td>
<td class=xl654928>Info about <span style='display:none'>Ron 3</span></td>
<td class=xl654928 colspan=3>Other Info about Ron 3</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Dave</td>
<td class=xl654928>Info about <span style='display:none'>Dave 3</span></td>
<td class=xl654928 colspan=3>Other Info about Dave 3</td>
</tr>
<tr height=20 style='height:15.0pt'>
<td height=20 class=xl654928 style='height:15.0pt'>Tom</td>
<td class=xl654928>Info about <span style='display:none'>Tom 3</span></td>
<td class=xl654928 colspan=3>Other Info about Tom 3</td>
</tr>
<![if supportMisalignedColumns]>
<tr height=0 style='display:none'>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
</tr>
<![endif]>
</table>

</div>


<!----------------------------->
<!--END OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD-->
<!----------------------------->
</body>

</html>

Kenneth Hobs
02-09-2012, 07:07 AM
This has more than you need. Try pRevTags().

Sub Test_pGoog()
Dim s As String, sDistance As String, sDuration As String
s = "<duration>" & vbCrLf & _
"<value>16</value>" & vbCrLf & _
"<text>1 min</text>" & vbCrLf & _
"</duration>" & vbCrLf & _
"<html_instructions>Make a &lt;b&gt;U-turn&lt;/b&gt; at &lt;b&gt;Palm Dr&lt;/b&gt;&lt;div style=&quot;font-size:0.9em&quot;&gt;Destination will be on the right&lt;/div&gt;</html_instructions>" & vbCrLf & _
"<distance>" & vbCrLf & _
"<value>44</value>" & vbCrLf & _
"<text>144 ft/text>" & vbCrLf & _
"</distance>" & vbCrLf & _
"</step>" & vbCrLf & _
"<duration>" & vbCrLf & _
"<value>164061</value>" & vbCrLf & _
"<text>1 day 22 hours</text>" & vbCrLf & _
"</duration>" & vbCrLf & _
"<distance>" & vbCrLf & _
"<value>4553964</value>" & vbCrLf & _
"<text>2,830 mi</text>" & vbCrLf & _
"</distance>" & vbCrLf & _
"<start_location>"
sDistance = pGoog("distance", s)
'sDuration = pGoog("duration", s)
MsgBox sDistance
End Sub

Private Function pGoog(strSearch As String, strHTML As String) As String
Dim s As String
s = pRevTags(strSearch, strHTML)
pGoog = pRevTags("text", s)
End Function

Private Function pRevTags(strSearch As String, strHTML As String) As String
Dim s As String, p1 As Long, p2 As Long, lss As Integer
p1 = InStrRev(strHTML, "<" & strSearch & ">")
If p1 = 0 Then
pRevTags = "Not Found"
Exit Function
End If
p2 = InStrRev(strHTML, "</" & strSearch & ">")
lss = Len(strSearch)
s = Mid(strHTML, p1 + lss + 2, p2 - p1 - 2 - lss)
pRevTags = s
End Function


'Old method:
Private Function parseGoog(strSearch As String, strHTML As String) As String
strSearch = strSearch & ":'"
If InStr(1, strHTML, strSearch) = 0 Then parseGoog = "Not Found": Exit Function
parseGoog = Mid(strHTML, InStr(1, strHTML, strSearch) + Len(strSearch))
parseGoog = Mid(parseGoog, 1, InStr(1, parseGoog, "'") - 1)
End Function

Function NumberPart(aString As String) As Long
Dim s As String, i As Integer, mc As String
For i = 1 To Len(aString)
mc = Mid(aString, i, 1)
If Asc(mc) >= 48 And Asc(mc) <= 57 Then s = s & mc
Next i
NumberPart = CLng(s)
End Function

Sub Test_Numberpart()
Dim s As String
s = "A5BC123"
Debug.Print s, NumberPart(s)
End Sub

theta
02-09-2012, 07:13 AM
Just thinking how I can expand this. What if there was more than one instance of a matched string (as this will make the function alot more useful).

I guess it will involve a loop, with the start position of the second loop being the position of the previous result +1 ?

Kenneth Hobs
02-09-2012, 09:00 PM
Sub Test()
Dim s As String, ss As String, a() As Variant

s = "<duration>" & vbCrLf & _
"<value>16</value>" & vbCrLf & _
"<text>1 min</text>" & vbCrLf & _
"</duration>" & vbCrLf & _
"<html_instructions>Make a <b>U-turn</b> at <b>Palm Dr</b><div style=""font-size:0.9em"">Destination will be on the right</div></html_instructions>" & vbCrLf & _
"<distance>" & vbCrLf & _
"<value>44</value>" & vbCrLf & _
"<text>144 ft/text>" & vbCrLf & _
"</distance>" & vbCrLf & _
"</step>" & vbCrLf & _
"<duration>" & vbCrLf & _
"<value>164061</value>" & vbCrLf & _
"<text>1 day 22 hours</text>" & vbCrLf & _
"</duration>" & vbCrLf & _
"<distance>" & vbCrLf & _
"<value>4553964</value>" & vbCrLf & _
"<text>2,830 mi</text>" & vbCrLf & _
"</distance>" & vbCrLf & _
"<start_location>"
'MsgBox strBetweenTags(s, "<duration>", "</duration>")

s = "<k>123456</k>xxx<k>abc</k>7890"
'ss = strBetweenTags(s, "<k>", "</k>")
'MsgBox ss
'MsgBox strStripLeftToTag(s, "</k>")

s = "<k>Kenneth</k>xxx<k>Hobson</k>7890"
On Error Resume Next
a() = aStrBetweenTags(s, "<k>", "</k>")
If Err.Number = 13 Then
On Error GoTo 0
MsgBox "No Array was returned from aStrBetweenTags."
Exit Sub
End If
MsgBox Join(a, vbLf)

End Sub

Function strBetweenTags(aString As String, tag1 As String, tag2 As String) As String
Dim s As String, p1 As Integer, p11 As Integer, p2 As Integer, p22 As Integer

p1 = InStr(aString, tag1)
If p1 = 0 Then
strBetweenTags = ""
Exit Function
End If
p11 = p1 + Len(tag1)

p2 = InStr(aString, tag2)
If p2 = 0 Then
strBetweenTags = ""
Exit Function
End If
p22 = p2 + Len(tag2) + 1

strBetweenTags = Mid(aString, p11, p22 - p11 - Len(tag2) - 1)
End Function

Function strStripLeftToTag(aString As String, tag As String) As String
If InStr(aString, tag) = 0 Then
strStripLeftToTag = ""
Exit Function
End If
strStripLeftToTag = Right(aString, Len(aString) - InStr(aString, tag) - Len(tag) + 1)
End Function

Function aStrBetweenTags(aString As String, tag1 As String, tag2 As String) As Variant
Dim s() As Variant, ss() As Variant, r As String, rr As String, i As Integer
On Error GoTo EndFunction
ReDim s(0)
r = " "
rr = " "
Do
r = strBetweenTags(aString, tag1, tag2)
ReDim Preserve s(0 To UBound(s) + 1)
s(UBound(s)) = r
aString = strStripLeftToTag(aString, tag2)
Loop Until r = "" Or aString = ""
ReDim ss(1 To UBound(s) - 1)
For i = 1 To (UBound(s) - 1)
ss(i) = s(i)
Next i
aStrBetweenTags = ss()
EndFunction:
End Function

theta
02-10-2012, 02:30 AM
Wow that's fantastic. How hard would it be to set up a final Function (like aStrBetweenTags) but instead of looping to get all matching pairs - you can define the set you want e.g. I want the second 'k' pair so the msgbox would return "Hobson"

?

Kenneth Hobs
02-10-2012, 05:26 AM
Since the returned array has a first element with index number of 0, the 2nd is 1. Therefore:
MsgBox a(1)

theta
02-10-2012, 06:50 AM
Going to have to digest this code....very impressive, thanks!

p45cal
02-10-2012, 08:10 AM
Need to get from <table> (inclusive) through the end of </table>[vba]The only thing is, your text sample doesn't have an instance of <table>, I suspect this is because it's being stripped by vbaexpress, could you confirm?

Having stolen some ideas from Kenneth; here's a different approach which may need major surgery to deal with the missing <table>:
Sub blah()
s = "<k>Kenneth</k>xxx<k>Hobson</k>7890"
a = StrBetwTags(s, "k")
MsgBox Join(a, vbLf)
MsgBox a(2)
End Sub
Function StrBetwTags(s, tag)
Dim a(),UBa,xx,yy
UBa = 0
xx = Split(s, "</" & tag & ">")
For Each thing In xx
yy = Split(thing, "<" & tag & ">")
If UBound(yy) = 1 Then
UBa = UBa + 1
ReDim Preserve a(1 To UBa)
a(UBa) = yy(1)
End If
Next thing
StrBetwTags = a
End Function
It returns an array of the text between tags.
Oh, and I've just seen the word "inclusive" in your question - not too difficult if it's what I think; could you clarify?

theta
02-10-2012, 08:26 AM
Thanks guys. Both great approached. I am going to test this one now. I have one more problem to solve that I am struggling with - linked here http://www.vbaexpress.com/forum/showthread.php?t=40863, then everything is complete :)

theta
02-10-2012, 08:29 AM
I was looking at this example - http://www.mrexcel.com/forum/showthread.php?t=3847

But this only checks one string. I want to check it against a named range called 'Headers' that contain 10 values (A1:A10) each of which could be used a header....