PDA

View Full Version : parsing in vbscript



antonyjones1
10-19-2009, 11:53 AM
Excuse my ignoarance but I have been told I need to parse the following link in vbscript to display the information in excel:

http://www.peauction.com/pewiki.php

Anyone got any ideas how i'd begin to do that? :)

Bob Phillips
10-19-2009, 01:07 PM
What is this, one long string? Do you just want to put each section upto the | on a new line?

antonyjones1
10-19-2009, 02:15 PM
Yeah i'd want a new row to start after | also things need to be seperated into different columns. so for example the following:

1_TT+1.79 PED_Standard Holo Module blueprint|2_TT+1.96 PED_EMT kit Ek-2500|3_TT+18.88 PED_Standard Plate blueprint|4_TT+3.09 PED_Standard Lever blueprint|5_100.66% of TT value_Animal hide|6_104.09% of TT value_Thin wool|

Would be seperated as per the attached spreadsheet.

Bob Phillips
10-19-2009, 03:20 PM
So two questions.

How do you get this info in excel, is it a text file?

How do you know where to split within a line?

antonyjones1
10-19-2009, 04:08 PM
So two questions.

How do you get this info in excel, is it a text file?

How do you know where to split within a line?

I get the info from the link I posted above

http://www.peauction.com/pewiki.php

I'm not sure how the data is collected but it's from a online game I play and it's the information on what the current value of items in the game are selling at. I have figured out where the breaks are by just reading the information that has been pulled out.

27_116.86% of TT value_Cumbriz ingot|

I believe the above means that the 27th item sold at 116.86% of its tt value (trade value). The Item was Cumbriz ingot. The information I need to gather from this is the item cumbriz ingot and it's markup price of 116.86%.

If it helps to make more sense you can go to http://www.peauction.com/search.php search for some of the items to show what that site returns. I want to be able to get the same info but into a excel spreadhseet so I can easily cross reference items.

GTO
10-19-2009, 05:52 PM
What version of Excel do you use? I ask, as it appears to exceed column.count.

Mark

antonyjones1
10-19-2009, 10:35 PM
What version of Excel do you use? I ask, as it appears to exceed column.count.

Mark


I'm using an old version of excel i'm afraid. Excel 97'. I can always install a newer version though :)

antonyjones1
10-20-2009, 11:51 AM
I have a version of 2000 I could install or a trial version of 2007 if you feel those would be better suited to what I need to do?

macropod
10-20-2009, 04:14 PM
Excuse my ignoarance but I have been told I need to parse the following link in vbscript to display the information in excel:

http://www.peauction.com/pewiki.php

Anyone got any ideas how i'd begin to do that? :)
Hi antonyjones1,

The link in your post points to a dataset with >5600 records and, from what I can make of it, parsing it Word will be easier than in Excel:
Sub ParseData()
Application.ScreenUpdating = False
With ActiveDocument
With .Content.Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = False
.Text = "|"
.Replacement.Text = "^p"
.Execute Replace:=wdReplaceAll
.Text = " of TT"
.Replacement.Text = ""
.Execute Replace:=wdReplaceAll
.Text = "TT"
.Execute Replace:=wdReplaceAll
.Text = "value_"
.Execute Replace:=wdReplaceAll
.Text = "PED_"
.Execute Replace:=wdReplaceAll
.Text = "Unknown_"
.Execute Replace:=wdReplaceAll
.MatchWildcards = True
.Text = "([0-9]{1,4})_"
.Replacement.Text = "\1^t"
.Execute Replace:=wdReplaceAll
.Text = "+([0-9.]{1,}) ([^32-^255]{1,})(^13)"
.Replacement.Text = "\2^tplus \1\3"
.Execute Replace:=wdReplaceAll
.Text = "([0-9.]{1,}) ([^32-^255]{1,})(^13)"
.Replacement.Text = "\2^t\1\3"
.Execute Replace:=wdReplaceAll
.Text = "([0-9.]{3,}%) ([^32-^255]{1,})(^13)"
.Replacement.Text = "\2^t\1\3"
.Execute Replace:=wdReplaceAll
End With
With .Range
.InsertBefore "Item" & vbTab & "Item Description" & vbTab & "Markup" & vbCr
.ConvertToTable Separator:=wdSeparateByTabs, AutoFit:=True, AutoFitBehavior:=wdAutoFitContent
End With
End With
Application.ScreenUpdating = True
End SubWith the above, you'll end up with a 3-column Word table showing the Item #, Item Description and Markup. If you don't want the 'Item #' column, delete the following two code segments:

.Replacement.Text = "\1^t"
and

"Item Description" & vbTab & If you still want the data in Excel, it's a simple matter to copy & paste the table from Word.

Notes: Some items had no markup value - the markup appears as 'Unknown' in the data and some numerical values lack the '+' to indicate that the price is 'TT plus x', so my code doesn't insert the 'plus' for those entries. If you still want the 'plus', delete the first instance of:

.Replacement.Text = "\2^t\1\3"

GTO
10-20-2009, 11:57 PM
Greetings Antony,

Firstly, my total bad for the info I stated at #6. I suffered a true goober moment, as when I looked at the web page, I am quite certain I looked at View|Source and must have thunked my head into something hard at the same time... as I arrived at the conclusion that there were way too many columns:omg2:.

Of course this isn't true footinmout , as if you save the web page as a textfile, it seems to me to be a really long text string with the "|" seperating records, but line feeds inserted wherever.

So the problem seemed to me that the records wrap between lines, so text-to-columns seemed out.

Here's a try in Excel; I certainly hope better than my less-than-stellar initial input...

Create a temp folder.
Save the web page as a text file to this folder; name the textfile "pewiki.txt".
Create a new/blank workbook in the same folder.In a Standard Module:

Option Explicit

Sub CoerceText()
Dim _
REX As Object, _
oMatches As Object, _
FSO As Object, _
fsoTxtFile As Object, _
aryInput As Variant, _
aryOutput As Variant, _
i As Long

Set REX = CreateObject("VBScript.RegExp")

Set FSO = CreateObject("Scripting.FileSystemObject")
Set fsoTxtFile = FSO.OpenTextFile(ThisWorkbook.Path & "\pewiki.txt", 1, False, -2)
aryInput = fsoTxtFile.ReadAll
fsoTxtFile.Close

With REX
.Global = True
.MultiLine = False
.Pattern = "[\f\n\r\v]+"
aryInput = Split(.Replace(aryInput, ""), "|", , vbTextCompare)

.Pattern = "([\+]?[0-9]+\.[0-9]+%?)(.*_)(.*)"
'submatch 1: Maybe a plus sign, followed by (fb) one or more digits, _
fb decimal point, fb one or more digits, fb maybe a percent sign.
'submatch 2: Anything else until it hits an underscore (underscore is last char).
'submatch 3: Whatever is left.

ReDim aryOutput(LBound(aryInput) To UBound(aryInput), 1 To 2)

For i = LBound(aryInput) To UBound(aryInput)
If .Test(aryInput(i)) Then
Set oMatches = .Execute(aryInput(i))
aryOutput(i, 1) = oMatches(0).SubMatches(2)
aryOutput(i, 2) = oMatches(0).SubMatches(0)
Else
aryOutput(i, 1) = "No Match:"
aryOutput(i, 2) = aryInput(i)
End If
Next

With Range("A1").Resize(UBound(aryOutput, 1) - LBound(aryOutput, 1) + 1, 2)
.NumberFormat = "@"
.Value = aryOutput
.EntireColumn.AutoFit
End With
End With
End Sub


Hopefully someone who uses Regular Expressions will stop in and offer a hardier pattern, but this seemed to work on the data presented.

The records that don't match the criteria (how you wanted it parsed) are just returned as a whole record in Col 2, with "No match" in col 1.

You may have better results with macropod's (that is just "Cool" code!), but thought I'd give 'er a try.

:hide: Again, sorry for the bad info,

Mark

Krishna Kumar
10-21-2009, 02:43 AM
Hi,

Here is my version.

Save the web page as a text file; name the textfile "pewiki.txt".
Create a new/blank workbook in the same folder.
Sub kTest()
Dim txt As String, x, y, i As Long, n As Long, k(), fso As Object

Set fso = CreateObject("scripting.filesystemobject")
txt = fso.opentextfile(ThisWorkbook.Path & "\pewiki.txt").readall

x = Split(txt, "|")
ReDim k(1 To UBound(x) + 1, 1 To 2)
For i = 0 To UBound(x)
y = Split(x(i), "_")
k(i + 1, 1) = Trim(Application.Clean(Replace(y(2), Chr(10), Chr(32))))
k(i + 1, 2) = NUM(y(1))
Next
With [a1]
.Offset(, 1).Resize(UBound(k, 1) + 1).NumberFormat = "@"
.Resize(UBound(k, 1), 2).Value = k
End With
End Sub
Function NUM(v) As Variant
Dim i As Long, s
For i = 1 To Len(v)
s = Mid$(v, i, 1)
Select Case Asc(s)
Case 37, 43, 46, 48 To 57: NUM = NUM & s
End Select
Next
End Function

antonyjones1
10-21-2009, 12:04 PM
you guys are awesome! Both ways work a treat. Is there a way of it reading from the site and automatically updated from that site every few minutes? or even having a "refresh" option of some sort I can select to update the information.

Thanks again for your help!

GTO
10-21-2009, 04:08 PM
Hi Antony,

Okay, I am in unfamiliar territory, so in a new/blank/throwaway workbook, try:

Rename Sheet1 "MyData".
Add a ActiveX command button to MyData.

In the Worksheet Module for MyData:

Option Explicit

Private Sub CommandButton1_Click()
Call CoerceText_3("MyData")
End Sub


In a Standard Module:

Option Explicit

Sub CoerceText_3(ShName As String)
Dim _
REX As Object, _
oMatches As Object, _
aryInput As Variant, _
aryOutput As Variant, _
i As Long

Set REX = CreateObject("VBScript.RegExp")

aryInput = Split(GrabSimplePage("http://www.peauction.com/pewiki.php"), "|", , vbTextCompare)

With REX
.Global = True
.MultiLine = False
.Pattern = "([\+]?[0-9]+\.[0-9]+%?)(.*_)(.*)"
'submatch 1: Maybe a plus sign, followed by (fb) one or more digits, _
fb decimal point, fb one or more digits, fb maybe a percent sign.
'submatch 2: Anything else until it hits an underscore (underscore is last char).
'submatch 3: Whatever is left.

ReDim aryOutput(LBound(aryInput) To UBound(aryInput), 1 To 2)
For i = LBound(aryInput) To UBound(aryInput)
If .Test(aryInput(i)) Then
Set oMatches = .Execute(aryInput(i))
aryOutput(i, 1) = oMatches(0).SubMatches(2)
aryOutput(i, 2) = oMatches(0).SubMatches(0)
Else
aryOutput(i, 1) = "No Match:"
aryOutput(i, 2) = aryInput(i)
End If
Next

With ThisWorkbook.Worksheets(ShName)
.Cells.ClearContents
With .Range("A2:B2")
.Font.Bold = True
.Value = Array("Item", "Markup")
End With
With .Range("A3").Resize(UBound(aryOutput, 1) - LBound(aryOutput, 1) + 1, 2)
.NumberFormat = "@"
.Value = aryOutput
.EntireColumn.AutoFit
End With
End With
End With
End Sub

'// Note Early-Bound
'// REQUIRED REFERENCE: Add a reference to "Microsoft HTML Object Library"
Function GrabSimplePage(ByVal strURL As String) As String
Dim _
htmDoc1 As HTMLDocument, _
htmDoc2 As HTMLDocument

'// This fails (429: cannot create object), but I don't know why?
'Set htmDoc1 = CreateObject("MSHTML.HTMLDocument")

'// As early-bound...
Set htmDoc1 = New HTMLDocument
Set htmDoc2 = htmDoc1.createDocumentFromUrl(strURL, "")
Do Until htmDoc2.readyState = "complete"
DoEvents
Loop

GrabSimplePage = htmDoc2.documentElement.outerText '.outerHTML
Set htmDoc1 = Nothing: Set htmDoc2 = Nothing
End Function


Seems to work fine even on my "poor ol' laptop" and not even too terribly slow. As you'll note, in this one we skipped the text file alltogether and use a function to grab the text instead.

As noted/commented in the code, this requires a reference to the "Microsoft HTML Object Library".

Hope that helps,

Mark

antonyjones1
10-22-2009, 11:48 AM
Hi GTO,

I'm not sure how this works for you. I seem to get the message

Compile error

User-defined type not defined

GTO
10-22-2009, 04:28 PM
...As noted/commented in the code, this requires a reference to the "Microsoft HTML Object Library"...


Did you add a reference to the library?

antonyjones1
10-23-2009, 01:33 PM
I just copied and pasted the code as you had listed it. i'll try and do a reference to the library now.

antonyjones1
10-23-2009, 02:41 PM
lol what an idiot I am. i want to references and ticked the html libarary check box and now works like a charm. Thanks a lot :)

GTO
10-23-2009, 05:29 PM
:beerchug: Glad that worked Antony :)

At All:

Anyone happen to know how to get to this library in a late-bound manner? I looked for quite a bit, but all examples I found were early-bound...:igiveup:

Mark

antonyjones1
10-24-2009, 04:50 AM
I've incorporated all of the above in my workbook and it works well. I have added a new tab where I am storing the values of the items I am interested in (I don't need every single item that the site displays).

I have set up a second button that runs a macro: this inserts a vlookup formula in column C on the mydata tab cross references the new tab against the mydata tab. It then populates column C with the values. Is there a way of refining the code so that it pulls out just the items I have listed in this new tab instead of every single one?

I also would like a final button that will automatically search tab "Level I" in the attached spreadsheet and pull certain information into another tab. It will need to search the range A3:F17 for the name of the item (in cell A3), the profit/loss (in cell 17), and then will put what is in cell 17 in column D of mydata against the corresponding item that is in column A of mydata.