PDA

View Full Version : Reference specific text strings despite being repeated elsewhere?



RINCONPAUL
05-08-2016, 12:11 PM
This topic is a carryover from another post that had code supplied but wasn't quite accurate in the results it achieved. Worthy of another post though, in that this situation is unique?

The attachment has a very good code supplied by Leith, which isolates "Header" strings and provides the numbers associated with them, all sorted. However this only works when the "Header" string is unique to the body of text on the page. In the attached example I am seeking to include the numbers under the titles "Class" and "Distance", however these 'words' are repeated throughout the sheet, so the code can't differentiate. The Headers are shown in yellow fill and the repeated text in red font (Only this way for demonstration purposes). When you press Button 1, the selections that should appear in range B4:B should include the numbers 4, 8, 11, 12, 17, 21, 22. You can't reference column or row designations as a filter as each web page, that this example represents, can vary in length and content.


The only thing unique about these "Headers" is they have a number and name directly below them, as compared to other examples, that don't.

SamT
05-08-2016, 08:49 PM
Paul,

Can you save and upload that file as an xls? I would like to see it.

RINCONPAUL
05-08-2016, 10:07 PM
File as requested Sam

SamT
05-09-2016, 05:25 AM
Uhm. . . That is an xlsx. I am stuck on a small, old, slow computer ATT and only have Office XP,AKA Excel 2002. Sorry.

RINCONPAUL
05-09-2016, 12:07 PM
Your kidding!!...but I shouldn't scoff, look at the marvels designed and built using an abacus?:ipray:

SamT
05-09-2016, 02:35 PM
Nope, not kidding. My main computer is down and all I have is this old laptop with 256M of Ram.

Thanks for the file(s.) I have sorta been following this. Am sure glad I am not the one having to code it. Lieth is doing a great job.

RINCONPAUL
05-09-2016, 02:52 PM
Yes, their is a simple solution to this, a kinda, "If Mohammed won't go to the mountain, the mountain must...." solution, and that's by deleting all bulk text containing the repeated words of the headers in question. That's the subject of another nearby post that Paul H almost solved too, but it's come astray in that the code doesn't work on web query pasted data into excel, only typed in data? Get it solved soon, no doubt!

Leith Ross
05-09-2016, 05:55 PM
Hello Paul,

I finally got the time to get back to you question. After looking at many different race pages, I think this version of the macro should work about 99% of the time. For the other cases, you should be able to add the needed headers and the correct format codes when needed to the code.

The macro recognizes 3 different header formats. Format 1 is the numbers separated by a dash, i.e. 2-3-79. Format 2 is just a number and nothing else after it. Format 3 is a number followed by any other text.

Each header is now partially matched to the contents of cell. The cell must start with the header to be valid. So, "Sky Predictor" will now match "Sky Predictor" or "Sky Predictor (W)". It will not match " Sky Predictor" or "1 Sky Predictor" because "Sky Predictor" is not at the beginning of the header.



Sub GetSlections()

' Updated: May 09, 2016

Dim Cell As Range
Dim Data As Variant
Dim Format As Long
Dim Group1 As Variant
Dim Group2 As Variant
Dim Group3 As Variant
Dim Header As Variant
Dim n As Long
Dim Numbers As Object
Dim RegExp As Object
Dim Text As String
Dim vArray As Variant
Dim Wks As Worksheet
Dim x As Long


Set Wks = ActiveSheet

Group1 = Array(Array("Selections"), Array(1))
Group2 = Array(Array("Sky Predictor", "Early Speed"), Array(2))
Group3 = Array(Array("SkyForm Rating", "Best Form (12mths)", "Recent Form", "Distance", "Class", "Time Rating", "Start Type", "Best Overall"), Array(3))

Set Numbers = CreateObject("Scripting.Dictionary")
Set RegExp = CreateObject("VBScript.RegExp")

For Each Header In Array(Group1, Group2, Group3)
Format = Header(1)(0)

For x = 0 To UBound(Header(0))
Set Cell = Wks.Cells.Find(Header(0)(x) & "*", , xlValues, xlPart, xlByRows, xlNext, False, False, False)
If Not Cell Is Nothing Then
Do
n = n + 1
Text = Cell.Offset(n, 0).Value

Select Case Format
Case 1
RegExp.Global = True
RegExp.Pattern = "\D(\d+)\s\-.*"

Do
If Text = "" Then GoTo NextHeader
Set Matches = RegExp.Execute(Text & "-")
If Matches.Count = 0 Then Exit Do
Data = Val(Matches(0).SubMatches(0))
Text = Right(Text, Len(Text) - (Matches(0).FirstIndex + Len(Data) + 1))
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
Loop

Case 2
RegExp.Global = False
RegExp.Pattern = "\d+"

If RegExp.Test(Text) = False Then GoTo NextHeader
Data = Val(Text)
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""

Case 3
RegExp.Global = False
RegExp.Pattern = "(\d+)\s.*"

Set Matches = RegExp.Execute(Text)
If Matches.Count = 0 Then GoTo NextHeader
Text = Matches(0).SubMatches(0)
Data = Val(Text)
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
End Select
Loop

NextHeader:
n = 0
End If
Next x
Next Header

vArray = SortList(Numbers.Keys)
MsgBox Join(vArray, ", ")

End Sub

RINCONPAUL
05-09-2016, 06:11 PM
Hi Leith, thanks mate, I'll have to have a look later on as busy with something else right now. There was only one other issue that I haven't raised yet, and that was in the Selections header. Sometimes there is none, sometimes one array separated by dashes, and sometimes up to two more arrays of four numbers horizontally, immediately below. For some reason the last number in these extra selections was dropped off in your code? Like you say 99% there, but I think this one is a quick fix for someone of your splendiferous ability? :jsmile:
Cheers, I'll get back to you shortly.

Leith Ross
05-09-2016, 06:34 PM
Hello Paul,

I thought only the Yanks used "splendiferous". It would be easier to correct the code with a couple of examples of problem "Selections" headers, since I haven't come across the problem in the web pages I have reviewed. Thanks.

RINCONPAUL
05-10-2016, 12:33 AM
No change I'm sorry Leith. However I cured the problem of Distance and Class header selections not appearing with another macro Paul H did for me. I basically deleted all rows containing those words in between two points, just leaving the headers. No code conflict now.
I was slightly wrong with my other critique about the code omitting the last number in extra "Selections" arrays. See one at A29 on the attached sheet. It misses this one completely, so could probably use another header called "Form Experts", to pick this array up. I should be able to manage that inclusion?

To summarise, in unison with Paul H's code and yours, plus an amendment as just discussed, I should be altogether sorted.

On the attached file, just move those yellow arrays around, to see if they are selected by your code.

Thanks again,
Paul

Leith Ross
05-10-2016, 11:50 AM
Hello Paul,

Okay, this should be golden now. Originally in the data you posted the headers appeared unique and once a header was found the search looked for the next header. Since the header "Distance" can appear in two distinct formats, the macro now searches for multiple occurrences of a header before looking for the next one in the group. The macro is already installed ini the attached workbook. Let me know if you find anything else.



Sub GetSelections()

Dim Cell As Range
Dim Data As Variant
Dim FirstCell As Range
Dim Format As Long
Dim Group1 As Variant
Dim Group2 As Variant
Dim Group3 As Variant
Dim Header As Variant
Dim n As Long
Dim Numbers As Object
Dim RegExp As Object
Dim Text As String
Dim vArray As Variant
Dim Wks As Worksheet
Dim x As Long


Set Wks = ActiveSheet

Group1 = Array(Array("Selections", "Form Experts"), Array(1))
Group2 = Array(Array("Sky Predictor", "Early Speed", "Distance"), Array(2))
Group3 = Array(Array("SkyForm Rating", "Best Form (12mths)", "Recent Form", "Distance", "Class", "Time Rating", "Start Type", "Best Overall"), Array(3))

Set Numbers = CreateObject("Scripting.Dictionary")
Set RegExp = CreateObject("VBScript.RegExp")

For Each Group In Array(Group1, Group2, Group3)
Format = Group(1)(0)

For x = 0 To UBound(Group(0))
Header = Group(0)(x)

Set Cell = Wks.Cells.Find(Header & "*", , xlValues, xlPart, xlByRows, xlNext, False, False, False)

If Not Cell Is Nothing Then
Set FirstCell = Cell

ParseData:
Do
n = n + 1
Text = Cell.Offset(n, 0).Value

Select Case Format
Case 1
RegExp.Global = True
RegExp.Pattern = "\D(\d+)\s\-.*"

Do
If Text = "" Then GoTo NextHeader
Set Matches = RegExp.Execute(Text & "-")
If Matches.Count = 0 Then Exit Do
Data = Val(Matches(0).SubMatches(0))
Text = Right(Text, Len(Text) - (Matches(0).FirstIndex + Len(Data) + 1))
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
Loop

Case 2
RegExp.Global = False
RegExp.Pattern = "\d+"

If RegExp.Test(Text) = False Then GoTo NextHeader
Data = Val(Text)
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""

Case 3
RegExp.Global = False
RegExp.Pattern = "(\d+)\s.*"

Set Matches = RegExp.Execute(Text)
If Matches.Count = 0 Then GoTo NextHeader
Text = Matches(0).SubMatches(0)
Data = Val(Text)
If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
End Select
Loop

NextHeader:
n = 0
' Are there any more cells with this header?
Set Cell = Wks.Cells.Find(Header & "*", Cell)
If Cell.Address <> FirstCell.Address Then GoTo ParseData
End If
Next x
NextGroup:
Next Group

vArray = SortList(Numbers.Keys)

If Range("B4") <> "" Then Range("B4").End(xlDown).ClearContents
Range("B4").Resize(UBound(vArray, 1) + 1, 1).Value = Application.Transpose(vArray)

'MsgBox Join(vArray, ", ")

End Sub

RINCONPAUL
05-10-2016, 12:06 PM
Sorry to tell you this Leith, but if anything the code got worse! If you move the array 1, 2, 3, 4 around in the bottom tables, it only works under "Distance". Put the array under any of the other headers and you get zip.

Leith Ross
05-10-2016, 12:20 PM
Hello Paul,

It worked for me moving them headers and the data around elsewhere on the page. So what is different here?

RINCONPAUL
05-10-2016, 12:32 PM
Dunno mate? In the attachment, I've placed the 1, 2, 3, 4 array under every header except "Distance", pressed the macro button and only get the arrays in "Selections".

Leith Ross
05-10-2016, 12:51 PM
Hello Paul,

Are you changing the formats under the headers or is it the web page? Macros are not clairvoyant. There has to be a predictable pattern for data to be parsed. Exceptions can be added when needed but this bordering on the ridiculous.

RINCONPAUL
05-10-2016, 01:12 PM
I'm as surprised as you? Like, I've just removed the arrays and typed them in (General format and NO colour), as opposed to pasted....no change, except if I put the array under "Distance", then it works. Are you saying the xlsm sheet I sent you works for all headers? Must be because we drive on the other side of the road in Oz or something, dunno'? One of life's mysteries!:stars:

Leith Ross
05-10-2016, 01:37 PM
Hello Paul,

The data in the workbook you posted does not parse correctly because the data formatting is incorrect for the headers.

The web pages that I have seen are consistent in how the data is presented under each header. The macro is set to parse the data below the headers based on the format used for that header's data.

For example, "Selections" can have no text, text only, or text with numbers separated by hyphens like "Chris Barsby 2 - 5 - 9 - 3". The only format that will be parsed is the latter. All other text formats under "Selections" are ignored.

The format for the headers "Skyform Rating", "Best Form (12mths)", "Recent Form", and "Distance" is one or more digits followed by text e.g., "1 COOCHIN CAROL". If the cell is not formatted in this way then the number will not be parsed.

By placing only numbers under these headers, the macro will skip adding them to the list. The only headers the macro expects to have numbers under them is "Sky Predictor" and "Early Speed". I made an exception for "Distance" based on your request. However, I have not seen "Distance" on any page that had only numbers below it.

RINCONPAUL
05-10-2016, 01:51 PM
Of course!....Why didn't I think of that? lol....not as smart as some, perhaps? YEP, that fixed it champion. Mission accomplished, and what a mission it's been!Hats off to you for your persistence Leith, many would've fallen by the wayside by now.:bow: