Consulting

Results 1 to 19 of 19

Thread: Reference specific text strings despite being repeated elsewhere?

  1. #1
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location

    Reference specific text strings despite being repeated elsewhere?

    This topic is a carryover from another post that had code supplied but wasn't quite accurate in the results it achieved. Worthy of another post though, in that this situation is unique?

    The attachment has a very good code supplied by Leith, which isolates "Header" strings and provides the numbers associated with them, all sorted. However this only works when the "Header" string is unique to the body of text on the page. In the attached example I am seeking to include the numbers under the titles "Class" and "Distance", however these 'words' are repeated throughout the sheet, so the code can't differentiate. The Headers are shown in yellow fill and the repeated text in red font (Only this way for demonstration purposes). When you press Button 1, the selections that should appear in range B4:B should include the numbers 4, 8, 11, 12, 17, 21, 22. You can't reference column or row designations as a filter as each web page, that this example represents, can vary in length and content.


    The only thing unique about these "Headers" is they have a number and name directly below them, as compared to other examples, that don't.
    Attached Files Attached Files

  2. #2
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,710
    Location
    Paul,

    Can you save and upload that file as an xls? I would like to see it.
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  3. #3
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    File as requested Sam
    Attached Files Attached Files

  4. #4
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,710
    Location
    Uhm. . . That is an xlsx. I am stuck on a small, old, slow computer ATT and only have Office XP,AKA Excel 2002. Sorry.
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  5. #5
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Your kidding!!...but I shouldn't scoff, look at the marvels designed and built using an abacus?
    Attached Files Attached Files

  6. #6
    VBAX Sage SamT's Avatar
    Joined
    Oct 2006
    Location
    Near Columbia
    Posts
    7,710
    Location
    Nope, not kidding. My main computer is down and all I have is this old laptop with 256M of Ram.

    Thanks for the file(s.) I have sorta been following this. Am sure glad I am not the one having to code it. Lieth is doing a great job.
    I expect the student to do their homework and find all the errrors I leeve in.


    Please take the time to read the Forum FAQ

  7. #7
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Yes, their is a simple solution to this, a kinda, "If Mohammed won't go to the mountain, the mountain must...." solution, and that's by deleting all bulk text containing the repeated words of the headers in question. That's the subject of another nearby post that Paul H almost solved too, but it's come astray in that the code doesn't work on web query pasted data into excel, only typed in data? Get it solved soon, no doubt!

  8. #8
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    I finally got the time to get back to you question. After looking at many different race pages, I think this version of the macro should work about 99% of the time. For the other cases, you should be able to add the needed headers and the correct format codes when needed to the code.

    The macro recognizes 3 different header formats. Format 1 is the numbers separated by a dash, i.e. 2-3-79. Format 2 is just a number and nothing else after it. Format 3 is a number followed by any other text.

    Each header is now partially matched to the contents of cell. The cell must start with the header to be valid. So, "Sky Predictor" will now match "Sky Predictor" or "Sky Predictor (W)". It will not match " Sky Predictor" or "1 Sky Predictor" because "Sky Predictor" is not at the beginning of the header.

    Sub GetSlections()
    
        ' Updated: May 09, 2016
    
        Dim Cell        As Range
        Dim Data        As Variant
        Dim Format      As Long
        Dim Group1      As Variant
        Dim Group2      As Variant
        Dim Group3      As Variant
        Dim Header      As Variant
        Dim n           As Long
        Dim Numbers     As Object
        Dim RegExp      As Object
        Dim Text        As String
        Dim vArray      As Variant
        Dim Wks         As Worksheet
        Dim x           As Long
        
        
            Set Wks = ActiveSheet
            
            Group1 = Array(Array("Selections"), Array(1))
            Group2 = Array(Array("Sky Predictor", "Early Speed"), Array(2))
            Group3 = Array(Array("SkyForm Rating", "Best Form (12mths)", "Recent Form", "Distance", "Class", "Time Rating", "Start Type", "Best Overall"), Array(3))
                            
            Set Numbers = CreateObject("Scripting.Dictionary")
            Set RegExp = CreateObject("VBScript.RegExp")
            
                For Each Header In Array(Group1, Group2, Group3)
                    Format = Header(1)(0)
                    
                    For x = 0 To UBound(Header(0))
                        Set Cell = Wks.Cells.Find(Header(0)(x) & "*", , xlValues, xlPart, xlByRows, xlNext, False, False, False)
                        If Not Cell Is Nothing Then
                            Do
                                n = n + 1
                                Text = Cell.Offset(n, 0).Value
                                
                                Select Case Format
                                    Case 1
                                        RegExp.Global = True
                                        RegExp.Pattern = "\D(\d+)\s\-.*"
                                        
                                        Do
                                            If Text = "" Then GoTo NextHeader
                                            Set Matches = RegExp.Execute(Text & "-")
                                            If Matches.Count = 0 Then Exit Do
                                            Data = Val(Matches(0).SubMatches(0))
                                            Text = Right(Text, Len(Text) - (Matches(0).FirstIndex + Len(Data) + 1))
                                            If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                        Loop
                                    
                                    Case 2
                                        RegExp.Global = False
                                        RegExp.Pattern = "\d+"
                                    
                                        If RegExp.Test(Text) = False Then GoTo NextHeader
                                        Data = Val(Text)
                                        If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                        
                                    Case 3
                                        RegExp.Global = False
                                        RegExp.Pattern = "(\d+)\s.*"
                                        
                                        Set Matches = RegExp.Execute(Text)
                                        If Matches.Count = 0 Then GoTo NextHeader
                                        Text = Matches(0).SubMatches(0)
                                        Data = Val(Text)
                                        If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                End Select
                            Loop
                            
    NextHeader:
                            n = 0
                        End If
                    Next x
                Next Header
                
        vArray = SortList(Numbers.Keys)
        MsgBox Join(vArray, ", ")
    
    End Sub
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  9. #9
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Hi Leith, thanks mate, I'll have to have a look later on as busy with something else right now. There was only one other issue that I haven't raised yet, and that was in the Selections header. Sometimes there is none, sometimes one array separated by dashes, and sometimes up to two more arrays of four numbers horizontally, immediately below. For some reason the last number in these extra selections was dropped off in your code? Like you say 99% there, but I think this one is a quick fix for someone of your splendiferous ability?
    Cheers, I'll get back to you shortly.

  10. #10
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    I thought only the Yanks used "splendiferous". It would be easier to correct the code with a couple of examples of problem "Selections" headers, since I haven't come across the problem in the web pages I have reviewed. Thanks.
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  11. #11
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    No change I'm sorry Leith. However I cured the problem of Distance and Class header selections not appearing with another macro Paul H did for me. I basically deleted all rows containing those words in between two points, just leaving the headers. No code conflict now.
    I was slightly wrong with my other critique about the code omitting the last number in extra "Selections" arrays. See one at A29 on the attached sheet. It misses this one completely, so could probably use another header called "Form Experts", to pick this array up. I should be able to manage that inclusion?

    To summarise, in unison with Paul H's code and yours, plus an amendment as just discussed, I should be altogether sorted.

    On the attached file, just move those yellow arrays around, to see if they are selected by your code.

    Thanks again,
    Paul
    Attached Files Attached Files

  12. #12
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    Okay, this should be golden now. Originally in the data you posted the headers appeared unique and once a header was found the search looked for the next header. Since the header "Distance" can appear in two distinct formats, the macro now searches for multiple occurrences of a header before looking for the next one in the group. The macro is already installed ini the attached workbook. Let me know if you find anything else.

    Sub GetSelections()
    
        Dim Cell        As Range
        Dim Data        As Variant
        Dim FirstCell   As Range
        Dim Format      As Long
        Dim Group1      As Variant
        Dim Group2      As Variant
        Dim Group3      As Variant
        Dim Header      As Variant
        Dim n           As Long
        Dim Numbers     As Object
        Dim RegExp      As Object
        Dim Text        As String
        Dim vArray      As Variant
        Dim Wks         As Worksheet
        Dim x           As Long
        
        
            Set Wks = ActiveSheet
            
            Group1 = Array(Array("Selections", "Form Experts"), Array(1))
            Group2 = Array(Array("Sky Predictor", "Early Speed", "Distance"), Array(2))
            Group3 = Array(Array("SkyForm Rating", "Best Form (12mths)", "Recent Form", "Distance", "Class", "Time Rating", "Start Type", "Best Overall"), Array(3))
                            
            Set Numbers = CreateObject("Scripting.Dictionary")
            Set RegExp = CreateObject("VBScript.RegExp")
            
                For Each Group In Array(Group1, Group2, Group3)
                    Format = Group(1)(0)
                    
                    For x = 0 To UBound(Group(0))
                        Header = Group(0)(x)
                        
                        Set Cell = Wks.Cells.Find(Header & "*", , xlValues, xlPart, xlByRows, xlNext, False, False, False)
                        
                        If Not Cell Is Nothing Then
                            Set FirstCell = Cell
                            
    ParseData:
                            Do
                                n = n + 1
                                Text = Cell.Offset(n, 0).Value
                                
                                Select Case Format
                                    Case 1
                                        RegExp.Global = True
                                        RegExp.Pattern = "\D(\d+)\s\-.*"
                                        
                                        Do
                                            If Text = "" Then GoTo NextHeader
                                            Set Matches = RegExp.Execute(Text & "-")
                                            If Matches.Count = 0 Then Exit Do
                                            Data = Val(Matches(0).SubMatches(0))
                                            Text = Right(Text, Len(Text) - (Matches(0).FirstIndex + Len(Data) + 1))
                                            If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                        Loop
                                    
                                    Case 2
                                        RegExp.Global = False
                                        RegExp.Pattern = "\d+"
                                    
                                        If RegExp.Test(Text) = False Then GoTo NextHeader
                                        Data = Val(Text)
                                        If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                        
                                    Case 3
                                        RegExp.Global = False
                                        RegExp.Pattern = "(\d+)\s.*"
                                        
                                        Set Matches = RegExp.Execute(Text)
                                        If Matches.Count = 0 Then GoTo NextHeader
                                        Text = Matches(0).SubMatches(0)
                                        Data = Val(Text)
                                        If Not Numbers.Exists(Data) Then Numbers.Add Data, ""
                                End Select
                            Loop
                            
    NextHeader:
                            n = 0
                          ' Are there any more cells with this header?
                            Set Cell = Wks.Cells.Find(Header & "*", Cell)
                            If Cell.Address <> FirstCell.Address Then GoTo ParseData
                        End If
                    Next x
    NextGroup:
                Next Group
                
        vArray = SortList(Numbers.Keys)
        
        If Range("B4") <> "" Then Range("B4").End(xlDown).ClearContents
        Range("B4").Resize(UBound(vArray, 1) + 1, 1).Value = Application.Transpose(vArray)
        
        'MsgBox Join(vArray, ", ")
    
    End Sub
    Attached Files Attached Files
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  13. #13
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Sorry to tell you this Leith, but if anything the code got worse! If you move the array 1, 2, 3, 4 around in the bottom tables, it only works under "Distance". Put the array under any of the other headers and you get zip.

  14. #14
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    It worked for me moving them headers and the data around elsewhere on the page. So what is different here?
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  15. #15
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Dunno mate? In the attachment, I've placed the 1, 2, 3, 4 array under every header except "Distance", pressed the macro button and only get the arrays in "Selections".
    Attached Files Attached Files

  16. #16
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    Are you changing the formats under the headers or is it the web page? Macros are not clairvoyant. There has to be a predictable pattern for data to be parsed. Exceptions can be added when needed but this bordering on the ridiculous.
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  17. #17
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    I'm as surprised as you? Like, I've just removed the arrays and typed them in (General format and NO colour), as opposed to pasted....no change, except if I put the array under "Distance", then it works. Are you saying the xlsm sheet I sent you works for all headers? Must be because we drive on the other side of the road in Oz or something, dunno'? One of life's mysteries!

  18. #18
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello Paul,

    The data in the workbook you posted does not parse correctly because the data formatting is incorrect for the headers.

    The web pages that I have seen are consistent in how the data is presented under each header. The macro is set to parse the data below the headers based on the format used for that header's data.

    For example, "Selections" can have no text, text only, or text with numbers separated by hyphens like "Chris Barsby 2 - 5 - 9 - 3". The only format that will be parsed is the latter. All other text formats under "Selections" are ignored.

    The format for the headers "Skyform Rating", "Best Form (12mths)", "Recent Form", and "Distance" is one or more digits followed by text e.g., "1 COOCHIN CAROL". If the cell is not formatted in this way then the number will not be parsed.

    By placing only numbers under these headers, the macro will skip adding them to the list. The only headers the macro expects to have numbers under them is "Sky Predictor" and "Early Speed". I made an exception for "Distance" based on your request. However, I have not seen "Distance" on any page that had only numbers below it.


    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  19. #19
    VBAX Tutor
    Joined
    Jul 2015
    Posts
    212
    Location
    Of course!....Why didn't I think of that? lol....not as smart as some, perhaps? YEP, that fixed it champion. Mission accomplished, and what a mission it's been!Hats off to you for your persistence Leith, many would've fallen by the wayside by now.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •