Consulting

Results 1 to 6 of 6

Thread: Extract Items from String and then Sort

  1. #1

    Extract Items from String and then Sort

    I have a string that contains items that I need to extract and sort. Items are those that are preceded by "with", "and", "comprising", and "," followed by their values (enclosed in parenthesis). After items are extracted, I want to re-arrange them from the original format "item description (value)" to the new format "(value) item description". After this, I want to sort the items based on their values, and place all the items inside the <ITEM> tag separated by "#"

    For example:
    I have a pet project comprising aluminum wire (100), paper (5), and sticks (2).

    Desired output:
    (2) sticks#(5) paper#(100) aluminum wire

    I have attached a representative sample to best illustrate this.

    This is a bit complicated so I'm asking anyone for ideas on how best to approach this.
    Attached Files Attached Files

  2. #2
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello swaggerbox,

    This worked for the example you provided. I cannot guarantee that it will work 100% with your original data. Parsing complex strings is challenging due to the large number of arrangements of syntax and punctuation. Give it try and let me know your results.

    The macro below has already been added to the attached workbook. Double Click anywhere in the Text Box and a message box will display the arsed results.

    Module1 - Macro Code
    
    Option Explicit
    
    
    Function ParseItems(ByVal Text As String)
    
    
        Dim vArray  As Variant
        Dim Items   As Variant
        Dim j       As Long
        Dim k       As Long
        Dim Lookup  As Variant
        Dim LoToHi  As Boolean
        Dim n       As Long
        Dim RegExp  As Object
        Dim sorted  As Boolean
        Dim upBound As Long
        
            Set RegExp = CreateObject("VBScript.RegExp")
                RegExp.Global = True
                RegExp.IgnoreCase = False
                RegExp.Pattern = ".*(?:,|with|and|comprising)(.+)\((\d+)\)$"
            
            ' // Separate Items by their values into an array.
            vArray = Split(Text, ")")
            
            ReDim Items(0)  ' // Holds the Item strings.
            ReDim Lookup(0) ' // This array holds the index values of the Items array.
            
            ' // Parse the Item strings.
            For n = 0 To UBound(vArray)
                Text = vArray(n) & ")"     ' // Add the closing parenthesis since it was removed by Split.
                If RegExp.Test(Text) Then
                    k = CLng(RegExp.Replace(Text, "$2"))   ' // Convert the value from aa string to a number.
                    Lookup(j) = k               ' // Save Item number in the Lookup array.
                    j = j + 1                   ' // Increment count of valid Items.
                    ReDim Preserve Lookup(j)    ' // Increase the Lookup array by 1 element.
                    If k > UBound(Items) Then
                        ReDim Preserve Items(k) ' // Increase the Items array index to the largest Item number.
                    End If
                    Items(k) = RegExp.Replace(Text, "($2) $1") ' // Save the string as "(value) Item description".
                End If
            Next n
            
            upBound = UBound(Lookup)
            
            ' // Sort the Lookup array in ascending order. Set LoToHi to True for descending sort order.
            Do
                sorted = True
                    For j = 0 To upBound - 1
                        If Not IsEmpty(Lookup(j)) Then
                            If LoToHi Xor Lookup(j) > Lookup(j + 1) Then
                                k = Lookup(j + 1)
                                Lookup(j + 1) = Lookup(j)
                                Lookup(j) = k
                                sorted = False
                            End If
                        End If
                    Next j
                upBound = upBound - 1
            Loop Until sorted Or upBound < 1
                    
            Text = ""
            
            ' // Build the output string.
            For n = 0 To UBound(Lookup)
                k = Lookup(n)
                If Items(k) <> "" Then
                    If Text = "" Then
                        Text = "<ITEMS>" & Items(k)
                    Else
                        Text = Text & "#" & Items(k)
                    End If
                End If
            Next n
            
            If Text <> "" Then Text = Text & "</ITEMS>"
            
            ParseItems = Text
            
    End Function
    Attached Files Attached Files
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  3. #3
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,638
    I'd use:

    Sub M_snb()
      sn = Filter(Split(Replace(Replace(Replace(Replace(Replace(TextBox1.Text, "with ", "~"), "and ", "~"), "comprising ", "~"), ", ", "~"), ")", ")~"), "~"), "(")
       
      With CreateObject("System.Collections.SortedList")
        For Each it In sn
          st = Split(it, "(")
          .Item(Val(st(1))) = "(" & st(1) & Trim(st(0))
        Next
            
        For j = 0 To .Count - 1
          sn(j) = .Item(.getkey(j))
        Next
      End With
        
      MsgBox "<ITEMS>" & Join(sn, "#") & "</ITEMS>"
    End Sub

  4. #4
    Hi Leith, I am lost for words. The help you've provided me here is something beyond imaginable. Thought it was impossible to do but you made it through. Thank you very much.
    Hi snb, as always, you've been such a great help.

  5. #5
    VBAX Expert Leith Ross's Avatar
    Joined
    Oct 2012
    Location
    San Francisco, California
    Posts
    552
    Location
    Hello swaggerbox,

    You're welcome. Glad I could help. If you have any questions about the code, just ask.
    Sincerely,
    Leith Ross

    "1N73LL1G3NC3 15 7H3 4B1L17Y 70 4D4P7 70 CH4NG3 - 573PH3N H4WK1NG"

  6. #6
    Hi Leith, just two questions:

    1) How do I ensure that the first letter of each item is capitalized?
    From the example above:
    (2) sticks#(5) paper#(100) aluminum wire
    should be
    Desired output:
    (2) Sticks#(5) Paper#(100) Aluminum wire

    2) If one item value has letter or only letter, e.g. wire rope (1d) or aluminum wire (d), how to change the code?
    (d) Aluminum wire#(1d) Wire rope#(2) Sticks#(5) Paper

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •