PDA

View Full Version : Extract Items from String and then Sort



swaggerbox
11-28-2019, 04:48 AM
I have a string that contains items that I need to extract and sort. Items are those that are preceded by "with", "and", "comprising", and "," followed by their values (enclosed in parenthesis). After items are extracted, I want to re-arrange them from the original format "item description (value)" to the new format "(value) item description". After this, I want to sort the items based on their values, and place all the items inside the <ITEM> tag separated by "#"

For example:
I have a pet project comprising aluminum wire (100), paper (5), and sticks (2).

Desired output:
(2) sticks#(5) paper#(100) aluminum wire

I have attached a representative sample to best illustrate this.

This is a bit complicated so I'm asking anyone for ideas on how best to approach this.

Leith Ross
11-29-2019, 04:21 PM
Hello swaggerbox,

This worked for the example you provided. I cannot guarantee that it will work 100% with your original data. Parsing complex strings is challenging due to the large number of arrangements of syntax and punctuation. Give it try and let me know your results.

The macro below has already been added to the attached workbook. Double Click anywhere in the Text Box and a message box will display the arsed results.

Module1 - Macro Code




Option Explicit


Function ParseItems(ByVal Text As String)


Dim vArray As Variant
Dim Items As Variant
Dim j As Long
Dim k As Long
Dim Lookup As Variant
Dim LoToHi As Boolean
Dim n As Long
Dim RegExp As Object
Dim sorted As Boolean
Dim upBound As Long

Set RegExp = CreateObject("VBScript.RegExp")
RegExp.Global = True
RegExp.IgnoreCase = False
RegExp.Pattern = ".*(?:,|with|and|comprising)(.+)\((\d+)\)$"

' // Separate Items by their values into an array.
vArray = Split(Text, ")")

ReDim Items(0) ' // Holds the Item strings.
ReDim Lookup(0) ' // This array holds the index values of the Items array.

' // Parse the Item strings.
For n = 0 To UBound(vArray)
Text = vArray(n) & ")" ' // Add the closing parenthesis since it was removed by Split.
If RegExp.Test(Text) Then
k = CLng(RegExp.Replace(Text, "$2")) ' // Convert the value from aa string to a number.
Lookup(j) = k ' // Save Item number in the Lookup array.
j = j + 1 ' // Increment count of valid Items.
ReDim Preserve Lookup(j) ' // Increase the Lookup array by 1 element.
If k > UBound(Items) Then
ReDim Preserve Items(k) ' // Increase the Items array index to the largest Item number.
End If
Items(k) = RegExp.Replace(Text, "($2) $1") ' // Save the string as "(value) Item description".
End If
Next n

upBound = UBound(Lookup)

' // Sort the Lookup array in ascending order. Set LoToHi to True for descending sort order.
Do
sorted = True
For j = 0 To upBound - 1
If Not IsEmpty(Lookup(j)) Then
If LoToHi Xor Lookup(j) > Lookup(j + 1) Then
k = Lookup(j + 1)
Lookup(j + 1) = Lookup(j)
Lookup(j) = k
sorted = False
End If
End If
Next j
upBound = upBound - 1
Loop Until sorted Or upBound < 1

Text = ""

' // Build the output string.
For n = 0 To UBound(Lookup)
k = Lookup(n)
If Items(k) <> "" Then
If Text = "" Then
Text = "<ITEMS>" & Items(k)
Else
Text = Text & "#" & Items(k)
End If
End If
Next n

If Text <> "" Then Text = Text & "</ITEMS>"

ParseItems = Text

End Function

snb
11-30-2019, 06:51 AM
I'd use:


Sub M_snb()
sn = Filter(Split(Replace(Replace(Replace(Replace(Replace(TextBox1.Text, "with ", "~"), "and ", "~"), "comprising ", "~"), ", ", "~"), ")", ")~"), "~"), "(")

With CreateObject("System.Collections.SortedList")
For Each it In sn
st = Split(it, "(")
.Item(Val(st(1))) = "(" & st(1) & Trim(st(0))
Next

For j = 0 To .Count - 1
sn(j) = .Item(.getkey(j))
Next
End With

MsgBox "<ITEMS>" & Join(sn, "#") & "</ITEMS>"
End Sub

swaggerbox
12-02-2019, 01:33 AM
Hi Leith, I am lost for words. The help you've provided me here is something beyond imaginable. Thought it was impossible to do but you made it through. Thank you very much.
Hi snb, as always, you've been such a great help.

Leith Ross
12-02-2019, 10:41 AM
Hello swaggerbox,

You're welcome. Glad I could help. If you have any questions about the code, just ask.

swaggerbox
12-04-2019, 05:47 AM
Hi Leith, just two questions:

1) How do I ensure that the first letter of each item is capitalized?
From the example above:
(2) sticks#(5) paper#(100) aluminum wire
should be
Desired output:
(2) Sticks#(5) Paper#(100) Aluminum wire

2) If one item value has letter or only letter, e.g. wire rope (1d) or aluminum wire (d), how to change the code?
(d) Aluminum wire#(1d) Wire rope#(2) Sticks#(5) Paper