PDA

View Full Version : Ensure that only unique values are extracted



swaggerbox
02-24-2020, 05:42 AM
I need help in changing the code below to ensure that only unique values are extracted in the results.

This is a request for the modification of the code by Leith Ross:
http://www.vbaexpress.com/forum/showthread.php?66330-Extract-Items-from-String-and-then-Sort

Here's the code:



Function ParseItems(ByVal Text As String)


Dim vArray As Variant
Dim Items As Variant
Dim j As Long
Dim k As Long
Dim Lookup As Variant
Dim LoToHi As Boolean
Dim n As Long
Dim RegExp As Object
Dim sorted As Boolean
Dim upBound As Long

Set RegExp = CreateObject("VBScript.RegExp")
RegExp.Global = True
RegExp.IgnoreCase = False
RegExp.Pattern = ".*(?:,|with|and|the|comprising)(.+)\((\d+)\)$"

' // Separate Items by their values into an array.
vArray = Split(Text, ")")

ReDim Items(0) ' // Holds the Item strings.
ReDim Lookup(0) ' // This array holds the index values of the Items array.

' // Parse the Item strings.
For n = 0 To UBound(vArray)
Text = vArray(n) & ")" ' // Add the closing parenthesis since it was removed by Split.
If RegExp.Test(Text) Then
k = CLng(RegExp.Replace(Text, "$2")) ' // Convert the value from aa string to a number.
Lookup(j) = k ' // Save Item number in the Lookup array.
j = j + 1 ' // Increment count of valid Items.
ReDim Preserve Lookup(j) ' // Increase the Lookup array by 1 element.
If k > UBound(Items) Then
ReDim Preserve Items(k) ' // Increase the Items array index to the largest Item number.
End If
Items(k) = RegExp.Replace(Text, "($2) $1") ' // Save the string as "(value) Item description".
End If
Next n

upBound = UBound(Lookup)

' // Sort the Lookup array in ascending order. Set LoToHi to True for descending sort order.
Do
sorted = True
For j = 0 To upBound - 1
If Not IsEmpty(Lookup(j)) Then
If LoToHi Xor Lookup(j) > Lookup(j + 1) Then
k = Lookup(j + 1)
Lookup(j + 1) = Lookup(j)
Lookup(j) = k
sorted = False
End If
End If
Next j
upBound = upBound - 1
Loop Until sorted Or upBound < 1

Text = ""

' // Build the output string.
For n = 0 To UBound(Lookup)
k = Lookup(n)
If Items(k) <> "" Then
If Text = "" Then
Text = "<ITEMS>" & Items(k)
Else
Text = Text & "#" & Items(k)
End If
End If
Next n

If Text <> "" Then Text = Text & "</ITEMS>"

ParseItems = Text

End Function



As you can see the macro outputs the following:

<ITEMS>(1) the kettle body #(1) the kettle body #(1) the kettle body #(1) the kettle body #(1) the kettle body #(2) a kettle cover #(2) a kettle cover #(3) a switch #(4) le #(4) le #(4) le #(4) le #(5) a microprocessor #(5) a microprocessor #(6) a wireless emitting module #(6) a wireless emitting module #(7) plug #(7) plug #(8) a separation of the base #(8) a separation of the base #(9) a working indication lamp #(10) a temperature sensing device #(10) a temperature sensing device #(10) a temperature sensing device #(11) a temperature difference generating module #(11) a temperature difference generating module #(11) a temperature difference generating module </ITEMS>

#1 There are multiple entries that are duplicates (1,2,4,5,6,7,8,10,11), how to limit to unique entries only
#2 Entry number 4 is truncated (don't know why)
#3 There should be no space before the SHARP (#) sign
#4 Remove the "the" and the "a" before the value, e.g. "kettle body" instead of "the kettle body"; "kettle cover" instead of "a kettle cover"

Could anyone help me with this?

Leith Ross
02-24-2020, 09:29 AM
Hello swaggerbox,

I'll look this over today. Thanks for posting the workbook.

swaggerbox
02-24-2020, 07:38 PM
looking forward to it Leith.