PDA

View Full Version : [SOLVED] Looking to code a function to efficiently remove multiple commas in a string



rlv
04-12-2017, 08:54 AM
I've been experimenting with different ways to efficiently remove multiple commas from a text string. So far the best performers are these three:


'Use VBA replace function, looping until all multiple commas are replaced
Function CollapseCommas2(ByVal AnyString As String) As String
Dim DDelim As String
Dim S1 As String
Dim Delimiter As String


Delimiter = "," 'in this case, commas
DDelim = Delimiter & Delimiter 'double delimiters
S1 = AnyString
If InStr(AnyString, DDelim) > 0 Then
Do
S1 = VBA.Replace(S1, DDelim, Delimiter, 1, -1)
Loop Until InStr(S1, DDelim) = 0
End If
CollapseCommas2 = S1
End Function





'Use excel Application.WorksheetFunction.Substitute, looping until all multiple commas are replaced
Function CollapseCommas2b(ByVal AnyString As String) As String
Dim DDelim As String
Dim S1 As String
Dim Delimiter As String


Delimiter = "," 'in this case, commas
DDelim = Delimiter & Delimiter 'double delimiters
S1 = AnyString
If InStr(AnyString, DDelim) > 0 Then
Do
S1 = Application.WorksheetFunction.Substitute(S1, DDelim, Delimiter)
Loop Until InStr(S1, DDelim) = 0
End If
CollapseCommas2b = S1
End Function




'Slight variation on Fn3, using Application.WorksheetFunction.Trim instead of Application.Trim
Function CollapseCommas3a(ByVal AnyString As String) As String
Dim DDelim As String
Dim S1 As String
Dim Delimiter As String


Delimiter = "," 'in this case, commas
DDelim = Delimiter & Delimiter 'double delimiters


S1 = AnyString
S1 = VBA.Replace(S1, " ", Chr(248), 1, -1) 'replace spaces with obscure character unlikely to be found in the string
S1 = VBA.Replace(S1, Delimiter, " ", 1, -1) 'replace commas with spaces
S1 = Application.WorksheetFunction.Trim(S1) 'this is fast and unlike vba.Trim, collapses internal spaces
S1 = VBA.Replace(S1, " ", Delimiter, 1, -1) 'restore the commas
S1 = VBA.Replace(S1, Chr(248), " ", 1, -1) 'restore the spaces


CollapseCommas3a = S1
End Function


Is there another approach I'm overlooking? Other experiments attached.


Thanks.

mike7952
04-12-2017, 11:13 AM
This will work


Sub abc()
Const cDelimiter As String = ","
Const cReplaceDelimiter As String = " "
Dim cell As Range

For Each cell In Range("d2", Cells(Rows.Count, "d").End(xlUp))
Cells(cell.Row, 2).Value = Replace(WorksheetFunction.Trim(Replace(cell.Value, cDelimiter, " ")), cReplaceDelimiter, cDelimiter)
Next
End Sub

rlv
04-12-2017, 12:52 PM
Hi Mike. Thanks for taking a crack at it. I think my sample spreadsheet may have mislead you into thinking I wanted to loop through cells D3 to D8. Really, I was just using those cells to store different trial strings to paste into D2. Sorry for the confusion.


Your sub takes a similar approach to the Function CollapseCommas3a I posted above, though it did eat the existing space in "thank you". When I extended it extended to preserve any existing spaces, it worked fine.



Sub def()
Const cDelimiter As String = ","
Const cReplaceDelimiter As String = " "
Dim cell As Range


For Each cell In Range("d2", Cells(Rows.count, "d").End(xlUp))
Cells(cell.Row, 2).Value = Replace(Replace(Application.WorksheetFunction.Trim(Replace(Replace(cell.Val ue, cReplaceDelimiter, Chr(248)), cDelimiter, cReplaceDelimiter)), cReplaceDelimiter, cDelimiter), Chr(248), " ") 'restore the spaces
Next
End Sub

p45cal
04-12-2017, 01:10 PM
try:
Function RegexpReplace1(ByVal parmString As String) As String
'================================================
'Use local static regexp object to remove duplicate spaces
'================================================
Static oRE As Object
If oRE Is Nothing Then
Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "(,)\1+"
End If
RegexpReplace1 = oRE.Replace(parmString, ",")
End Function
stolen and adapted from https://www.experts-exchange.com/articles/17559/Efficient-String-Clean-up-Removing-Internal-Duplicate-Spaces.html
Here, it seems to be about 4x or 5x faster than the fastest so far. Most of the time is saved by making oRE a static variable (it's 8 times slower than the fastest if the object has to be created at every run of the function), but it remains a regexp object until the project is reset for some reason, so using it as a worksheet function in multiple cells should still be quite fast.

Early binding seems to speed it up a bit more, about 10% faster.

rlv
04-12-2017, 02:33 PM
Wow that's a really speedy function. Thanks. The more repeating commas in the string, the greater the speed difference. The link is just as interesting. I guess I need to pay more attention to the regexp object. You know, I was doing some experimentation with the regexp object a couple of weeks ago as an alternate to instr for finding out if a substring existed in a string and it was really s l o w - but of course I was not declaring the object as static as you have done here. Live and learn. Thanks again.

mike7952
04-12-2017, 07:18 PM
Wow that's a really speedy function. Thanks. The more repeating commas in the string, the greater the speed difference. The link is just as interesting. I guess I need to pay more attention to the regexp object. You know, I was doing some experimentation with the regexp object a couple of weeks ago as an alternate to instr for finding out if a substring existed in a string and it was really s l o w - but of course I was not declaring the object as static as you have done here. Live and learn. Thanks again.


So what is wrong with the solution. I changed the code to only to look at D2. What an I missing?


Sub abc()
Const cDelimiter As String = ","
Const cReplaceDelimiter As String = " "
Dim cell As Range

For Each cell In Range("d2", Cells(Rows.Count, "d").End(xlUp))
Cells(cell.Row, 3).Value = Replace(WorksheetFunction.Trim(Replace(cell.Value, cDelimiter, " ")), cReplaceDelimiter, cDelimiter)
Next
End Sub

rlv
04-12-2017, 09:27 PM
So what is wrong with the solution.
Nothing at all (well, except perhaps for overlooking the possibility that the input comma delimited string could have existing spaces in it which need to be preserved.). In my original post in this thread, I posted 3 functions I've tried using different approaches. The approach you took in sub abc to removing multiple commas

1. Replace "," with " "
2. Use WorksheetFunction.Trim to collapse multiple spaces
3. Replace " ", with ","

is essentially identical to my 3rd posted function, and if I repackage abc as a function


Function abc(AnyString As String) As String
Const cDelimiter As String = ","
Const cReplaceDelimiter As String = " "
'Dim cell As Range

'For Each cell In Range("d2", Cells(Rows.count, "d").End(xlUp))
abc = Replace(WorksheetFunction.Trim(Replace(AnyString, cDelimiter, " ")), cReplaceDelimiter, cDelimiter)
'Next
End Function


and plug it in into my test sub, the execution speed of function abc is on par with that one. No faster and no slower.

mike7952
04-13-2017, 07:32 AM
So do you have a working solution? If not I don't know what I'm missing as per your request.

rlv
04-13-2017, 09:38 PM
Yes I do have a solution. The function that p45cal provided is 2x to 4x faster than any of my functions.

snb
04-14-2017, 01:34 AM
Why don't you


Sub M_snb()
For j = 20 To 1 Step -1
Sheets(1).Cells.Replace String(j, ","), ","
Next
End Sub

rlv
04-14-2017, 05:47 AM
Seems like the RegexpReplace function still has the performance edge. I'm seeing a ~3x speed advantage (YMMV).


M_snb Elapsed: 1.3789
RegexpReplace1 Elapsed: 0.5117



Sub M_snb_SpeedTest()
Dim R As Range, T1 As Double
Dim J As Long, RS As String

Sheets(1).Range("d2").Copy
Set R = Sheets(2).Range("A1").Resize(10000, 1)
R.PasteSpecial (xlPasteValues)

T1 = Timer
For J = 20 To 1 Step -1
Sheets(2).Cells.Replace String(J, ","), ","
Next

RS = "M_snb Elapsed: " & VBA.Format(Timer - T1, "0.0000")
Debug.Print vbCr & RS

Sheets(1).Range("d2").Copy
R.PasteSpecial (xlPasteValues)

T1 = Timer
For Each R In Sheets(2).UsedRange
R.Value = RegexpReplace1(R.Value)
Next R
RS = "RegexpReplace1 Elapsed: " & VBA.Format(Timer - T1, "0.0000")
Debug.Print RS
End Sub

snb
04-14-2017, 05:55 AM
.8 seconds ? I woudn't bother.

rlv
04-14-2017, 06:36 AM
.8 seconds ? I woudn't bother.
I agree it's a judgment call. For me personally, using a sheet-level global replace has some downsides. That's not to say if it offered a huge speed advantage, I would not consider it. But to use that approach means all the strings need to be loaded into a worksheet somewhere, and that's not always feasible. It's relatively rare that I want to globally remove every double delimiter everywhere, and usually removing duplicates is a prelude to doing something else to the string - both of which tend to reduce the value of a global sheet-level replacement over a string-level replacement for me, since I'll have to loop through the cells at some point anyway. Also, a UDF is more flexible and more portable. I mainly program in the excel flavor of vba, but not exclusively, and having something that will transfer to a word or access VBA environment has value for me.

Btw, I've been enjoying your 'VBA for smarties' web page. It's a great resource and I've found it hugely useful. Thank you for putting it together.

snb
04-14-2017, 06:45 AM
another method


sub M_snb()
c00=cells(2,4)
For j = 20 To 1 Step -1
c00= Replace(c00, String(j, ","), ",")
Next
cells(2,4)=c00
End Sub