View Full Version : [SOLVED:] Regular Expression Replace
Paul_Hossler
01-27-2012, 07:22 AM
I have a column of suffixs and a column of company names
I'm trying to use a regular expression replace to remove each suffix if it's on the end of each company name (entire column with a really big replace)
Ex. 100 suffixes (CO, COM, INC, INCORP, ...) and 300K lines of company name
"ACME INCORP" becomes "ACME"
Right now I use brute force: loop through each company for each suffix
I'd like to do the entire company column at once
1. I don't think .Replace with wildcards will work
2. I can't figure out a RegEx expression that will
ConOps and as far as I got is in the attachment
Appreciate any ideas, since right now it can take 20-30 minutes to go thru the data and clean it
Paul
Bob Phillips
01-27-2012, 09:46 AM
Paul,
I am assuming the slow loop is the data loop, and that you don't have many items in the replacement list. I would just use replace
Function ReplaceValues()
Dim i As Long
Application.ScreenUpdating = False
With Worksheets("Replacements")
For i = 2 To .Range("A2").End(xlDown).Row
Worksheets("Data").Columns("A").Replace _
What:=.Cells(i, "A").Value2, _
Replacement:=.Cells(i, "B").Value2, _
LookAt:=xlPart, _
MatchCase:=False
Next i
End With
Application.ScreenUpdating = True
End Function
Kenneth Hobs
01-27-2012, 09:56 AM
I would also recommend using manual calculation mode.
e.g.
http://vbaexpress.com/kb/getarticle.php?kb_id=1035
Paul_Hossler
01-27-2012, 02:03 PM
Bob / Ken
1. thanks
2. I believe that the .Replace xlPart would replace more than just the string when it's at the end
So if the suffix column contains
COMPANY
CO
INC.
and the data column contains
"ACME WIDGET COMPANY"
"COMPANY OF PROGRAMMERS, INC."
then
"ACME WIDGET COMPANY" should become "ACME WIDGET"
and "COMPANY OF PROGRAMMERS, INC." should become "COMPANY OF PROGRAMMERS"
Some sort of regular expression was the only thing I could think of
Right now it's a outer loop for each suffix (40-50), and an inner loop on the companies (300k-400k)
Paul
XLD -- as an aside, what's the advantage of .Value2 instead of just .Value ?
Kenneth Hobs
01-27-2012, 02:23 PM
The Replace that I was thinking of Paul was to Split() the string value and check the first and last elements of the array. You could then Join() the array or put non-blank elements into a dictionary object. It should go fairly quickly.
I use Value2 when I know that it is a number or string, and Value for date types. It is a tad faster I would guess because less guessing of type is done.
Paul_Hossler
01-27-2012, 03:08 PM
I was hoping that I could hit the entire 'Company' list with just a single request for each entry in the 'Suffix' list
I know that I'll need the outer 'Suffix' loop, but that's the short one
Paul
Kenneth Hobs
01-27-2012, 05:59 PM
Option Explicit
Option Compare Text
Sub RemoveSuffixes()
Dim suffRange As Range, c As Range, bc As Range, beforeRange As Range
Dim s As String, a() As String
On Error GoTo EndSub
' http://vbaexpress.com/kb/getarticle.php?kb_id=1035
SpeedOn
Set suffRange = Range("A2", Range("A" & Rows.Count).End(xlUp))
Set beforeRange = Range("C2", Range("C" & Rows.Count).End(xlUp))
For Each bc In beforeRange
If IsEmpty(bc) Then GoTo NextBC
a() = Split(bc.Value2, " ")
For Each c In suffRange
If a(UBound(a)) = c.Value2 Then
ReDim Preserve a(0 To (UBound(a) - 1))
bc.Value2 = Join(a(), " ")
GoTo NextBC
End If
Next c
NextBC:
Next bc
EndSub:
SpeedOff
End Sub
shrivallabha
01-27-2012, 09:36 PM
You really have huge data.
Supposing the Original Company names are in Column A and their replacements listed in Column C, how does this formula fare?
In Cell B2:
=SUBSTITUTE(A2," "&LOOKUP(9999,SEARCH($C$2:$C$101,A2,1),$C$2:$C$101),"")
And then copied down.
If it goes faster then it can be implemented through VBA. And it will be non-looping as we can use DataSeries.
Paul_Hossler
01-28-2012, 01:12 PM
Thnks, but the issue unfortunately is not a simple 1 for 1 replacement
ACME COMPANY --> ACME
BILL COMPANY --> BILL
CHARLES COMPANY --> CHARLES
DAN COMPANY --> DAN
DAN CO --> DAN
DAN COMP --> DAN
That would require a 2 column list for all the unique names that requre replacing and their replacement; that would be a maintenance nightmare
I am trying to make it rules based. For performance I'd like to apply the replace operation to an entire column without looping through the company column
So in the above,
(*)(COMPANY) ---> (\1) (and that pretty much is all the RegEx I know)
This would leave
COMPANY OF PEOPLE --> COMPANY OF PEOPLE since COMPANY is not the suffix
Right now the outer loop contains each suffix string (30-40), and the inner loop is for each of the 300k+ lines
So if the Right(Company, Len(suffix)) = suffix, then trim Len(suffix) from company
30-40 suffixes and 300k+ companies is a LOT of looping
Word and regular expression have a 'only if at the end of the word' option, and that's what I think I'm hoping I can do on an entire column
Paul
shrivallabha
01-28-2012, 09:48 PM
Thnks, but the issue unfortunately is not a simple 1 for 1 replacement
ACME COMPANY --> ACME
BILL COMPANY --> BILL
CHARLES COMPANY --> CHARLES
DAN COMPANY --> DAN
DAN CO --> DAN
DAN COMP --> DAN
That would require a 2 column list for all the unique names that requre replacing and their replacement; that would be a maintenance nightmare
......
Paul I think there's a bit of confusion, the formula should work with your suffix list. I'd think there will be few pitfalls but those deviations can be looked after.
I am attaching the concept as I had it in my mind.
Paul_Hossler
01-29-2012, 08:24 AM
I appreciate all the ideas, and while they will work, I'm still trying to improve performance (avoiding premature optimization as I go)
This is what I currently use, with 2 loops
'not using Ken's SpeedOn and SpeedOff yet
Sub WhatItIsNow()
Call WhatItIsNow_1(Worksheets("Sheet1").Range("A2:A8"), Worksheets("Sheet1").Range("C2:C11"))
End Sub
Sub WhatItIsNow_1(rSuffixs As Range, rSource As Range)
Dim aSuffix As Variant, aCompany As Variant
Dim iSuffix As Long, iCompany As Long
aSuffix = rSuffixs.Value
aCompany = rSource.Value
For iSuffix = LBound(aSuffix, 1) To UBound(aSuffix, 1)
aSuffix(iSuffix, 1) = UCase(aSuffix(iSuffix, 1))
For iCompany = LBound(aCompany, 1) To UBound(aCompany, 1)
If UCase(Right(aCompany(iCompany, 1), Len(aSuffix(iSuffix, 1)))) = aSuffix(iSuffix, 1) Then
aCompany(iCompany, 1) = Trim(Left(aCompany(iCompany, 1), Len(aCompany(iCompany, 1)) - Len(aSuffix(iSuffix, 1))))
End If
Next iCompany
Next iSuffix
rSource.Value = aCompany
End Sub
If I could use the built in Replace
Sub UsingBuiltinReplace()
Call UsingBuiltinReplace_1(Worksheets("Sheet1").Range("A2:A8"), Worksheets("Sheet1").Range("C2:C11"))
End Sub
Sub UsingBuiltinReplace_1(rSuffixs As Range, rSource As Range)
Dim rSuffix As Range
For Each rSuffix In rSuffixs.Cells
rSource.Replace What:=rSuffix.Value, Replacement:=vbNullString, _
LookAt:=xlPart, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
Next
End Sub
but that does not limit the test to just the end of the string
Actually what I do now is not perfect either, since sometimes a Company value will be trimmed twice
Since the built in Replace allows me to 'hit' an entire column, I was hoping a RegEx would allow that also, thus eliminating the inner loop
Paul
mdmackillop
01-29-2012, 10:43 AM
Add an "xx" to the end of each company and adjust the search accordingly
This takes 18 sec with 470k names
Sub DoReplace()
Dim r As Range
Dim Suffix(), s
Suffix = Array(" coxx", " comxx", " companyxx", " incxx", " inc.xx", " incorpxx", "xx")
Set r = Range(Cells(2, 1), Cells(Rows.Count, 1).End(xlUp))
r.Offset(, 1) = "xx"
r.Offset(, 2).FormulaR1C1 = "=RC1 & RC2"
r.Offset(, 2).Value = r.Offset(, 2).Value
r.Offset(, 1).Clear
For Each s In Suffix
r.Offset(, 2).Replace What:=s, Replacement:="", LookAt:=xlPart
Next
End Sub
Paul_Hossler
01-29-2012, 11:36 AM
Now that is VERY clever :beerchug:
The first thing when I get back to work on Monday -- OK, the second thing after my coffee -- will be to try that out on the real data
Paul
mdmackillop
01-29-2012, 12:20 PM
Glad to help. Just hope it works on the real thing!
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.