PDA

View Full Version : Solved: VBA Replace using regular expressions



ianryan
08-24-2012, 04:38 AM
I am parsing an Excel sheet using VBA. The cells values in one of the columns in the sheet may contain 0 or more occurences of the regular expression ^[a-zA-Z]/?s (I understand this to be the regular expression representing a single alphabetic character followed by a single question mark and finally followed by a single lowercase s).

I wish to replace the question mark character in all such cases with a single quote character.

I have little experience with vba and even less with regular expressions. Any help would be appreciated.

Kenneth Hobs
08-24-2012, 06:40 AM
Welcome to the forum!

It is unclear what you mean. Do you mean that the regex string is in say A1 and you want code to modify it? If so:
Range("A1").Value = Replace(Range("A1").value, "?", "'")

ianryan
08-24-2012, 06:59 AM
Thanks for the reply.

No. My Worksheet has 34,000 rows. There is one particular column which contains text values. Within each cell value there may be 0, 1 or more occurences of a ? (question mark character) directly preceded by any alphabetic character and directly followed by an s character e.g. a?s or h?s or g?s etc.

I need to parse each cell value and replace all such question mark characters with a quote character i.e. a?s --> a's or h?s --> h's or g?s --> g's. Unfortunately each cell value may contain ligitimate question marks in which cases the question mark need to be left unchanged.

So, I need a find/replace solution for the column which will replace all occurences of ? (question mark character) with a single quote character but only when the ? (question mark character) is directly preceded by any alphabetic character and directly followed by an s character.

Thanks

Kenneth Hobs
08-24-2012, 08:16 AM
See the kb forum for regexp example code. Using that code or even a shorter version of regex for the usedrange in the column is easy enough.

The article is here: http://vbaexpress.com/kb/getarticle.php?kb_id=68

Did you want to do it once or have it done at each cell change or both?

AirCooledNut
08-24-2012, 12:47 PM
Random post to get me over 5 so I can post links...

The attached document is from this site and I found it helpful.

AirCooledNut
08-24-2012, 12:51 PM
I also use this site http://www.regular-expressions.info/vbscriptexample.html to test my regular expression syntax. The below code is a wrapper module I modified for my purposes, based off of another one I saw.
Option Explicit
'Regular Expression wrappers
'by Toby Erkson
'14Feb2007, 16July2012
'http://www.regular-expressions.info/vbscript.html
'http://www.regular-expressions.info/vbscriptexample.html to test regular expressions on-line
'http://msdn.microsoft.com/library/en-us/script56/html/9f1c25ba-46ce-46af-9f19-ac1d2bcf05d8.asp?frame=true
'http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/711116fb-9c47-47cb-b664-db8141b8cc69.asp
'This Module requires Early Binding (minimum version) = "Microsoft VBScript Regular Expressions 5.5"

Function RegExp_Replace(ReplaceIn, sPattern As String, ReplaceWith As String, Optional IgnoreCase As Boolean = False, _
Optional GlobalMatch As Boolean = False, Optional bMultiLine As Boolean = False)
'This is the Replace method.
'Returns a copy of ReplaceIn with the text of sPattern replaced with ReplaceWith.
'If no match is found, a copy of ReplaceIn is returned unchanged.
Dim RE As RegExp

Set RE = New RegExp
RE.Pattern = sPattern
RE.IgnoreCase = IgnoreCase
RE.Global = GlobalMatch
RE.MultiLine = bMultiLine
RegExp_Replace = RE.Replace(ReplaceIn, ReplaceWith)
End Function

Function RegExp_ShowMatch(FindIn, sPattern As String, Optional IgnoreCase As Boolean = False, _
Optional GlobalMatch As Boolean = False, Optional bMultiLine As Boolean = False) As Variant
'This is the Execute method.
'Returns a Matches collection containing a Match object for each match found in string.
'Execute returns an empty Matches collection if no match is found. The object returned is the found string itself.
Dim i As Long, RE As RegExp, allMatches As MatchCollection, aMatch As Match

Set RE = New RegExp
RE.Pattern = sPattern
RE.IgnoreCase = IgnoreCase
RE.Global = GlobalMatch
RE.MultiLine = bMultiLine
Set allMatches = RE.Execute(FindIn)
If allMatches.Count > 0 Then
ReDim Results(0 To allMatches.Count - 1)
For i = 0 To allMatches.Count - 1
Results(i) = allMatches(i)
Next i
Else 'Nothing found so return nothing
ReDim Results(0)
Results(0) = ""
End If
RegExp_ShowMatch = Results
End Function

Function RegExp_ShowPositionMatch(FindIn, sPattern As String, Optional IgnoreCase As Boolean = False, _
Optional GlobalMatch As Boolean = False, Optional bMultiLine As Boolean = False) As Variant
'This is the Execute method.
'Returns a Matches collection for each match found in string.
'Execute returns -1 in Results(0,0) if no match is found.
Dim i As Long, RE As RegExp, allMatches As MatchCollection, aMatch As Match

Set RE = New RegExp
RE.Pattern = sPattern
RE.IgnoreCase = IgnoreCase
RE.Global = GlobalMatch
RE.MultiLine = bMultiLine
Set allMatches = RE.Execute(FindIn)
If allMatches.Count > 0 Then
ReDim Results(0 To allMatches.Count - 1, 1)
For i = 0 To allMatches.Count - 1
Results(i, 0) = allMatches.Item(i).FirstIndex 'The position of the match in the searched string
Results(i, 1) = allMatches.Item(i).Value 'The text found in the search
Next i
Else 'Nothing found
ReDim Results(0, 0)
Results(0, 0) = -1
End If
RegExp_ShowPositionMatch = Results
End Function

Function RegExp_Exists(FindIn, sPattern As String, Optional IgnoreCase As Boolean = False, _
Optional GlobalMatch As Boolean = False, Optional bMultiLine As Boolean = False) As Boolean
'This is the Test method.
'Returns True if a pattern match is found; False if no match is found.
Dim i As Long, RE As RegExp, allMatches As MatchCollection, aMatch As Match

Set RE = New RegExp
RE.Pattern = sPattern
RE.IgnoreCase = IgnoreCase
RE.Global = GlobalMatch
RE.MultiLine = bMultiLine
RegExp_Exists = RE.test(FindIn)
End Function This above code contains the functions to use the regular expressions. Note that you will also need to have a reference to "Microsoft VBScript Regular Expressions 5.5". In the VB Editor select Tools >> References... and check it from the list of available references (then click OK).

Teeroy
08-26-2012, 11:10 PM
Hi ianryan,

If compatability is an issue you can do without the references (i.e. late bind) by using the following:

Dim RE

Set RE = CreateObject("vbscript.regexp")


Toby's site suggestion for testing REGEXs is a good one and can reduce unexpected results. The same site (http://www.regular-expressions.info/reference.html) has a useful reference for pattern building.

I may be over-cautious but I ALWAYS test REGEXs on copies of data before setting it loose.

snb
08-27-2012, 03:15 AM
if the column doesn't contain any question mark followed by an s that shouldn't be replaced you could try:


sub snb()
columns(1).replace "~?s","'s"
end sub

GTO
08-27-2012, 04:20 AM
... Worksheet has 34,000 rows. There is one particular column which contains text values. Within each cell value there may be 0, 1 or more occurences of a ? (question mark character) directly preceded by any alphabetic character and directly followed by an s character e.g. a?s or h?s or g?s etc...

Hi there,

Afraid I am a babe in the woods still, when dealing with RegExp's, but here is a tiny shot.

In a Standard Module:
Option Explicit

Sub example()
' Late-Bound | Early-Bound
Dim REX As Object ' RegExp
Dim oMatchCollection As Object ' MatchCollection
Dim oMatch As Object ' Match
Dim aryVals As Variant
Dim n As Long

Set REX = CreateObject("VBScript.RegExp")
With REX
.Global = True
.IgnoreCase = False
.Pattern = "[A-Za-z]\?s"
End With

With Sheet1.Range(Sheet1.Cells(1), Sheet1.Cells(Sheet1.Rows.Count, 1).End(xlUp))
aryVals = .Value
For n = LBound(aryVals) To UBound(aryVals, 1)
If REX.Test(aryVals(n, 1)) Then
Set oMatchCollection = REX.Execute(aryVals(n, 1))
For Each oMatch In oMatchCollection
aryVals(n, 1) = Left(aryVals(n, 1), oMatch.FirstIndex + 1) _
& "'" _
& Mid(aryVals(n, 1), oMatch.FirstIndex + 3)
Next
End If
Next
.Value = aryVals
End With
End Sub

Please note the pattern, as ^ and $ mark the beginning and end of a string. This would mean that your pattern would only find a match if the cell contained one letter, followed by a question mark, followed by an 's'. Also, you'll see that the backslash is the character you want to indicate a literal.

If you do end up looping through 34k+ cells, I would suggest flopping the values into an array, as shown. It'll be bunches faster:cloud9:

Mark

ianryan
08-27-2012, 04:44 AM
Thanks for all responses. All very helpful.

What is wrong with the following code:-


Function ReplaceQuestionMarks(L_Source As String) As String

Dim L_Regex As Object
Dim L_Input As String
Dim L_Output As String
Dim L_Replacement As String

Set L_Regex = CreateObject("vbscript.regexp")
With L_Regex
.MultiLine = False
.Global = False
.IgnoreCase = True
.pattern = "[a-zA-Z]\?[s]"
End With
L_Input = L_Source
L_Replacement = "$1'$2"
L_Output = L_Regex.Replace(L_Source, L_Replacement)
ReplaceQuestionMarks = L_Output

End Function


I throught that in the L_Replacement string "$1" should represent the 1st regex group from the input string and "$2" should represent the 2nd regex group from the input string. However, when I run this code instead of for example "t?s" being replaced with "t's" it is actually replaced with the literal string "$1'$2".

There are a number of other regular expression replacement scenario I need to handle so if I can get this first basic scenario working I should be good to build on this to deal with the other remaining scenarios.

Thanks

snb
08-27-2012, 05:22 AM
Sub example()
With CreateObject("VBScript.RegExp")
.Global = True
.IgnoreCase = True
.Pattern = "[a-z]\?s"

For Each cl In Columns(1).SpecialCells(2)
For Each m In .Execute(cl)
Columns(1).Replace m, Replace(m, "?", "'")
Next
Next
End With
End Sub

Teeroy
08-27-2012, 05:24 AM
L_Replacement is a string, not a pattern. You have to build the string that you want to put in place of the match from the match as GTO has done inside the IF block (I'd have done it slightly differently with a replace function but that's just different thought processes, the result's the same).

As an aside L_Input is superfluous and you could do without the intermediate step of L_Output=...

GTO
08-27-2012, 06:21 AM
...representing a single alphabetic character followed by a single question mark and finally followed by a single lowercase s)...

If you want to check against a trailing lowercase 's', then .IgnoreCase should be False I believe. The '1$2$' would not have an apostrophe in the expression, but more importantly, this refers to saved subexpressions, which you have none. I am afraid these still confuse me, so not much help. It would be neat if we could replace just the second subexpression of each match on something like .Pattern = "([A-Za-z])(\?)(s)" ...speaking of which...


...(I'd have done it slightly differently with a replace function but that's just different thought processes, the result's the same)...

Thank you and just if you have a moment, could you show that?

Here is what I tried for input/output, as I understand the OP's dilemma.
I have d?s. Paul has x?s as well as P?s.
You have b?s. Do you have d?s or d?s as well?
Tom doesn 't have any d's, but he has F?S.
This doesn 't match

I have d's. Paul has x's as well as P's.
You have b's. Do you have d's or d's as well?
Tom doesn 't have any d's, but he has F?S.
This doesn 't match

Thank you so much,

Mark

PS - Afraid it's wayyy past time to be in bed, but will check later.

Kenneth Hobs
08-27-2012, 09:38 AM
Similar to snb's code but with Option Explicit, early binding, and my speed routines.

Option Explicit
'http://vbaexpress.com/kb/getarticle.php?kb_id=1035
Public glb_origCalculationMode As Integer

' Tools, References..., MicroSoft VBScript Regular Expresssions 5.5
Sub example_ken()
Dim reg As RegExp, cl As Range, m As Variant

On Error GoTo EndNow
SpeedOn

Set reg = New RegExp
With reg
.Global = True
.IgnoreCase = True
.Pattern = "[a-z]\?s"

For Each cl In Columns(1).SpecialCells(2)
For Each m In .Execute(cl.Value2)
cl.Replace m, Replace(m, "?", "'")
Next m
Next cl
End With

EndNow:
SpeedOff
End Sub

Sub SpeedOn(Optional StatusBarMsg As String = "Running macro...")
glb_origCalculationMode = Application.Calculation
With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
.EnableEvents = False
.DisplayAlerts = False
.Cursor = xlWait
.StatusBar = StatusBarMsg
.EnableCancelKey = xlErrorHandler
End With
End Sub

Sub SpeedOff()
With Application
.Calculation = glb_origCalculationMode
.ScreenUpdating = True
.EnableEvents = True
.DisplayAlerts = True
.CalculateBeforeSave = True
.Cursor = xlDefault
.StatusBar = False
.EnableCancelKey = xlInterrupt
End With
End Sub

snb
08-27-2012, 12:16 PM
Wouldn't this be faster ?


sub snb()
for j=1 to 26
with columns(1)
.replace chr(64 +j) & "~?s", "chr(64+j) & "'s"
.replace chr(96 +j) & "~?s", "chr(96+j) & "'s"
end with
next
end sub

PS. the VBA tags destroy the code.

Kenneth Hobs
08-27-2012, 05:14 PM
I am not sure why you had problems pasting code between vba code tags. They might paste as one line if you copy vba code from the forum and then paste between tags. When I do that, I copy and paste to the workbook and then cut and paste that between my vba code tags.

After I removed your extra quote, I ran these speed tests on 34,000 cells.

Averages:
Your last routine vs my last routine less speedup in seconds.
6.3 vs. 6.5

Your last routine vs my last routine in seconds with speedup routine in both.
5.7 vs. 5.4

'Option Explicit
'Oorang, http://www.vbaexpress.com/forum/showthread.php?t=30477
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
' http://www.vbaexpress.com/forum/showthread.php?t=43429

Sub SetData()
Range("A1:A34000").Value2 = "You have b?s. Do you have d?s or d?s as well?"
End Sub

Sub snb()
Dim lStart As Long
Dim lEnd As Long
lStart = GetTickCount

On Error GoTo EndNow
SpeedOn

For j = 1 To 26
With Columns(1)
.Replace Chr(64 + j) & "~?s", Chr(64 + j) & " 's"
.Replace Chr(96 + j) & "~?s", Chr(96 + j) & " 's"
End With
Next

lEnd = GetTickCount
MsgBox Format$((lEnd - lStart) / 1000&, "0.000 ""Seconds"""), vbInformation, "Time Elapsed"

EndNow:
SpeedOff
End Sub

snb
08-28-2012, 01:57 AM
@KH

My "'s" was misinterpreted by the VBA code tags as a comment in a VBA line.
That's what inserted the extra quote.

Thank you for testing.

Teeroy
08-28-2012, 04:46 AM
Hi Mark (@GTO) see @KH's example for using replace. He and @SNB used the replace on the match object in the way I was thinking, but more elegantly. :bow:

GTO
08-28-2012, 05:50 PM
Hi Mark (@GTO) see @KH's example for using replace. He and @SNB used the replace on the match object in the way I was thinking, but more elegantly. :bow:

ACK! Okay, a brain-fade moment for me

Thank you Teeroy. I am afraid I thought you meant RegExp's .Replace...

Thanks for responding,

Mark

Teeroy
08-28-2012, 05:59 PM
LOL. Happens to the best of us.

Troy

AirCooledNut
08-29-2012, 07:37 AM
With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
.EnableEvents = False
.DisplayAlerts = False
.Cursor = xlWait
.StatusBar = StatusBarMsg
.EnableCancelKey = xlErrorHandler
End With How does .Cursor = xlWait help speed up the code? I'm familiar with the rest. :think:
BTW, I didn't know there was a .Cursor property...neat! :thumb

Kenneth Hobs
08-29-2012, 08:27 AM
The cursor is just a show thing. I like people to know that something is running. You will notice that I put a message on the status bar as well.

GTO
08-30-2012, 05:48 AM
....I ran these speed tests on 34,000 cells.

Averages:
Your last routine vs my last routine less speedup in seconds.
6.3 vs. 6.5

Your last routine vs my last routine in seconds with speedup routine in both.
5.7 vs. 5.4...

@Kenneth Hobs:

Howdy:hi:

I am afraid that curiosity has gotten the best of me. Did you include my suggestion as well? I ask, as when I do, I get results of about 3.5 seconds for your and snb's and .734 seconds for mine (all at work on a nice PC; and admittedly by memory).

At home, 427 seconds (snb), 935 seconds (yours), and 37 seconds (mine) on my "Is this thing really plugged in?" laptop (circa Office 2000). I tested snb's twice (because of the great difference) just to make sure I had run SetData() - but it staill ran at 458 seconds.

(And yes, I was soooo regretting not having shortened the test data)

Hope all is blessed with you and yours,

Mark

Paul_Hossler
08-30-2012, 06:08 AM
Been getting a lot of good ideas from this thread.

I assume that there's no way to use RegEx without the 'one cell at a time' VBA looping?

range.replace will do the whole range at once (or at least without VBA looping) using built in Excel code. I'd think that using the inherent .Replace would be faster that looping, so if RegEx worked the same ....?

I'd guess that if it were possible, it would have been mentioned :content:



Sub AllAtOnce()
Call ActiveSheet.Cells(1, 1).CurrentRegion.Replace("AAA", "zzz", xlPart)
End Sub



Paul

Kenneth Hobs
08-30-2012, 06:30 AM
I am not surprised Mark. Of course one does need to reset the strings before testing each routine as you said with SetData(). Once the routine runs, a 2nd run is very fast because nothing is changed.

For my work computer using my speed routine, snb/mine/yours: 3.6/3.8/0.6. Yours ran the same with my speed routines and without. That is because you are only writing once.

Using Paul's method, it was 0.7 without speedup and 0.6 with speedup.
Sub AllAtOnce()
Dim lStart As Long
Dim lEnd As Long

lStart = GetTickCount
On Error GoTo EndNow
SpeedOn

ActiveSheet.Cells(1, 1).CurrentRegion.Replace "?s", "'s", xlPart

lEnd = GetTickCount
MsgBox Format$((lEnd - lStart) / 1000&, "0.000 ""Seconds"""), vbInformation, "Time Elapsed"

EndNow:
SpeedOff
End Sub

Kenneth Hobs
08-30-2012, 11:47 AM
A tilde is needed for Paul's method. It was 0.6 seconds with and without speedup but 0.047 faster with speedup.

Private Declare Function GetTickCount Lib "kernel32.dll" () As Long

Sub AllAtOnce()
Dim lStart As Long
Dim lEnd As Long

lStart = GetTickCount
On Error GoTo EndNow
SpeedOn

ActiveSheet.Cells(1, 1).CurrentRegion.Replace "~?s", "'s", xlPart

lEnd = GetTickCount
MsgBox Format$((lEnd - lStart) / 1000&, "0.000 ""Seconds"""), vbInformation, "Time Elapsed"

EndNow:
SpeedOff
End Sub