PDA

View Full Version : [SOLVED:] basic regex in Word macro



dawnMist
01-07-2015, 12:47 PM
Hello,

I am new to VBA and macros and I have encountered some issues with replacing strings using regular expressions.
Basically, I want to find a "number" and "space"/"no space" and "unit" (litres) and replace the space with non-breaking space (or add it if there is no space).
I need this to ignore the cases where the number is not followed by a single "l" (e.g. 5 long drinks).

I used the recording tool and came up with something like this:


Sub ReplaceLitres()
'
' nbsp between number and unit (l)
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
' looking for a number, zero or one space followed by a single "l" and a dot, comma or end of paragraph.
.Text = "([0-9])([ \?])(l[ \.\,^10^13])"
.Replacement.Text = "\1^s\2\3"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

End Sub


But this code does not seems to work. Is the whole regex incorrect? Can someone please point me in the right direction?

Thank you very much for any answer.

gmaxey
01-07-2015, 02:59 PM
This type of thing doesn't actually use RegEx: Try:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oRng As Word.Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "([0-9]{1,})( )(l[!a-z])"
.Replacement.Text = "\1" & Chr(160) & "\3"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End Sub

dawnMist
01-08-2015, 01:55 AM
Hello Greg,

thank you for your answer, but the macro does not work, the debugger notifies me of an error I cannot identify.
However, you helped me with the construction. I managed to solve the problem like this:


Sub ReplaceLitres()
'
' nbsp between number and unit (l)
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([0-9]) (l[!a-z])"
.Replacement.Text = "\1^s\2"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([0-9])(l[!a-z])"
.Replacement.Text = "\1^s\2"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll

End Sub


I know it is not very elegant and subtle solution, but it works. I have problem with grasping the repeating problem. When I try to use one replacing with the following:


...
.Text = "([0-9])([ ]{0,1})(l[!a-z])"
.Replacement.Text = "\1^s\3"
...


the macro does nothing. Why can I use [0-9], [!a-z] but not the range [ ]{0,1} feature?

gmaxey
01-08-2015, 07:29 AM
The macro worked fine here. What line generated the error?

.Text = "([0-9])([ ]{0,1})(l[!a-z])" results in an invalid pattern match. VBA can't find nothing [ ]{0,1} represent nothing

I'm not sure what you are really after, but if it is thins 6 l, 6 l or 6l = 6nbsl, 6nbsl and 6nbsl then try:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oRng As Word.Range
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "([0-9])([ ]{1,})(l[!a-z])"
.Replacement.Text = "\1^s\3"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = ActiveDocument.Range
With oRng.Find
.Text = "([0-9])(l[!a-z])"
.Replacement.Text = "\1^s\2"
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End Sub

dawnMist
01-09-2015, 01:27 AM
I cannot run this code. The following error is issued:
Run-time error '5560':
The Find What text contains a pattern match expression which is not valid.
When I fix it using ".MatchWildCards = False", the macro does nothing.

Also, I believe that "([0-9])([ ]{1,})(l[!a-z])" should cover the zero or one spaces, shouldn't it?

gmayor
01-09-2015, 07:10 AM
There is no function that will search for 0 or more instances so [ ]{1,} will only find 1 or more spaces. See http://www.gmayor.com/replace_using_wildcards.htm

gmaxey
01-09-2015, 08:09 AM
That code runs fine here (no errors). The only thing I can think of that might cause your error is a regional setting that requires a different separator in {1,}. Can you try to see if the second part of the procedure run without error?

macropod
01-09-2015, 10:20 PM
Instead of worrying about regionalisation, you could use:
.Text = "([0-9])([ ]@)(l[!a-z])"

dawnMist
01-10-2015, 01:17 PM
I tried it on another PC with another regional setting and it works fine, thank you!
So, there is no way of finding zero or one character into the regular expression? At doesn't do the trick.

I have also encountered an interesting thing. It is not possible to find/replace less/greater than or equal to signs. I found ChrW(8805) and ChrW(8804) for these, but this didn't work either.
Is there any way of looking for these operators?

macropod
01-10-2015, 02:45 PM
You can use the * as a wildcard for 0 or more characters, but not for a specified character. The @ specifies any repeated sequence of 0 or more of the character(s) within the [], which obviously means at least one such character must be found.

gmayor
01-10-2015, 10:58 PM
I have also encountered an interesting thing. It is not possible to find/replace less/greater than or equal to signs. I found ChrW(8805) and ChrW(8804) for these, but this didn't work either.
Is there any way of looking for these operators?The characdters have a special meaning in a wildcard search therefore you must use \< or \> or \= to find the characters themselves in a wildcard search. This is all explained in the linked page I posted earlier - http://www.gmayor.com/replace_using_wildcards.htm

dawnMist
01-11-2015, 02:32 AM
Yes, I read the article, it is a very useful source of information, thank you.
However, I asked for the "less/greater than or equal" and for these symbols I need to use Unicode.


.Text = "(^u8804)([0-9])"
.Replacement.Text = "\1^s\2"
does not work, even with the Microsoft Forms 2.0 enabled.

gmayor
01-11-2015, 03:11 AM
Use
.Text = "(" & ChrW(8804) & ")([0-9])"

dawnMist
01-11-2015, 03:17 AM
Brilliant, thank you so much!

Although your article is great, do you know about any united book/website about using VBA specifically in Word?
I found many books and websites, but they were all regarding using VBA in Excel environment.

gmayor
01-11-2015, 03:36 AM
There are several forums where you can ask about Word VBA issues, including this one.

My own web site (http://www.gmayor.com) has many Word examples, as does that of my friend and fellow contributor Greg Maxey (http://gregmaxey.mvps.org/word_tips.htm). Much of VBA is interchangeable between the products so the Excel sites are often useful. I have never read a VBA book so cannot comment on book recommendations.

If I don't know (or have forgotten) how to do something, then Google is as good as anything at pointing a way forward. Almost invariably the questions you may ask have been asked before.

Occasionally something new comes along, but the vast majority of problems are just a variation on a theme. You just have to interpret them to get the answers you want for your own problems.

dawnMist
01-11-2015, 03:38 AM
Great, thank you all for your help!
I am marking this as solved.

Have a great day!