[SOLVED:] RegExp not accounting for Content Control tags



kazi
06-04-2018, 08:56 AM
Hello, I need to solve a problem (searching for and selecting a text range in a VBA macro) under some constraints:
1. I am using regex because of some requirements that could not be handled by Word's Find.
2. The search range may contain content control tags.

The problem is that the regex match does not account for the content control tag characters. When I use a Range object to select a range based on the index of the regex match, the selection is off by 2 characters per content control, because each content control has a start tag and an end tag.

If the search range has no content controls, the code works fine. If I put one or more content controls into the search range, the selected match range is off. I am working on code that shifts the match range based on the content control count, but the challenge is that real documents could contain nested content controls, and the search word could appear inside a content control.

Is there a way for regex to account for content controls, or for the range selection to skip content control tags?


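Here is the test code:
Sub regexHelp()
    ' Needs a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim RegEx As New RegExp
    ' Give every variable its own type; "Dim a, b As T" leaves a as a Variant
    Dim Matches As MatchCollection, match As match
    Dim searchRange As Word.Range, matchRange As Word.Range

    ' search example: hello world!
    RegEx.Pattern = "world"
    RegEx.IgnoreCase = False
    RegEx.Global = False

    Set searchRange = Selection.Range
    Set Matches = RegEx.Execute(searchRange.Text)
    If Matches.Count = 0 Then Exit Sub ' nothing found, nothing to select
    Set match = Matches.Item(0)
    Debug.Print "search range: ", searchRange.Start, searchRange.End
    Debug.Print " match range: ", match.Value, match.FirstIndex, match.Length

    ' match.FirstIndex is an offset into the text string, so this selection
    ' is shifted when the search range contains content controls
    Set matchRange = ActiveDocument.Range(searchRange.Start + match.FirstIndex, _
                                          searchRange.Start + match.FirstIndex + match.Length)
    matchRange.Select
End Sub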


thank you

macropod
06-04-2018, 02:21 PM
So why are you using RegEx instead of Word's own wildcards? The fundamental problem you're having is that RegEx treats everything as a string, which is not how your Word document is structured.

kazi
06-05-2018, 10:53 AM
I am using regex because of some requirements that could not be handled by Word's Find; this includes wildcards.

Paul_Hossler
06-05-2018, 11:28 AM
"I am using regex because of some requirements which could not be handled by word Find, this includes wild cards."

Word's wildcard option in F&R is pretty powerful

What is a specific example that it couldn't do?

kazi
06-05-2018, 01:38 PM
I use Word 2010.
Some of the examples include finding sentences that start with numbers, or that start with patterns like 'C ME 1.7.1' or 'C COL 1.3.1', where the digits can have different values. Find with wildcards is limited; this is mentioned in the Microsoft documentation.

macropod
06-05-2018, 05:51 PM
Both the examples you gave are easily handled via a wildcard Find, where:
Find = C [A-Z]{2,3} [0-9].[0-9].[0-9]
Methinks you're assuming wildcards in Word are more limited than they really are. And, unlike RegEx, Word's Find allows you to specify things like font attributes (e.g. bold, or a Style Name), etc. Moreover, when a match is made via Find in Word, the current range automatically moves to the found range, making it easy to work with what has been found.
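
To illustrate the point about the range moving to each match, here is a minimal sketch of a wildcard Find loop that uses the pattern above. The procedure name FindClausePrefixes is made up for this example, and it assumes the whole document body should be searched; adjust the starting range as needed.

```vba
' Hypothetical example: list every match of the wildcard pattern above.
Sub FindClausePrefixes()
    Dim rng As Word.Range

    ' Assumption: search the main body of the active document
    Set rng = ActiveDocument.Content

    With rng.Find
        .ClearFormatting
        .Text = "C [A-Z]{2,3} [0-9].[0-9].[0-9]"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop   ' stop at the end instead of wrapping around

        ' After each successful Execute, rng is redefined as the found text
        Do While .Execute
            Debug.Print rng.Start, rng.End, rng.Text
            ' Collapse to the end of the match so the next search continues after it
            rng.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```

Because rng itself becomes the found range, there is no string offset to convert back into a document position, so content controls in the search range should not shift the result.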

Paul_Hossler
06-05-2018, 06:12 PM
You can use Word's wildcard F&R to find paragraphs that start with that pattern.

The [a-z] and {m,n} notation is basically the same as RegEx's, and ^13 is the paragraph mark, but Word has some other special codes that work with its content.




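Option Explicit

Sub Macro1()
    ' Start searching from the top of the document
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    With Selection.Find
        ' Letters, space, 2+ letters, space, n.n.n, then the rest of the paragraph
        .Text = "[A-Z]{1,} [A-Z]{2,} [0-9]{1,}.[0-9]{1,}.[0-9]{1,}*^13"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = True
        ' Run the search; the selection moves to the first match, if any
        .Execute
    End With
End Sub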

kazi
06-07-2018, 07:51 AM
Thanks, guys. I was able to solve the current problem with Word's Find. It would still be interesting to see whether regex can be used with Word smart tags.
Thanks for your help, it definitely got me to the solution.
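
For anyone who still needs RegExp on a range that contains content controls, below is a rough, untested sketch of one possible workaround. It relies only on the behaviour described in the first post: the tag characters take up document positions but do not show up in the range's text. The function name RangeFromMatch and its parameters are invented for this example. It also assumes the text of a range is never longer than the number of positions the range spans, and it has not been checked against fields, hidden text or other special content.

```vba
' Hypothetical helper: turn a RegExp match (offset and length within
' searchRange.Text) into a document range by comparing text lengths
' instead of adding the offset to searchRange.Start directly.
Function RangeFromMatch(ByVal searchRange As Word.Range, _
                        ByVal firstIndex As Long, _
                        ByVal matchLength As Long) As Word.Range
    Dim doc As Word.Document
    Dim startPos As Long, endPos As Long

    Set doc = searchRange.Document

    ' Lower bound: the position cannot be smaller than the text offset
    startPos = searchRange.Start + firstIndex

    ' Move forward until the text in front of the match has the expected length
    Do While startPos < searchRange.End And _
             Len(doc.Range(searchRange.Start, startPos).Text) < firstIndex
        startPos = startPos + 1
    Loop

    ' Same idea for the end: grow the range until its text is as long as the match
    endPos = startPos + matchLength
    If endPos > searchRange.End Then endPos = searchRange.End
    Do While endPos < searchRange.End And _
             Len(doc.Range(startPos, endPos).Text) < matchLength
        endPos = endPos + 1
    Loop

    Set RangeFromMatch = doc.Range(startPos, endPos)
End Function
```

In the test macro from the first post, the line that builds matchRange would then become Set matchRange = RangeFromMatch(searchRange, match.FirstIndex, match.Length). Since the lengths are measured on the actual document text, nested content controls and matches inside a content control should not need special handling, though a tag sitting directly in front of the match may end up inside the returned range.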