PDA

View Full Version : [SOLVED:] Use Regex in VBA to replace text between tags in PowerPoint (multiple occurences)



majom
10-29-2019, 05:14 PM
Hi,

I am struggling with the following problem:

My goal is to replace some text in the slide notes of a PowerPoint file using VBA. If the text in the slide notes looks like this (single occurence of the tag to be deleted) my code below works. However, for multiple occurences of the tags it does not work correctly.

The input in this case looks like this:


This is a first sentence.
<code1>This second sentence needs to be deleted.</code1>
Here is a third sentence. This one should be kept.
<code1>This fourth sentence needs to be deleted as well.</code1>


And the wrong output like this:


This is a first sentence.


Actually, I want to have this:


This is a first sentence.
Here is a third sentence. 'This one should be kept.


I would appreciate any advice on how to change the code below.


Sub sync_text()
input_text = ActivePresentation.Slides(1).NotesPage.Shapes(2).TextFrame.TextRange.Text
' Delete code1
Set regX_delete = CreateObject("vbscript.regexp")
With regX_delete
.Global = True
.Pattern = "<code1>(.+)</code1>"
End With
output_text = regX_delete.Replace(input_text, " ")
ActivePresentation.Slides(1).NotesPage.Shapes(2).TextFrame.TextRange.Text = output_text
End Sub


Thanks for your help!

John Wilson
10-30-2019, 03:09 AM
A: Always declare variables!
B. You have st the pattern to be "GREEDY" - this means it will look for the first <code1> and keep looking till it gets to the last </code1> and delete everything in between. Use a ? in the pattern to set to NON GREEDY


Sub sync_text()
'Declare variables
Dim input_text As String
Dim output_text As String
Dim regX_delete As Object
input_text = ActivePresentation.Slides(1).NotesPage.Shapes(2).TextFrame.TextRange.Text

' Delete code1
Set regX_delete = CreateObject("vbscript.regexp")
With regX_delete
.Global = True
.Pattern = "<code1>(.+?)</code1>"
End With

output_text = regX_delete.Replace(input_text, " ")

ActivePresentation.Slides(1).NotesPage.Shapes(2).TextFrame.TextRange.Text = output_text

End Sub

John Wilson
10-30-2019, 10:06 AM
Note this is not the most efficient way to search but it is probably fast enough and easier to understand.


.Global = True
.Pattern = "<code1>[^<]*</code1>"

would run faster but unless you have a huge file you will not notice. It searches for anything between the tags EXCEPT '<'(the start of the next tag)

majom
11-01-2019, 06:57 AM
Thanks for your help (and also the advice on delcaring all variables)!

Works great.

majom
11-07-2019, 07:16 AM
Thanks again, the code works great on Windows.

However, I just learned that on Mac the code won't run. The following line throws an error:

regX_delete = CreateObject("vbscript.regexp"

Is there any way to do implement this "search & replace" procedure such that it works for both Windows & Mac?

John Wilson
11-07-2019, 02:47 PM
Not as far as I know. I don't think you can run a regX object on a Mac but then I'm not a Mac person!

majom
11-07-2019, 04:39 PM
That's unfortunate, but after googling for a while I was kind of expecting this. Would it theoretically be possible to "translate" this regex to a bit more complex code chunk that uses the find() method. That means, first identifying how many <code1> tags exist. Next, looping over this number deleting everything from the first character of the first appearance of <code1> to the last character of the first appearance of </code1>, and so on. My assumption is that such a code would run on a Mac.

Does this make sense/ would this be feasible?

Paul_Hossler
11-08-2019, 09:53 AM
Maybe ....



Option Explicit

Sub NotRegEx()
Dim oSlide As Slide
For Each oSlide In ActivePresentation.Slides
If Not oSlide.HasNotesPage Then GoTo NextSlide
With oSlide.NotesPage.Shapes(2)
If Not .HasTextFrame Then GoTo NextSlide
If Not .TextFrame.HasText Then GoTo NextSlide
If Len(.TextFrame.TextRange.Text) = 0 Then GoTo NextSlide
.TextFrame.TextRange.Text = DeleteTags(.TextFrame.TextRange.Text, "code1")
End With
NextSlide:
Next
End Sub

'only tag, function adds <s> and </s>

Function DeleteTags(s As String, t As String) As String
Dim sStart As String, sEnd As String
Dim iStart As Long, iEnd As Long
sStart = "<" & t & ">"
sEnd = "</" & t & ">"
iStart = InStr(1, s, sStart, vbTextCompare)
iEnd = InStr(1, s, sEnd, vbTextCompare)
Do While iStart > 0 And iEnd > 0
'start tag at beginning
If iStart = 1 Then
s = Right(s, Len(s) - iEnd - Len(sEnd) + 1)
Else
s = Left(s, iStart - 1) & Right(s, Len(s) - iEnd - Len(sEnd) + 1)
End If
iStart = InStr(1, s, sStart, vbTextCompare)
iEnd = InStr(1, s, sEnd, vbTextCompare)
Loop
DeleteTags = s
End Function

John Wilson
11-09-2019, 03:42 AM
Nice Code. I was about to post this which does pretty much the same thing.


Sub notRegX2()
Dim oshp As Shape
Dim osld As Slide
Dim otxr2 As TextRange2
Dim lngStart As Long
Dim lngEnd As Long
For Each osld In ActivePresentation.Slides
For Each oshp In osld.NotesPage.Shapes
If oshp.HasTextFrame Then
If oshp.TextFrame2.HasText Then
Set otxr2 = oshp.TextFrame2.TextRange
While InStr(otxr2.Text, "<code1>") > 0
lngStart = InStr(otxr2.Text, "<code1>")
lngEnd = InStr(otxr2.Text, "</code1>") + Len("</code1>")
otxr2.Characters(lngStart, lngEnd - lngStart).Delete
Wend
End If
End If
Next oshp
Next osld
End Sub

majom
11-11-2019, 04:11 AM
Nice, indeed! Thanks Paul and John.

Both solutions work on Mac! This helps me a lot to make the script work for all my colleagues.

Thanks again for your help!