View Full Version : Highlight not delete repeated phrases and sentences in a word document
mpeterson
05-23-2017, 10:35 AM
Hi All,
I am working on a large word document and need to detect and highlight repeated phrases that have two or more words (i.e. part of a sentence) up to a sentence. I ran into this marvelous code at stackoverflow site, which detects and highlights repeated sentences only, and ignores any repeated phrases within sentences.
Could this code be modified to also detect and highlight phrases of two or more words?
Many thanks in advance.
https://stackoverflow.com/questions/10301009/highlight-not-delete-repeat-sentences-or-phrases-in-a-word-document
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
gmaxey
05-23-2017, 11:20 AM
That code is not robust. Consider Mr. Smith loves Mary. Mr. Jones loves Mary too.
There is no repeated sentence.
gmaxey
05-23-2017, 11:36 AM
And for similar reasons neither is this:
Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oCol As New Collection
Dim oColDups As New Collection
Dim lngINdex As Long
Dim oWord, itm
For lngINdex = 1 To ActiveDocument.Words.Count - 1
On Error Resume Next
oCol.Add Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1)), _
Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1))
If Err.Number <> 0 Then
Err.Clear
On Error Resume Next
oColDups.Add Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1)), _
Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1))
On Error GoTo 0
End If
Next
For Each itm In oColDups
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
lbl_Exit:
Exit Sub
End Sub
mpeterson
05-23-2017, 11:58 AM
Thank you very much Greg for your precious input. In fact, I am really after "loves Mary" as it is repeated in both sentences. Is this possible?
My best regards to your esteemed person.
gmaxey
05-23-2017, 12:06 PM
It is 'sort of' and by defining exceptions. The problem is that Word doesn't count a sentence or a word like you or I do:
Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oCol As New Collection
Dim oColDups As New Collection
Dim lngINdex As Long
Dim oWord, itm
For lngINdex = 1 To ActiveDocument.Words.Count - 1
On Error Resume Next
oCol.Add Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1)), _
Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1))
If Err.Number <> 0 Then
Err.Clear
On Error Resume Next
oColDups.Add Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1)), _
Trim(ActiveDocument.Words(lngINdex) & ActiveDocument.Words(lngINdex + 1))
On Error GoTo 0
End If
Next
For Each itm In oColDups
If Not itm = "Mr." Then
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
End If
Next
lbl_Exit:
Exit Sub
End Sub
mpeterson
05-23-2017, 12:26 PM
After running this code, I understand now what you meant by "sort of".
In fact, in my search for the proper code before starting this thread I ran into one of your codes which I found on this link: http://www.vbaexpress.com/forum/showthread.php?33793-vba-script-find-and-highlight-multi-word-phrases
I think this code would've worked for me if the array was fed from another word file with a list of words and phrases that would be used instead of "I think that" and "very".
In other words, two files, one with the text to be parsed and the other with the list of words and phrases to be used for parsing. How do you see this solution, Greg?
Sub ScratchMacro() Dim oRng As Word.Range
Dim arrWords
Dim i As Long
arrWords = Array("I think that", "very")
For i = 0 To UBound(arrWords)
Set oRng = ActiveDocument.Range
With oRng.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = arrWords(i)
.MatchWholeWord = True
.Replacement.Highlight = True
.Execute Replace:=wdReplaceAll
End With
Next End Sub
mpeterson
05-23-2017, 01:30 PM
After thinking it over, I do see what you see Greg, there's no "right" solution for this problem. So I think I should drop this question from discussion, thanking you very much for your expert cooperation and advice.
Always .. all the best ..
gmaxey
05-23-2017, 05:10 PM
I was out for a couple of hours. Yes, if you know the word pairs or phrases you want to find and highlight e.g., "loves Mary", "Everybody loves Mary" etc, then that could be done.
macropod
05-23-2017, 09:11 PM
You might be interested in: http://www.eileenslounge.com/viewtopic.php?f=30&t=26488#p205614
gmaxey
05-24-2017, 08:35 AM
Paul,
I doubt if any of us will ever come up with something iron-clad but leveraging the sentences method where it does work and ranges makes it a lot faster:
Option Explicit
Private oDict
Sub ExtractFirstDeducedParagraphSentence()
'A basic Word macro coded by Greg Maxey.
'Also returns count of total deduced sentences.
Dim arrAbbr() As String
Dim oPar As Paragraph
Dim oRng As Range
Dim lngIndex As Long, lngParSent As Long, lngSent As Long
Dim oDoc As Document
Dim oCol As New Collection
'Use to detect initials and abbreviations.
Set oDict = CreateObject("Scripting.Dictionary")
'Adjust to suit requirements.
arrAbbr = Split("Ave|BGen|Br|Capt|Col|Dr|Drs|Esq|Ens|Fr|Gen|Hon|Jr|Lt|LtCol|LtGen|Ltjg|Maj|M ajGen|" _
& "Messrs|Mlle|Mme|Mmes|Msgr|Mr|Mrs|Ms|Mt|Ph.D|Prof|Rep|Rev|Sen|Sr|yrs", "|")
On Error Resume Next
For lngIndex = 0 To UBound(arrAbbr)
oDict.Add arrAbbr(lngIndex), arrAbbr(lngIndex)
Next lngIndex
On Error GoTo 0
'Add UCase Alpha Characters A-Z
For lngIndex = 65 To 90
oDict.Add CStr(Chr(lngIndex)), Chr(lngIndex)
Next
'Parse paragraphs using "what does work" with the sentences method.
For Each oPar In ActiveDocument.Range.Paragraphs
lngParSent = 0
Set oRng = oPar.Range
oRng.Collapse wdCollapseStart
Do
'Extend range to where Word thinks a sentence terminates. This will
oRng.MoveEnd wdSentence, 1
oRng.Select
oRng.End = oRng.End - 1
oRng.Select
Do While Not fcnValidTerminus(oRng)
If oRng.End = oPar.Range.End - 1 Then Exit Do
oRng.End = oRng.End + 1
oRng.MoveEnd wdSentence, 1
oRng.End = oRng.End - 1
Loop
If oRng.Characters.Last Like "[.?!]" Then
lngParSent = lngParSent + 1
MoveRangeWhileWhiteSpace oRng
If lngParSent = 1 Then oCol.Add oRng.Duplicate
lngSent = lngSent + 1
End If
If oRng.End = ActiveDocument.Range.End - 1 Then Exit For
oRng.Collapse wdCollapseEnd
MoveRangeToWhiteSpace oRng, False
oRng.Collapse wdCollapseEnd
Loop Until oRng.End = oPar.Range.End
Next
Set oDoc = Documents.Add
Set oRng = oDoc.Range
For lngIndex = 1 To oCol.Count
oRng.FormattedText = oCol.Item(lngIndex)
oRng.Collapse wdCollapseEnd
oRng.InsertBefore vbCr
oRng.Collapse wdCollapseEnd
Next lngIndex
MsgBox oCol.Count & " first sentences of " & lngSent & " document sentences were extracted."
lbl_Exit:
Set oDict = Nothing
Exit Sub
End Sub
Function fcnValidTerminus(ByVal oRng) As Boolean
Dim oRngWord As Range
fcnValidTerminus = True
If oRng.Characters.Last = "," Then fcnValidTerminus = False: Exit Function
If Not oRng.Characters.Last Like "[.?!]" Then fcnValidTerminus = False: Exit Function
If oRng.Words.Last Like ".[" & Chr(32) & Chr(160) & "]" Then
Set oRngWord = oRng.Words.Last.Previous
MoveRangeToWhiteSpace oRngWord
On Error Resume Next
oDict.Add CStr(oRngWord), oRngWord
If Err.Number <> 0 Then
fcnValidTerminus = False
Else
oDict.Remove CStr(oRngWord)
End If
Exit Function
End If
lbl_Exit:
Exit Function
End Function
Sub MoveRangeToWhiteSpace(ByVal oRng, Optional bBack As Boolean = True)
If bBack Then
If Not oRng.Characters.First.Start = 0 Then
If Not fcnIsWhiteSpace(oRng.Characters.First.Previous) Then
Do
oRng.Start = oRng.Start - 1
If oRng.Start = 0 Then Exit Do
Loop Until fcnIsWhiteSpace(oRng.Characters.First.Previous)
End If
End If
Else
If Not oRng.Characters.Last.End = ActiveDocument.Range.End Then
If fcnIsWhiteSpace(oRng.Characters.Last) Then
Do
oRng.End = oRng.End + 1
Loop Until Not fcnIsWhiteSpace(oRng.Characters.Last.Next)
End If
End If
End If
lbl_Exit:
Exit Sub
End Sub
Sub MoveRangeWhileWhiteSpace(ByVal oRng)
If fcnIsWhiteSpace(oRng.Characters.Last) Then
Do
oRng.End = oRng.End - 1
Loop Until Not fcnIsWhiteSpace(oRng.Characters.Last)
End If
lbl_Exit:
Exit Sub
End Sub
Private Function fcnIsWhiteSpace(oRngEndCharacter As Range) As Boolean
If Not oRngEndCharacter Is Nothing Then
Select Case Asc(oRngEndCharacter.Text)
Case 32, 160, 9, 11, 13: fcnIsWhiteSpace = True
Case Else: fcnIsWhiteSpace = False
End Select
End If
lbl_Exit:
Exit Function
End Function
macropod
05-24-2017, 03:50 PM
I doubt if any of us will ever come up with something iron-clad
Agreed. I'll leave it to the masochists to try :razz:
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.