View Full Version : Complicated Find
baset
09-16-2021, 07:51 PM
Hi All
I've 2 Word files, for example, one for the source language (FileA) that was translated into the other Word file (FileB).
I need a macro to do the below jobs:
- Run the Macro to extract all the content of (FileA) as list of words and remove any repeated words to just keep the unique ones.
- Use that Words list extracted from the previous step to search on (FileB) and if found any of that words, highlight it inside (FileB).
The idea is simple, and I need it to make sure that the second translated file (FileB) is fully translated and that the Translator didn't miss translating any words from the source file (FileA).
gmayor
09-16-2021, 09:38 PM
The following will do what you requested. The code assumes that FileA has each word to check on a separate line. The source is read into a collection of unique words and that collection is used to check the document FileB.
Sub Macro1()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\" 'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oCol As Collection
Dim i As Long
Dim oSource As Document, oTarget As Document
Dim oPara As Paragraph
Dim oRng As Range
    Set oSource = Documents.Open(sPath & sSource)
    Set oTarget = Documents.Open(sPath & sTarget)
    Set oCol = New Collection
    On Error Resume Next
    For i = 1 To oSource.Paragraphs.Count
        Set oRng = oSource.Paragraphs(i).Range
        oRng.End = oRng.End - 1
        oCol.Add Trim(oRng.Text), Trim(oRng.Text)
    Next i
    On Error GoTo 0
    Options.DefaultHighlightColorIndex = wdYellow
    For i = 1 To oCol.Count
        Set oRng = oTarget.Range
        With oRng.Find
            .ClearFormatting
            .Text = CStr(oCol(i))
            .Replacement.ClearFormatting
            .Replacement.Text = "^&"
            .Replacement.Highlight = True
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
            .MatchCase = True
            .MatchWholeWord = True
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
            .Execute Replace:=wdReplaceAll
        End With
    Next i
    MsgBox "Check complete!"
lbl_Exit:
    oSource.Close 0
    Set oSource = Nothing
    Set oTarget = Nothing
    Set oRng = Nothing
    Set oCol = Nothing
    Exit Sub
End Sub
baset
09-16-2021, 10:45 PM
Thanks a lot Graham Mayor for this quick reply.
Actually the source word file (FileA) is a normal file that has paragraphs, tables, ...etc. So is it possible to add some codes on the beginning of the macro to extract all its content as separate words to find  them later on (FileB)?
Thanks in advance for your help on that.
baset
09-17-2021, 01:50 AM
If it's better, I'm sharing with you the attached sample files.28976 --- 28977
gmayor
09-17-2021, 02:17 AM
It's a lot more complicated with a normal document as a word source, especially without seeing that document (which you posted after I came up with the code), however the following should be close. It will however take some time to run:
P.S. I have checked it against your samples and it appears to work
Sub GetWords()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\"    'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oSource As Document, oTarget As Document
Dim oStory As Range, oRng As Range
Dim oCol As Collection
Dim oWord As Object
Dim sText As String
Dim i As Long
    Set oCol = New Collection
    Set oSource = Documents.Open(sPath & sSource)
    Set oTarget = Documents.Open(sPath & sTarget)
    On Error Resume Next
    For Each oStory In oSource.StoryRanges
        For Each oWord In oStory.Words
            sText = Trim(oWord)
            If OnlyLetters(sText) = True Then
                oCol.Add sText, sText
            End If
            DoEvents
        Next oWord
        If oStory.StoryType <> wdMainTextStory Then
            While Not (oStory.NextStoryRange Is Nothing)
                Set oStory = oStory.NextStoryRange
                For Each oWord In oStory.Words
                    sText = Trim(oWord)
                    If OnlyLetters(sText) = True Then
                        oCol.Add sText, sText
                    End If
                    DoEvents
                Next oWord
            Wend
        End If
        DoEvents
    Next oStory
    On Error GoTo 0
    Options.DefaultHighlightColorIndex = wdYellow
    For i = 1 To oCol.Count
        Set oRng = oTarget.Range
        With oRng.Find
            .ClearFormatting
            .Text = CStr(oCol(i))
            .Replacement.ClearFormatting
            .Replacement.Text = "^&"
            .Replacement.Highlight = True
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
            .MatchCase = True
            .MatchWholeWord = True
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
            .Execute Replace:=wdReplaceAll
        End With
        DoEvents
    Next i
    MsgBox "Check complete!"
lbl_Exit:
    oSource.Close 0
    Set oWord = Nothing
    Set oSource = Nothing
    Set oTarget = Nothing
    Set oRng = Nothing
    Set oCol = Nothing
    Exit Sub
End Sub
Private Function OnlyLetters(sWord As String) As Boolean
Dim intPos As Integer
    For intPos = 1 To Len(sWord)
        Select Case Asc(Mid(sWord, intPos, 1))
            Case 65 To 90, 97 To 122
                OnlyLetters = True
            Case Else
                OnlyLetters = False
                Exit For
        End Select
    Next
lbl_Exit:
    Exit Function
End Function
baset
09-17-2021, 03:27 AM
Dear Graham
That's fantastic, I tested it and it works with the below notes:
- It didn't detect this word Plaintiff’s from the source file (FileA), I mean that it's found also on (FileB) but it's not highlighted.
- This great macro didn't run or work on the text inside the text box which is located under the table.
Is there a solution for that 2 issues?
gmayor
09-17-2021, 06:52 AM
Words with apostrophes can be accommodated easily enough by adding the apostrophe characters to the OnlyGetLetters function.
I have added unique words to the collection, but PLAINTIFF and plaintiff are considered to be the same word, so the practical solution is to change the search to ignore case.
Text boxes are frankly a pain as they are essentially graphics and not in the text story ranges. The following will also handle text boxes.
There are lots of instances of the letter 'a' highlighted. I have eliminated them with the two entries 
If OnlyLetters(sText) = True And Len(sText) > 1 Then
 If you want the 'a' instances highlighted remove the following bit from those two lines.
And Len(sText) > 1
You can find my web site at https://www.gmayor.com/Word_pages.htm if you wish to demonstrate your appreciation for the work that went into this.
Sub GetWords()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\"    'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oSource As Document, oTarget As Document
Dim oStory As Range, oRng As Range
Dim oShp As Shape
Dim oCol As Collection
Dim oWord As Object
Dim sText As String
Dim i As Long
    Set oCol = New Collection
    Set oSource = Documents.Open(sPath & sSource)
    Set oTarget = Documents.Open(sPath & sTarget)
    On Error Resume Next
    For Each oStory In oSource.StoryRanges
        Select Case oStory.StoryType
            Case 1 To 11
                Do
                    For Each oWord In oStory.Words
                        sText = Trim(oWord)
                        If OnlyLetters(sText) = True And Len(sText) > 1 Then
                            oCol.Add sText, sText
                        End If
                        DoEvents
                    Next oWord
                    DoEvents
                    Select Case oStory.StoryType
                        Case 6, 7, 8, 9, 10, 11
                            If oStory.ShapeRange.Count > 0 Then
                                For Each oShp In oStory.ShapeRange
                                    If oShp.TextFrame.HasText Then
                                        For Each oWord In oShp.TextFrame.TextRange.Words
                                            sText = Trim(oWord)
                                            If OnlyLetters(sText) = True And Len(sText) > 1 Then
                                                oCol.Add sText, sText
                                            End If
                                            DoEvents
                                        Next oWord
                                    End If
                                    DoEvents
                                Next oShp
                            End If
                        Case Else
                            'Do Nothing
                    End Select
                    'Get next linked story (if any)
                    Set oStory = oStory.NextStoryRange
                Loop Until oStory Is Nothing
            Case Else
        End Select
        DoEvents
    Next oStory
    Options.DefaultHighlightColorIndex = wdYellow
    For i = 1 To oCol.Count
Debug.Print oCol(i)
        For Each oStory In oTarget.StoryRanges
            Select Case oStory.StoryType
                Case 1 To 11
                    Do
                        With oStory.Find
                            .ClearFormatting
                            .Text = CStr(oCol(i))
                            .Replacement.ClearFormatting
                            .Replacement.Text = "^&"
                            .Replacement.Highlight = True
                            .Forward = True
                            .Wrap = wdFindContinue
                            .Format = True
                            .MatchCase = False
                            .MatchWholeWord = True
                            .MatchWildcards = False
                            .MatchSoundsLike = False
                            .MatchAllWordForms = False
                            .Execute Replace:=wdReplaceAll
                        End With
                        DoEvents
                        Select Case oStory.StoryType
                            Case 6, 7, 8, 9, 10, 11
                                If oStory.ShapeRange.Count > 0 Then
                                    For Each oShp In oStory.ShapeRange
                                        If oShp.TextFrame.HasText Then
                                            Set oRng = oShp.TextFrame.TextRange
                                            With oRng.Find
                                                .ClearFormatting
                                                .Text = CStr(oCol(i))
                                                .Replacement.ClearFormatting
                                                .Replacement.Text = "^&"
                                                .Replacement.Highlight = True
                                                .Forward = True
                                                .Wrap = wdFindContinue
                                                .Format = True
                                                .MatchCase = True
                                                .MatchWholeWord = True
                                                .MatchWildcards = False
                                                .MatchSoundsLike = False
                                                .MatchAllWordForms = False
                                                .Execute Replace:=wdReplaceAll
                                            End With
                                        End If
                                        DoEvents
                                    Next oShp
                                End If
                            Case Else
                                'Do Nothing
                        End Select
                        'Get next linked story (if any)
                        Set oStory = oStory.NextStoryRange
                    Loop Until oStory Is Nothing
                Case Else
            End Select
            DoEvents
        Next oStory
    Next i
    MsgBox "Check complete!"
lbl_Exit:
    oSource.Close 0
    Set oWord = Nothing
    Set oSource = Nothing
    Set oTarget = Nothing
    Set oRng = Nothing
    Set oCol = Nothing
    Exit Sub
End Sub
Private Function OnlyLetters(sWord As String) As Boolean
Dim intPos As Integer
    For intPos = 1 To Len(sWord)
        Select Case Asc(Mid(sWord, intPos, 1))
            Case 65 To 90, 97 To 122, 146, 180
                OnlyLetters = True
            Case Else
                OnlyLetters = False
                Exit For
        End Select
    Next
lbl_Exit:
    Exit Function
End Function
baset
09-17-2021, 08:23 PM
Dear Graham
Thanks a lot for this effort you did for me, you really helped me a lot.
I visited your website, but I don't know how to demonstrate my appreciation for the work that went into this, Could you please let me know how could I do that?
gmayor
09-17-2021, 10:58 PM
There are donation links on every page :)
baset
09-19-2021, 09:36 PM
Okay, I did the donation as shown below:
https://ibb.co/0Z58BL0
https://ibb.co/6NzLRD8
Thanks for your help.
baset
10-01-2021, 10:58 AM
Hi Graham,
Can I ask you for something on that Macro, if the target file has RTL (right to left) language, I noticed that the highlighted words are flipped, Is this issue can be solved?
I attached here both the sample source and target files, along with a screenshot for the issue.
baset
10-05-2021, 12:55 AM
Hi Graham,
Do you have any clue for the above reported issue?
gmayor
10-05-2021, 04:40 AM
I regret I have no experience with right to left languages.
baset
10-05-2021, 05:52 AM
No Worries - Thanks for your reply :hi:.
baset
12-10-2022, 03:16 PM
Hi Graham,
I need a very small favor from you, I need a progress bar for that Macro you did for me, Could you please help me on that?
Thanks in advance
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.