PDA

View Full Version : Complicated Find



baset
09-16-2021, 07:51 PM
Hi All

I've 2 Word files, for example, one for the source language (FileA) that was translated into the other Word file (FileB).

I need a macro to do the below jobs:

- Run the Macro to extract all the content of (FileA) as list of words and remove any repeated words to just keep the unique ones.

- Use that Words list extracted from the previous step to search on (FileB) and if found any of that words, highlight it inside (FileB).

The idea is simple, and I need it to make sure that the second translated file (FileB) is fully translated and that the Translator didn't miss translating any words from the source file (FileA).

gmayor
09-16-2021, 09:38 PM
The following will do what you requested. The code assumes that FileA has each word to check on a separate line. The source is read into a collection of unique words and that collection is used to check the document FileB.


Sub Macro1()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\" 'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oCol As Collection
Dim i As Long
Dim oSource As Document, oTarget As Document
Dim oPara As Paragraph
Dim oRng As Range
Set oSource = Documents.Open(sPath & sSource)
Set oTarget = Documents.Open(sPath & sTarget)
Set oCol = New Collection
On Error Resume Next
For i = 1 To oSource.Paragraphs.Count
Set oRng = oSource.Paragraphs(i).Range
oRng.End = oRng.End - 1
oCol.Add Trim(oRng.Text), Trim(oRng.Text)
Next i
On Error GoTo 0
Options.DefaultHighlightColorIndex = wdYellow
For i = 1 To oCol.Count
Set oRng = oTarget.Range
With oRng.Find
.ClearFormatting
.Text = CStr(oCol(i))
.Replacement.ClearFormatting
.Replacement.Text = "^&"
.Replacement.Highlight = True
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
Next i
MsgBox "Check complete!"
lbl_Exit:
oSource.Close 0
Set oSource = Nothing
Set oTarget = Nothing
Set oRng = Nothing
Set oCol = Nothing
Exit Sub
End Sub

baset
09-16-2021, 10:45 PM
Thanks a lot Graham Mayor for this quick reply.

Actually the source word file (FileA) is a normal file that has paragraphs, tables, ...etc. So is it possible to add some codes on the beginning of the macro to extract all its content as separate words to find them later on (FileB)?

Thanks in advance for your help on that.

baset
09-17-2021, 01:50 AM
If it's better, I'm sharing with you the attached sample files.28976 --- 28977

gmayor
09-17-2021, 02:17 AM
It's a lot more complicated with a normal document as a word source, especially without seeing that document (which you posted after I came up with the code), however the following should be close. It will however take some time to run:

P.S. I have checked it against your samples and it appears to work



Sub GetWords()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\" 'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oSource As Document, oTarget As Document
Dim oStory As Range, oRng As Range
Dim oCol As Collection
Dim oWord As Object
Dim sText As String
Dim i As Long
Set oCol = New Collection
Set oSource = Documents.Open(sPath & sSource)
Set oTarget = Documents.Open(sPath & sTarget)

On Error Resume Next
For Each oStory In oSource.StoryRanges
For Each oWord In oStory.Words
sText = Trim(oWord)
If OnlyLetters(sText) = True Then
oCol.Add sText, sText
End If
DoEvents
Next oWord
If oStory.StoryType <> wdMainTextStory Then
While Not (oStory.NextStoryRange Is Nothing)
Set oStory = oStory.NextStoryRange
For Each oWord In oStory.Words
sText = Trim(oWord)
If OnlyLetters(sText) = True Then
oCol.Add sText, sText
End If
DoEvents
Next oWord
Wend
End If
DoEvents
Next oStory
On Error GoTo 0
Options.DefaultHighlightColorIndex = wdYellow
For i = 1 To oCol.Count
Set oRng = oTarget.Range
With oRng.Find
.ClearFormatting
.Text = CStr(oCol(i))
.Replacement.ClearFormatting
.Replacement.Text = "^&"
.Replacement.Highlight = True
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
DoEvents
Next i
MsgBox "Check complete!"
lbl_Exit:
oSource.Close 0
Set oWord = Nothing
Set oSource = Nothing
Set oTarget = Nothing
Set oRng = Nothing
Set oCol = Nothing
Exit Sub
End Sub

Private Function OnlyLetters(sWord As String) As Boolean
Dim intPos As Integer
For intPos = 1 To Len(sWord)
Select Case Asc(Mid(sWord, intPos, 1))
Case 65 To 90, 97 To 122
OnlyLetters = True
Case Else
OnlyLetters = False
Exit For
End Select
Next
lbl_Exit:
Exit Function
End Function

baset
09-17-2021, 03:27 AM
Dear Graham

That's fantastic, I tested it and it works with the below notes:

- It didn't detect this word Plaintiff’s from the source file (FileA), I mean that it's found also on (FileB) but it's not highlighted.

- This great macro didn't run or work on the text inside the text box which is located under the table.

Is there a solution for that 2 issues?

gmayor
09-17-2021, 06:52 AM
Words with apostrophes can be accommodated easily enough by adding the apostrophe characters to the OnlyGetLetters function.
I have added unique words to the collection, but PLAINTIFF and plaintiff are considered to be the same word, so the practical solution is to change the search to ignore case.
Text boxes are frankly a pain as they are essentially graphics and not in the text story ranges. The following will also handle text boxes.
There are lots of instances of the letter 'a' highlighted. I have eliminated them with the two entries

If OnlyLetters(sText) = True And Len(sText) > 1 Then
If you want the 'a' instances highlighted remove the following bit from those two lines.
And Len(sText) > 1

You can find my web site at https://www.gmayor.com/Word_pages.htm if you wish to demonstrate your appreciation for the work that went into this.


Sub GetWords()
'Graham Mayor - https://www.gmayor.com - Last updated - 17 Sep 2021
Const sPath As String = "C:\Path\" 'the folder where the two documents are stored.
Const sSource As String = "FileA.docx"
Const sTarget As String = "FileB.docx"
Dim oSource As Document, oTarget As Document
Dim oStory As Range, oRng As Range
Dim oShp As Shape
Dim oCol As Collection
Dim oWord As Object
Dim sText As String
Dim i As Long
Set oCol = New Collection
Set oSource = Documents.Open(sPath & sSource)
Set oTarget = Documents.Open(sPath & sTarget)

On Error Resume Next

For Each oStory In oSource.StoryRanges
Select Case oStory.StoryType
Case 1 To 11
Do
For Each oWord In oStory.Words
sText = Trim(oWord)
If OnlyLetters(sText) = True And Len(sText) > 1 Then
oCol.Add sText, sText
End If
DoEvents
Next oWord
DoEvents

Select Case oStory.StoryType
Case 6, 7, 8, 9, 10, 11
If oStory.ShapeRange.Count > 0 Then
For Each oShp In oStory.ShapeRange
If oShp.TextFrame.HasText Then
For Each oWord In oShp.TextFrame.TextRange.Words
sText = Trim(oWord)
If OnlyLetters(sText) = True And Len(sText) > 1 Then
oCol.Add sText, sText
End If
DoEvents
Next oWord
End If
DoEvents
Next oShp
End If
Case Else
'Do Nothing
End Select
'Get next linked story (if any)
Set oStory = oStory.NextStoryRange
Loop Until oStory Is Nothing
Case Else
End Select
DoEvents
Next oStory
Options.DefaultHighlightColorIndex = wdYellow
For i = 1 To oCol.Count
Debug.Print oCol(i)
For Each oStory In oTarget.StoryRanges
Select Case oStory.StoryType
Case 1 To 11
Do
With oStory.Find
.ClearFormatting
.Text = CStr(oCol(i))
.Replacement.ClearFormatting
.Replacement.Text = "^&"
.Replacement.Highlight = True
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
DoEvents

Select Case oStory.StoryType
Case 6, 7, 8, 9, 10, 11
If oStory.ShapeRange.Count > 0 Then
For Each oShp In oStory.ShapeRange
If oShp.TextFrame.HasText Then
Set oRng = oShp.TextFrame.TextRange
With oRng.Find
.ClearFormatting
.Text = CStr(oCol(i))
.Replacement.ClearFormatting
.Replacement.Text = "^&"
.Replacement.Highlight = True
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = True
.MatchWholeWord = True
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
End If
DoEvents
Next oShp
End If
Case Else
'Do Nothing
End Select
'Get next linked story (if any)
Set oStory = oStory.NextStoryRange
Loop Until oStory Is Nothing
Case Else
End Select
DoEvents
Next oStory
Next i
MsgBox "Check complete!"
lbl_Exit:
oSource.Close 0
Set oWord = Nothing
Set oSource = Nothing
Set oTarget = Nothing
Set oRng = Nothing
Set oCol = Nothing
Exit Sub
End Sub

Private Function OnlyLetters(sWord As String) As Boolean
Dim intPos As Integer
For intPos = 1 To Len(sWord)
Select Case Asc(Mid(sWord, intPos, 1))
Case 65 To 90, 97 To 122, 146, 180
OnlyLetters = True
Case Else
OnlyLetters = False
Exit For
End Select
Next
lbl_Exit:
Exit Function
End Function

baset
09-17-2021, 08:23 PM
Dear Graham

Thanks a lot for this effort you did for me, you really helped me a lot.

I visited your website, but I don't know how to demonstrate my appreciation for the work that went into this, Could you please let me know how could I do that?

gmayor
09-17-2021, 10:58 PM
There are donation links on every page :)

baset
09-19-2021, 09:36 PM
Okay, I did the donation as shown below:

https://ibb.co/0Z58BL0
https://ibb.co/6NzLRD8


Thanks for your help.

baset
10-01-2021, 10:58 AM
Hi Graham,

Can I ask you for something on that Macro, if the target file has RTL (right to left) language, I noticed that the highlighted words are flipped, Is this issue can be solved?

I attached here both the sample source and target files, along with a screenshot for the issue.

baset
10-05-2021, 12:55 AM
Hi Graham,

Do you have any clue for the above reported issue?

gmayor
10-05-2021, 04:40 AM
I regret I have no experience with right to left languages.

baset
10-05-2021, 05:52 AM
No Worries - Thanks for your reply :hi:.

baset
12-10-2022, 03:16 PM
Hi Graham,

I need a very small favor from you, I need a progress bar for that Macro you did for me, Could you please help me on that?

Thanks in advance