Consulting

Results 1 to 20 of 20

Thread: Help With VBA For A Large Search and Replace Operation, Problem with .MatchWholeWord

  1. #1
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location

    Help With VBA For A Large Search and Replace Operation, Problem with .MatchWholeWord

    Hi guys,

    I'm new to this forum and have attempted to do a search for this information but can't find anything quite relevant - so I apologise if it's been posted before!

    Basically I have a hundred or so documents at work, each about 50 pages which need to be anonymised - removing references to he/she, dates, names, places etc. so that they effectively can't be traced back to who they are referring to.

    I did 50 or so manually and decided that was a waste of time, 100 more to go and it's time to make a macro to do it for me. So far I've modified someone else's 'find and highlight' code to find and replace a list of names with [REMOVED] in red, this has worked to an extent. The list of names is about 10,000 long and is in Word format, on name per line e.g.

    John
    Bill
    Rob
    James

    and so on. Obviously I only want it to find the whole and exact word and replace that, for example I don't want the name 'Mee' to cause the word 'Meeting' to become '[REMOVED]ting'. It does this quite well for the first few entries on the list and then after that starts to pick them out in a string of characters. I'm totally confused by this as I can't work out what changes after 30 or so lines in the list!

    Code is as follows:

    [VBA]

    Sub Anonymouse()
    Dim sCheckDoc As String
    Dim docRef As Document
    Dim docCurrent As Document
    Dim wrdRef As String
    Dim wrdPara As Paragraph


    sCheckDoc = "d:\checklist.doc"
    Set docCurrent = Selection.Document
    Set docRef = Documents.Open(sCheckDoc)
    docCurrent.Activate

    With Selection.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[REMOVED]"
    .Forward = True
    .Format = True
    .MatchWholeWord = True
    .MatchCase = True
    .MatchWildcards = False
    End With

    For Each wrdPara In docRef.Paragraphs
    wrdRef = wrdPara.Range.Text
    ' remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    If Asc(Left(wrdRef, 1)) > 32 Then
    With Selection.Find
    .Wrap = wdFindContinue
    .Text = wrdRef
    .Execute Replace:=wdReplaceAll
    End With
    End If
    Next wrdPara

    docRef.Close
    docCurrent.Activate
    End Sub
    [/VBA]

    Any help would be massively appreciated.

    Cheers guys.

  2. #2
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    [vba]Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim wrdPara As Paragraph
    Dim wrdRef As String

    wrdPara = inDoc.Paragraphs(j)
    wrdRef = wrdPara.Range.Text
    ' remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function

    Sub Anonymouse()
    Dim docCurrent As Document
    Dim docRef As Document
    Dim j As Long
    Dim r As Range
    Dim wrdRef As String

    Set docCurrent = ActiveDocument
    Set docRef = Documents.Open("d:\checklist.doc")

    For j = 1 To docRef.Paragraphs.Count
    Set r = docCurrent.Range
    ' gets the next word from reference doc
    wrdRef = ReplaceWord(docRef, j)
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    Do While .Execute(Findtext:=wrdRef, _
    Forward:=True) = True
    If Asc(r.Text) > 32 Then
    r.Text = wrdRef
    r.collapse 0
    Loop
    End With
    Next
    docRef.Close
    Set docRef = Nothing
    End Sub
    [/vba]You do not need to activate anything. You are setting the reference document as a document object. You can action it whether it is activae, or not. The same for the docCurrent object.

    So the code above gets the reference doc, sets it as an object.

    The getting each paragraph as a string is a Function, with the reference document passed in as a parameter, and a counter - the Paragraph count of the reference document.[vba]
    For j = 1 To docRef.Paragraphs.Count

    [/vba]NOTE: Important!!!! This assumes the reference document is a list of paragraphs with no "empty" paragraphs.

    So, for each paragraph in the reference document, the Find gets the next word from the reference doc, strips the paragraph mark, and returns the clean word to the current iteration of the Find.

    If Asc(r.text) - the current .Found - is not 32, replace the .Found with the current word from the reference doc.

  3. #3
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Hi Gerry,

    First up thanks very much for the rapid response.

    When pasting your VBA into Word I get an error on running it: Compile error: Loop without Do, which referes to the 'Loop' instruction 6 lines from the bottom.

    Any idea how to get around this or have I done something wrong?

    Cheers.

  4. #4
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,340
    Location
    Seems to me that something like this would work:

    [VBA]Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim wrdPara As Paragraph
    Dim wrdRef As String
    Set wrdPara = inDoc.Paragraphs(j)
    wrdRef = wrdPara.Range.Text
    'remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function
    Sub Anonymouse()
    Dim docCurrent As Document
    Dim docRef As Document
    Dim j As Long
    Dim r As Range
    Dim wrdRef As String
    Set docCurrent = ActiveDocument
    Set docRef = Documents.Open("d:\checklist.doc")
    For j = 1 To docRef.Paragraphs.Count
    Set r = docCurrent.Range
    ' gets the next word from reference doc
    wrdRef = ReplaceWord(docRef, j)
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = wrdRef
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    .Execute Replace:=wdReplaceAll
    End With
    Next
    docRef.Close
    Set docRef = Nothing
    End Sub

    [/VBA]
    Greg

    Visit my website: http://gregmaxey.com

  5. #5
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    I am missing the close to ther IF statement.[vba]
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    Do While .Execute(Findtext:=wrdRef, _
    Forward:=True) = True
    If Asc(r.Text) > 32 Then
    r.Text = wrdRef
    r.collapse 0
    Loop
    End With
    [/vba]

    should be:
    [vba]
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    Do While .Execute(Findtext:=wrdRef, _
    Forward:=True) = True
    If Asc(r.Text) > 32 Then
    r.Text = wrdRef
    r.collapse 0
    End if
    Loop
    End With
    [/vba]

  6. #6
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Thanks again guys!

    Gerry - I thought this was the case so I added the End If bit in earlier, unfortunately it brings up another problem which is:

    Compile error:
    Invalid use of property

    Referencing the 4th line of code and can be seen in the attached image.

    As before, any help would be greatly appreciated.

    Cheers.

  7. #7
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Ooops!

    wrdPara is an object

    SET oPara = inDoc.Paragraphs(j)

    My bad.

  8. #8
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Notice that Greg, being better than I, has the correct syntax! He used SET.

    That is what I get for typing in here directly.

  9. #9
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Also, if your objective is to replace all in one swoop, then Greg's code is better, as it does precisely that. Mine actions each individually.

    The difference is that Greg's code does NOT have the additional test of:
    [vba]
    If Asc(r.Text) > 32 Then
    [/vba]
    His replaces all, with no testing the ASCII of the first character. If the situation is that some of the .Found may not pass that test, and some will, then obviously you need the test. In which case, you need to action each individually in order to DO that test.

    That is why I added it.

    Although I have to say that it seems unlikely that you are going to have a word in the reference doc that DOES have Asc(first)character) as < 32. If it does, it seems to me that your reference doc is in error.

  10. #10
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Hi Orge,

    Does your 'doclist' include posessives (eg Bill's)? If not, a simple change to the Find/Replace can avoid ending up with [REMOVED]'s. Taking Gerry's code as an example, you could change:
    With r.Find 
      .ClearFormatting 
      .Replacement.ClearFormatting 
      .Replacement.Font.Color = wdColorRed 
      .Replacement.Text = "[REMOVED]" 
      .MatchWholeWord = True 
      .MatchCase = True 
      Do While .Execute(Findtext:=wrdRef, _ 
        Forward:=True) = True 
        If Asc(r.Text) > 32 Then 
            r.Text = wrdRef 
            r.collapse 0 
        End If 
    Loop 
    End With
    to:
    With r.Find 
      .ClearFormatting 
      .Replacement.ClearFormatting 
      .Replacement.Font.Color = wdColorRed 
      .Replacement.Text = "[REMOVED]" 
      .MatchWholeWord = True 
      .MatchCase = True 
      Do While .Execute(Findtext:=wrdRef & "'s", _ 
        Forward:=True) = True 
        If Asc(r.Text) > 32 Then 
          r.Text = wrdRef 
          r.collapse 0 
        End If 
      Loop 
      Do While .Execute(Findtext:=wrdRef, _ 
        Forward:=True) = True 
        If Asc(r.Text) > 32 Then 
          r.Text = wrdRef 
          r.collapse 0 
        End If 
      Loop 
    End With
    Last edited by macropod; 01-26-2021 at 09:15 PM.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  11. #11
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Quote Originally Posted by fumei
    Ooops!

    wrdPara is an object

    SET oPara = inDoc.Paragraphs(j)

    My bad.
    Gerry - presumably I need to remove the references to wrdPara in other lines as well, so:

    [VBA]Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim wrdPara As Paragraph
    Dim wrdRef As String

    wrdPara = inDoc.Paragraphs(j)
    wrdRef = wrdPara.Range.Text
    ' remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function[/VBA]

    becomes:

    [VBA]Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim oPara As Paragraph
    Dim wrdRef As String

    SET oPara = inDoc.Paragraphs(j)
    wrdRef = oPara.Range.Text
    ' remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function[/VBA]

    Am I correct? When I try to run this code it hangs Word 2007, which is unfortunate.

    Macropod - my doclist contains straight names only, such as 'James' 'Jim' etc, etc - no possessives, so thankyou for that!

    GMaxey - I seem to be unable to get your code to work properly either, it also hangs the computer. Is this just me or does it work for anybody else?

    Thanks again guys, if I can get this to work it's going to be fantastic.

    Cheers.

  12. #12
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,340
    Location
    Orge,

    It doesn't hang my PC but I am using a very small list in the reference document. You might try with a shorter list. If that works then you might need to split your list into smaller parts.
    Greg

    Visit my website: http://gregmaxey.com

  13. #13
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Quote Originally Posted by gmaxey
    Orge,

    It doesn't hang my PC but I am using a very small list in the reference document. You might try with a shorter list. If that works then you might need to split your list into smaller parts.
    Aha! You are correct, must be the size of my list then. I will try splitting it down into several smaller lists and see if that fixes it.

    Cheers all!

  14. #14
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Crikey. I am really not paying attention.

    Yes, whatever variable name you use, be consistent!

    If you use wrdPara[vba]
    Dim wrdPara As Paragraph
    [/vba]then use wrdPara, in your Set instruction, and where ever you use wrdPara.

    If you use oPara, then use oPara.

    I can't believe I did that.

  15. #15
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Right, she works! List reduced to ~3000 names and people who are definitely going to be encountered has helped. It takes about 7-10 minutes to process on each document.

    [VBA]
    Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim wrdPara As Paragraph
    Dim wrdRef As String
    Set wrdPara = inDoc.Paragraphs(j)
    wrdRef = wrdPara.Range.Text
    'remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function
    Sub Names()
    Dim docCurrent As Document
    Dim docRef As Document
    Dim j As Long
    Dim r As Range
    Dim wrdRef As String
    Set docCurrent = ActiveDocument
    Set docRef = Documents.Open("d:\Anonymouse\Checklists\checklist.doc")
    For j = 1 To docRef.Paragraphs.Count
    Set r = docCurrent.Range
    ' gets the next word from reference doc
    wrdRef = ReplaceWord(docRef, j)
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = wrdRef
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[NAME REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    .Execute Replace:=wdReplaceAll
    End With
    Next
    docRef.Close
    Set docRef = Nothing
    End Sub
    [/VBA]

    Only issue at the moment, is I want to bath run it on ~100 documents overnight which I have a macro for, however when I ran it last night it did the first 19 of them and then I got a windows error saying memory was low and it was going to increase my virtual memory size! I'm not sure why it would do this as it should be closing each file after it's finished.

    Still, even if I can run them in batches of 15 that's fantastic compared to the alternative of going through them manually.

    Orge.

  16. #16
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Post your code that is doing th eprocessing.

    Word is famous, unfortunately, for its memory handling (poor at best, terrible at worst), although it is better that it used to be. Word 6 was so bad, that if you did NOTHING - just let the computer sit there - Word would eventually steal ALL the system memory, so much that Wiindows itself would die.

    So. Depending of precisely what you are doing, the code could likely be tweaked enough to handle things.

    "I'm not sure why it would do this as it should be closing each file after it's finished."

    Post your code. Let's see.

  17. #17
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Hi Orge,

    Re:
    It takes about 7-10 minutes to process on each document.
    That really does seem to be an excessive lenght of time. One certain way of speeding things up - and reducing resource issues - would be to read all the strings from the 'checklist' document into an array, then close that document. From then on, you would open each of the documents to be processed and run the Find/Replace loops via the strings stored in the array.

    I also haven't seen anything in the code you've posted so far to indicate whether you're turning off screenupdating while the code runs. That too should increase processing speed.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  18. #18
    VBAX Regular
    Joined
    Nov 2010
    Posts
    7
    Location
    Once again - thanks for the tips.

    This is all of the code processing the checklist:

    [VBA]Option Explicit
    Function ReplaceWord(inDoc As Document, j As Long) As String
    Dim wrdPara As Paragraph
    Dim wrdRef As String
    Set wrdPara = inDoc.Paragraphs(j)
    wrdRef = wrdPara.Range.Text
    'remove the paragraph mark:
    wrdRef = Left(wrdRef, Len(wrdRef) - 1)
    ReplaceWord = wrdRef
    End Function
    Sub Names()
    Dim docCurrent As Document
    Dim docRef As Document
    Dim j As Long
    Dim r As Range
    Dim wrdRef As String
    Set docCurrent = ActiveDocument
    Set docRef = Documents.Open("d:\Anonymouse\Checklists\checklist.doc")
    For j = 1 To docRef.Paragraphs.Count
    Set r = docCurrent.Range
    ' gets the next word from reference doc
    wrdRef = ReplaceWord(docRef, j)
    With r.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = wrdRef
    .Replacement.Font.Color = wdColorRed
    .Replacement.Text = "[NAME REMOVED]"
    .MatchWholeWord = True
    .MatchCase = True
    .Execute Replace:=wdReplaceAll
    End With
    Next
    docRef.Close
    Set docRef = Nothing
    End Sub [/VBA]

    Pretty much exactly what gmaxey wrote earlier with my filename substituted. There are a couple of other sections doing things like removing dates 'he' 'she' 'his' 'hers' etc. but I don't believe any of these are causing an issue, all run instantly. It's this part that reads from the checklist that takes the time.

    Cheers, Orge.

  19. #19
    Knowledge Base Approver VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,435
    Location
    Hi Orge,

    Try this code:
    Sub Anonymiser()
    Application.ScreenUpdating = False
    Dim FileList As Variant, ChkDoc As Document, TestDoc As Document, FilePath As String, ChkList As String, j As Long
    'Load the strings from the reference doc into a text string to be used as an array.
    Set ChkDoc = Documents.Open("D:\Anonymouse\Checklists\Checklist.doc")
    ChkList = ChkDoc.Range.Text: ChkDoc.Close False: Set ChkDoc = Nothing
    'Get the path to the documents to process
    FilePath = InputBox("Please input the path to the documents to process", "Path to Files", ActiveDocument.Path)
    'Exit if the filepath is empty
    If FilePath = "" Then GoTo Done
    'Ensure the filepath ends with "\"
    If Right(FilePath, 1) <> "\" Then FilePath = FilePath & "\"
    'Get a list of all documents in the target folder
    FileList = Dir(FilePath & "*.doc", vbNormal)
    'Process each found file
    While FileList <> ""
      Set TestDoc = Documents.Open(FilePath & FileList)
      With TestDoc.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Replacement.Font.Color = wdColorRed
        .Replacement.Text = "[NAME REMOVED]"
        .MatchWholeWord = True
        .MatchCase = True
        'Process each word from the Check List
        For j = 0 To UBound(Split(ChkList, vbCr))
          .Text = Split(ChkList, vbCr)(j) & "'s"
          .Execute Replace:=wdReplaceAll
          .Text = Split(ChkList, vbCr)(j)
          .Execute Replace:=wdReplaceAll
        Next
      End With
      TestDoc.Close True
      FileList = Dir()
    Wend
    'Clean up and exit
    Done:
    Set TestDoc = Nothing
    Application.ScreenUpdating = True
    End Sub
    Notes:
    1. Screen updating is turned off, but you'll probably see some flickering as each new document is loaded.
    2. The checklist document is opened, data gathered, then closed.
    3. You're asked to nominate the folder to process. The code then processes all files in that folder without further intervention.
    4. The Find/Replace loop has been optimised to eliminate the unnecessary resetting of variables on each iteration
    5. Possessive cases are catered for.
    Last edited by macropod; 01-26-2021 at 09:27 PM.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  20. #20
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    I agree, I can not see anything here that would remotely cause processing of a single iteration 7-10 minutes. Mind you, we have no idea how large these are, but still, see strange.

    Let us know how it turns out running macropod's code, as it is optimized well. If it is still taking such a long time, OR you seem to be still having memory issues, something is wrong.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •