Results 1 to 4 of 4

Thread: Please help me with these code (macro to find repeated words)

  1. #1
    VBAX Newbie
    Joined
    Oct 2016
    Posts
    3
    Location

    Please help me with these code (macro to find repeated words)

    I'm writing this macro to find repeated words in text. My goal is to find them, then highlight all the occurrences and then underline the nearest repetitions (10 words, 20 words etc). Something like 1 to 10 - heavy double line, 11 to 50, double line, etc.).

    It works until highlighting the words, but there is some trouble with calculating distances. Please check my code and tell me what is failing and how can I fix it, or if I need another approach.

    Sub FindWords()
        Const maxwords = 9000                               'Maximum words allowed
            
        Dim SingleWord As String                           'Raw word pulled from doc
        
        Dim Words(maxwords) As String                           'It holds unique words
        Dim Freq(maxwords) As Integer                           'Frequences of the words
    
        Dim RWords(maxwords) As String                           'It holds repeated words
        Dim RWordsPositions(maxwords) As Integer              'It holds the position of those words
    
        Dim WordNum As Integer
        Dim rwnum As Integer
        Dim worddistance As Integer
        Dim countWord As Integer
    
        Dim thisWord As String                 'Current word to search
        Dim ttlwds As Long                      'Total words in the document
        Dim Excludes As String                  'Words to be excluded
        Dim Includes As String                  'To solve the inclusion of foreign characters
        Dim Found As Boolean                    'Temporary flag
        Dim j, k As Integer                     'Temporary variables
    
     
        ' Set up excluded words
        'Excludes = "[a][an][and][as][at][for][from][he][her][his][in][of][on][she][the][to][was][with]"
        Excludes = "[a][al][as][con][de][del][el][en][es][lo][los][la][las][le][me][mi][no][ni][o][por][que][qué][se][si][sí][sin][su][sus][te][tu][un][uno][una][y][ya]"
        Includes = "[á][é][í][ó][ú][ñ]"
     
        
        Selection.HomeKey Unit:=wdStory
        System.Cursor = wdCursorWait
        ttlwds = ActiveDocument.Words.Count
    
        ' Control the repeat
        
        rwnum = 0
        countWord = 0
        
        
        
        For Each aWord In ActiveDocument.Words
            
            SingleWord = Trim(LCase(aWord))
            'Out of range?
            
            If InStr(Includes, Left(SingleWord, 1)) Then
             ElseIf SingleWord < "a" Or SingleWord > "z" Then
                 SingleWord = ""
            End If
    
            'On exclude list?
            If InStr(Excludes, "[" & SingleWord & "]") Then
                SingleWord = ""
                countWord = countWord + 1
            End If
                    
            If Len(SingleWord) > 0 Then
                countWord = countWord + 1
                Found = False
                For j = 1 To WordNum
                    If Words(j) = SingleWord Then
                        Freq(j) = Freq(j) + 1
                        Found = True
                        rwnum = rwnum + 1
                        RWords(rwnum) = SingleWord
                        RWordsPositions(rwnum) = countWord
                        'Debug.Print Str(rwnum) + "-" + SingleWord + " - " + Str(Freq(j)) + " " + RWords(countWord) + " " + Str(RWordsPositions(countWord))
                        Exit For
                    End If
                Next j
                If Not Found Then
                    WordNum = WordNum + 1
                    Words(WordNum) = SingleWord
                    Freq(WordNum) = 1
                End If
                If WordNum > maxwords - 1 Then
                    j = MsgBox("Too many words.", vbOKOnly)
                    Exit For
                End If
                        
            End If
            ttlwds = ttlwds - 1
            StatusBar = "Remaining: " & ttlwds
        Next aWord
        
        'For k = 1 To rwnum
         '   Debug.Print Str(RWordsPositions(k)) + " - "; RWords(k)
        'Next k
        
        
        'COMPARES AND CHECKS DISTANCES
        
        countWord = 0
        
        For Each aWord In ActiveDocument.Words
            
            thisWord = Trim(LCase(aWord))
            'Out of range?
            If InStr(Includes, Left(thisWord, 1)) Then
             ElseIf thisWord < "a" Or thisWord > "z" Then
                 countWord = countWord + 1
                thisWord = ""
            End If
            'On exclude list?
            If InStr(Excludes, "[" & thisWord & "]") Then
                thisWord = ""
                countWord = countWord + 1
            End If
            
            If Len(thisWord) > 0 Then
                countWord = countWord + 1
                           
                   For j = 1 To rwnum
                   If thisWord = RWords(j) Then
                       
                        ActiveDocument.Words(countWord).Select
                        Selection.Expand Unit:=wdWord
                        If Selection.Characters(Selection.Characters.Count) = " " Then
                            Selection.MoveEnd Unit:=wdCharacter, Count:=-1
                        End If
                        Selection.Range.HighlightColorIndex = wdTurquoise
                                            
                        worddistance = Abs(RWordsPositions(j) - countWord)
                        
                        'HERE IT WOULD BE MORE IF-THEN TO CHECK OTHER DISTANCES
                        If worddistance > -1 And worddistance < 10 Then
                            Selection.Font.Underline = wdUnderlineDouble
                        End If
     
                   End If
                Next j
            End If
        Next aWord
    
        For j = 1 To rwnum
            Debug.Print Str(j) + "- " + RWords(j) + "  pos:" + Str(RWordsPositions(j))
        Next j
    
        System.Cursor = wdCursorNormal
    End Sub
    My language is spanish, that's why I'm inserting the option to include foreign characters.

    Here is some text wich I've tried the code on. If you want to test it with a text in english, please uncomment the corresponding Excludes list:

    Laura abrió los ojos en la mañana y por una fracción de segundo no recordó dónde había pasado la noche. La claridad del sol le daba en la cara, se sentía ya un poco de calor y sudaba ligeramente. Y eso fue lo que le dio la pista. Ya. Había dormido en la casa de Daniel, su novio desde dos años atrás. En su propia casa y su propia cama, nunca habría despertado sudando. Pero él tenía la mala costumbre de apagar el aire acondicionado en la mitad de la noche, y las persianas quedaban abiertas. Su novio era un ángel. El sol daba precisamente en la cabecera y él jamás se daba cuenta. Así que ella tenía que levantarse a encender de nuevo el interruptor en la pared. ¿Cuándo pensaba cambiar a un aparato con control remoto? En unos cuantos años llegarían al cambio de siglo, con todas las comodidades que traería la nueva era. Daniel era un poco descuidado en eso, pensaba hablar muy seriamente con él. Cuando ya viviesen juntos, todo debía ajustarse a la mayor comodidad para ambos. Y no era que a él no le gustara la tecnología, ya que vivía de ello. Simplemente no se ocupaba de los detalles caseros hasta que Laura se lo sugería. Ese era su ángel y era un ñoño pero no le preocupaba que fuera muy ñoño.
    I've included there some words starting with á, é and ñ to test the code.

  2. #2
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,359
    Location
    What is that text supposed to look like when finished?
    Greg

    Visit my website: http://gregmaxey.com

  3. #3
    VBAX Newbie
    Joined
    Oct 2016
    Posts
    3
    Location
    When finished, the text will have highlighted all repeated words, heavily underlined all same words with a distance of 1 to 10 words (these parameters may change) and softly underlined more distanced words. Of course, another macro has to remove the highlights and underlines. This is only for revision purposes. It's easier than getting a "text statistics report".

  4. #4
    VBAX Newbie
    Joined
    Oct 2016
    Posts
    3
    Location
    Let me explain: the reason what I'm doing this is because I'm writing in spanish, and I've observed this is a frequent problem with writing. I have to check and recheck my text and sometimes is difficult to spot repeated words. Some repetitions are intentional, some are not. I've tried some software: Scrivener only reports text statistics but I have to close the report and manually search all the occurrences. MS-Word only counts words. ProWritingAid does a very good job but still the text has to be pasted in their box. Notepad++ has a nice feature: when you click any word, it highlights all the same words in the same text, but you still need to make the corrections in the original text, or losing all formatting. My macro helps me to work directly on the text in Word, like a combo of ProWritingAid and Notepad++.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •