View Full Version : [SOLVED:] Remove surplus spaces
I have a macro as per the attached .docm containing also the sample text.
It removes improper returns that have been left by OCRing documents. I am ok with the macro as is except the excess spaces. In the example it asks you to confirm the action on the 3 places highlighted. When complete you will notice that there are in some cases excess spaces between the two highlighted words. Is it possible to deal with these as part of the macro? I have made the macro to focus on just replacing the Chr(13), so that you don't get format bleed that would normally happen on a search and replace.
Public Sub FixOCR_Returns_Click()
Dim oRng       As Range
Set oRng = ActiveDocument.Content
Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
    .Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
    .MatchWildcards = True
    .Font.Superscript = False
    Do While .Execute
        oRng.Select
        'Call SelectionScrollIntoMiddleOfView
        Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
            Case vbYes:
                oRng.MoveEndUntil Chr(13), wdBackward
                oRng.MoveStartWhile Chr(13)
                oRng.MoveStartUntil Chr(13)
                oRng.Select
                oRng.Text = Replace(oRng.Text, Chr(13), " ")
                oRng.Collapse wdCollapseEnd
            Case vbNo:
                oRng.Collapse wdCollapseEnd
            Case Else
                Selection.Collapse wdCollapseStart
                Exit Sub
        End Select
    Loop
    MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub
End Sub
gmaxey
12-11-2020, 07:11 AM
Public Sub FixOCR_Returns_Click()
Dim oRng       As Range
  Set oRng = Selection.Range
  Selection.Find.ClearFormatting
  With oRng.Find
    .Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
    .MatchWildcards = True
    .Font.Superscript = False
    Do While .Execute
      oRng.Select
      'Call SelectionScrollIntoMiddleOfView
      Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
        Case vbYes:
          oRng.Text = Replace(oRng.Text, Chr(13), "")
          Do While InStr(oRng.Text, "  ") > 0
            oRng.Text = Replace(oRng.Text, "  ", " ")
          Loop
          oRng.Collapse wdCollapseEnd
        Case vbNo:
          oRng.Collapse wdCollapseEnd
        Case Else
          Selection.Collapse wdCollapseStart
          Exit Sub
      End Select
    Loop
    MsgBox ("No more matches found")
  End With
lbl_Exit:
  Exit Sub
End Sub
Thanks for that but notice on the second and third highlighted word after the macro is run.The bold has bled through to the next word. This what I was trying to avoid with what I had done.
gmaxey
12-11-2020, 08:42 AM
Seems like a shell game with ranges.  This works with your example, but I had to change your search string a little 
Public Sub FixOCR_Returns_Click()
Dim oRng       As Range
  Set oRng = Selection.Range
  Selection.Find.ClearFormatting
  With oRng.Find
    .Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{1})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
    .MatchWildcards = True
    .Font.Superscript = False
    Do While .Execute
      oRng.Select
      'Call SelectionScrollIntoMiddleOfView
      Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
        Case vbYes:
          Do While oRng.Characters.First = " " And oRng.Characters.First.Previous = " "
            oRng.MoveStart wdCharacter, -1
          Loop
          Do Until oRng.Characters.First = " " Or oRng.Characters.First = Chr(13)
            oRng.MoveStart wdCharacter, 1
          Loop
          Do Until oRng.Characters.Last = " " Or oRng.Characters.Last = Chr(13)
            oRng.MoveEnd wdCharacter, -1
          Loop
          oRng.Select
          oRng.Text = " "
          
          oRng.Collapse wdCollapseEnd
        Case vbNo:
          oRng.Collapse wdCollapseEnd
        Case Else
          Selection.Collapse wdCollapseStart
          Exit Sub
      End Select
    Loop
    MsgBox ("No more matches found")
  End With
lbl_Exit:
  Exit Sub
End Sub
Thanks, that has solved that issue, no bleed or excess spaces. That will be a better result.
I just encountered some really bad ocr that has multiple spaces at the end of a line and at the start of the next. This is just a tweak of my first macro but it no longer needs the (1,3) in the search.
It appears to soak up the spaces ok no matter how many there are before or after.  ^13 Also does not appear to bleed formatting.
Probably don't need the green lines....No it is better to leave those in.
Private Sub CommandButton64TEST_Click()Dim oRng       As Range
Set oRng = ActiveDocument.Content
If CheckBox16WholeDocumentSearch.Value = True Then
    selection.EndKey Unit:=wdStory, Extend:=wdExtend
Else
    Set oRng = selection.Range
End If
selection.Find.ClearFormatting
With oRng.Find
    .text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{3,})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
    .MatchWildcards = True
    '.Font.Bold = False
    .Font.Superscript = False
    Do While .Execute
        oRng.Select
        Call SelectionScrollIntoMiddleOfView
        Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
            Case vbYes:
                oRng.MoveEndUntil Chr(13), wdBackward
                oRng.MoveStartWhile Chr(13)
                oRng.MoveStartUntil Chr(13)
                oRng.Select
                oRng.text = Replace(oRng.text, Chr(13), " ")
                selection.Expand
                selection.MoveStartUntil " "
                selection.Range.Delete
                selection.Range.InsertAfter " "
                oRng.Collapse wdCollapseEnd
            Case vbNo:
                oRng.Collapse wdCollapseEnd
            Case Else
                selection.Collapse wdCollapseStart
                Exit Sub
        End Select
    Loop
    MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub
End Sub
macropod
12-12-2020, 04:01 AM
Other than perhaps the interactivity, I'm not sure you're achieving anything more than:
Sub Demo()Application.ScreenUpdating = False
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Forward = True
    .Format = False
    .Wrap = wdFindContinue
    .MatchWildcards = True
    .Replacement.Text = " "
    .Text = "[ ]{1,}^13[ ]{1,}"
    .Execute Replace:=wdReplaceAll
    .Text = "^13[ ]{1,}"
    .Execute Replace:=wdReplaceAll
    .Text = "[ ]{1,}^13"
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
The interactivity is key to this particular macro, along with searching for valid cases to consider.
I need to think about this some more...
I found it was picking up legitimate things more that is should, so changing
    .text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
To
    .text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
            "]{3,})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
Is an improvement.
gmaxey
12-12-2020, 10:00 AM
I don't really know exactly what you are trying to do, but it seems the issue is a paragraph mark either preceded by or followed by one or more spaces.  Perhaps:
Private Sub CommandButton64TEST_Click()Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean
bWholeDoc = True
Set oRng = ActiveDocument.Content
If Not bWholeDoc Then Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
  .Text = Chr(13)
  .Font.Superscript = False
  Do While .Execute
    On Error GoTo Err_GetOut
    If oRng.Characters.First.Previous = " " Or oRng.Characters.Last.Next = " " Then
      Set oRngShow = oRng.Duplicate
      oRngShow.MoveStart wdWord, -3
      oRngShow.MoveEnd wdWord, 3
      oRngShow.Select
      ActiveWindow.ScrollIntoView Selection.Range
      Select Case MsgBox("Do you want to close up this break?", vbYesNoCancel)
        Case vbYes:
          Do Until oRng.Characters.First.Previous <> " "
            oRng.MoveStart wdCharacter, -1
          Loop
          Do Until oRng.Characters.Last.Next <> " "
            oRng.MoveEnd wdCharacter, 1
          Loop
          oRng.Text = " "
          oRng.Collapse wdCollapseEnd
        Case vbNo:
          oRng.Collapse wdCollapseEnd
        Case Else
          Exit Do
      End Select
    End If
  Loop
End With
lbl_Exit:
Selection.Collapse wdCollapseStart
oRng.Collapse wdCollapseEnd
Exit Sub
Err_GetOut:
MsgBox ("No more matches found")
Resume lbl_Exit
End Sub
macropod
12-12-2020, 12:22 PM
The interactivity is key to this particular macro, along with searching for valid cases to consider.
That's all very well (and was obviously understood), but it fails to address the issue I raised. Try:
Sub Demo()
Dim Rslt
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "[^13]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Format = False
    .Wrap = wdFindStop
    .MatchWildcards = True
  End With
  Do While .Find.Execute
    If .End < ActiveDocument.Range.End Then
      Do While .Characters.Last.Next Like "[ " & vbCr & "]"
        If .End = ActiveDocument.Range.End Then Exit Do
        .End = .End + 1
      Loop
    End If
    If .Start > ActiveDocument.Range.Start Then
      Do While .Characters.First.Previous = " "
        If .Start = ActiveDocument.Range.Start Then Exit Do
        .Start = .Start - 1
      Loop
    End If
    If .End = ActiveDocument.Range.End Then
      If Len(.Text) = 1 Then Exit Sub
    End If
    .Select
    Rslt = MsgBox("Replace this instance?", vbYesNoCancel)
    Select Case Rslt
      Case Is = vbYes
        If .End = ActiveDocument.Range.End Then
          .Text = ""
        ElseIf .Start = ActiveDocument.Range.Start Then
          .Text = ""
        Else
          .Text = " "
        End If
      Case Is = vbCancel: Exit Sub
    End Select
    If .End = ActiveDocument.Range.End Then Exit Sub
    .Collapse wdCollapseEnd
  Loop
End With
End Sub
In addition to dealing with simple excess spaces, the above code also deals with excess spaces interspersed with paragraph breaks.
gmaxey
12-12-2020, 02:05 PM
Paul,
  I'm still not sure of the exact objective but I think the OP is trying to show some sort of span left and right of the offending spaces.  I've tried to pull your code and mine together as one.  ???
Sub Demo()
Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean
Dim lngIndex As Long, lngSpanLR As Long
  lngSpanLR = 3
  bWholeDoc = True
  Set oRng = ActiveDocument.Content
  If Not bWholeDoc Then Set oRng = Selection.Range
  Selection.Find.ClearFormatting
  With oRng.Find
    .Text = Chr(13) & "{1,}"
    .MatchWildcards = True
    Do While .Execute
      With oRng
        If .End < ActiveDocument.Range.End Then
          Do While oRng.Characters.Last.Next Like "[ " & vbCr & "]" And .End <> ActiveDocument.Range.End
            .End = .End + 1
          Loop
        End If
        If .Start > ActiveDocument.Range.Start Then
          Do While .Characters.First.Previous = " " And .Start <> ActiveDocument.Range.Start
            .Start = .Start - 1
          Loop
        End If
        Set oRngShow = .Duplicate
        On Error Resume Next
        For lngIndex = 1 To lngSpanLR
          oRngShow.MoveStart wdWord, -1
          oRngShow.MoveEnd wdWord, 1
        Next lngIndex
        On Error GoTo 0
        Select Case True
          Case .End = ActiveDocument.Range.End And Len(.Text) = 1
          Case Else
            oRngShow.Select
            Select Case MsgBox("Remove break at this location?", vbYesNoCancel)
              Case Is = vbYes
                If .End = ActiveDocument.Range.End Then
                  .Text = ""
                ElseIf .Start = ActiveDocument.Range.Start Then
                  .Text = ""
                Else
                  .Text = " "
                End If
              Case Is = vbCancel: Exit Sub
            End Select
        End Select
        Selection.Collapse wdCollapseEnd
        .Collapse wdCollapseEnd
        If .End + 1 = ActiveDocument.Range.End Then Exit Do
      End With
    Loop
  End With
  MsgBox "There are no more surplus space/breaks found.", vbOKOnly, "COMPLETE"
End Sub
macropod
12-12-2020, 09:32 PM
Hi Greg,
My approach to that would be:
Sub Demo()
Dim MsgRng As Range, Rslt As Variant
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "[^13]{1,}"
    .Replacement.Text = ""
    .Forward = True
    .Format = False
    .Wrap = wdFindStop
    .MatchWildcards = True
  End With
  Do While .Find.Execute
    If .End < ActiveDocument.Range.End Then
      Do While .Characters.Last.Next Like "[ " & vbCr & "]"
        If .End = ActiveDocument.Range.End Then Exit Do
        .End = .End + 1
      Loop
    End If
    If .Start > ActiveDocument.Range.Start Then
      Do While .Characters.First.Previous = " "
        If .Start = ActiveDocument.Range.Start Then Exit Do
        .Start = .Start - 1
      Loop
    End If
    If .End = ActiveDocument.Range.End Then
      If Len(.Text) = 1 Then Exit Sub
    End If
    Set MsgRng = .Duplicate
    With MsgRng
      .MoveStart wdWord, -1
      .MoveEnd wdWord, 1
      .Select
    End With
    Rslt = MsgBox("Replace this instance?", vbYesNoCancel)
    Select Case Rslt
      Case Is = vbYes
        If .End = ActiveDocument.Range.End Then
          .Text = ""
        ElseIf .Start = ActiveDocument.Range.Start Then
          .Text = ""
        Else
          .Text = " "
        End If
      Case Is = vbCancel: Exit Sub
    End Select
    If .End = ActiveDocument.Range.End Then Exit Sub
    .Collapse wdCollapseEnd
  Loop
End With
End Sub
gmaxey
12-13-2020, 11:03 AM
Paul,
  I looks like our methods are similar.  I had just assumed that trying to move three words to the left or right would have err'd if they didn't exists thus the error handler.  Also addressed the possible issue of non-breaking Chr(160) spaces:
Sub Demo()
Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean
  bWholeDoc = True
  Set oRng = ActiveDocument.Content
  If Not bWholeDoc Then Set oRng = Selection.Range
  Selection.Find.ClearFormatting
  With oRng.Find
    .Text = Chr(13) & "{1,}"
    .MatchWildcards = True
    Do While .Execute
      With oRng
        If .End < ActiveDocument.Range.End Then
          Do While oRng.Characters.Last.Next Like "[" & Chr(32) + Chr(160) + vbCr & "]" And .End <> ActiveDocument.Range.End
            .End = .End + 1
            If .End = ActiveDocument.Range.End Then Exit Do
          Loop
        End If
        If .Start > ActiveDocument.Range.Start Then
          Do While .Characters.First.Previous Like "[" & Chr(32) + Chr(160) & "]"
            .Start = .Start - 1
            If .Start = ActiveDocument.Range.Start Then Exit Do
          Loop
        End If
        Set oRngShow = .Duplicate
        oRngShow.MoveStart wdWord, -3
        oRngShow.MoveEnd wdWord, 3
        oRngShow.Select
        Select Case MsgBox("Remove break at this location?", vbYesNoCancel)
          Case Is = vbYes
            If .End = ActiveDocument.Range.End Or .Start = ActiveDocument.Range.Start Then
              .Text = ""
            Else
              .Text = " "
            End If
          Case Is = vbCancel: Exit Sub
        End Select
        Selection.Collapse wdCollapseEnd
        .Collapse wdCollapseEnd
        If .End + 1 = ActiveDocument.Range.End Then Exit Do
      End With
    Loop
  End With
  MsgBox "There are no more surplus space/breaks found.", vbOKOnly, "COMPLETE"
lbl_Exit:
  Exit Sub
End Sub
macropod
12-13-2020, 02:07 PM
I looks like our methods are similar.
Indeed they are.
the possible issue of non-breaking Chr(160) spaces
I'm not aware of any OCR output (which is what we're dealing with here) that introduces non-breaking spaces into a document. If that is a real possibility, the Chr(160) spaces aren't the only ones that might need consideration.
Thank you Greg and Paul.
My original macro was made with the intent of trying to skip over the cases that normally would not be an issue, like where a paragraph ends with a full stop, or if there was bold text, due to a heading etc. The issue that I encountered was the replacement leaving double spaces and the format bleed. Both these things you have fixed and so as it is it works just fine for current task.
However the latter macro would be one I would use on a different job, that perhaps did not include footnote and page numbers that are left in the ocr text that have to be dealt with as I went along....
I have another version where I can choose to add punctuation to include at runtime taken from a textbox on my form..
So for me it is using the macro for the job, and I find each has their use and flexibility.
It is also beneficial to set bWholeDoc = False so that it will start at cursor position.
I am sure these macros will be helpful to others.
Dim bWholeDoc As Boolean  bWholeDoc = False
  Set oRng = ActiveDocument.Content
  If Not bWholeDoc Then Set oRng = selection.Range
Also I found it beneficial to add a screen scroll on find to the macro
With oRng.Find    .text = Chr(13) & "{1,}"
    .MatchWildcards = True
    Do While .Execute
      With oRng
      Call SelectionScrollIntoMiddleOfView
        If .End < ActiveDocument.Range.End Then
          Do While oRng.Characters.Last.Next Like "[" & Chr(32) + Chr(160) + vbCr & "]" And .End <> ActiveDocument.Range.End
            .End = .End + 1
            If .End = ActiveDocument.Range.End Then Exit Do
          Loop
        End If
Sub SelectionScrollIntoMiddleOfView()'https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-msoffice_custom-mso_2010/centre-selected-text-in-the-middle-of-the-screen/84ab25fe-9570-4b55-91bd-4b11a04bb99b?auth=1
  Dim pLeft As Long
  Dim pTop As Long, lTop As Long, wTop As Long
  Dim pWidth As Long
  Dim pHeight As Long, wHeight As Long
  Dim Direction As Integer
  wHeight = PixelsToPoints(ActiveWindow.Height, True)
  ActiveWindow.GetPoint pLeft, wTop, pWidth, pHeight, ActiveWindow
  ActiveWindow.GetPoint pLeft, pTop, pWidth, pHeight, selection.Range
  Direction = Sgn((pTop + pHeight / 2) - (wTop + wHeight / 2))
  Do While Sgn((pTop + pHeight / 2) - (wTop + wHeight / 2)) = Direction And (lTop <> pTop)
    ActiveWindow.SmallScroll Direction, down
    On Error Resume Next
    lTop = pTop
    ActiveWindow.GetPoint pLeft, pTop, pWidth, pHeight, selection.Range
  Loop
End Sub
gmaxey
12-15-2020, 04:00 PM
Paul, if you are still following this thread, have you looked at the OP procedure to scroll the selection to the middle of the screen?  Here is falls flat on its face and doesn't work at all. I've tinkered a bit with it but despite coming close, I can't get the selection dead center (plus or minus a line) of the active window.  I've tested with single line of text selected and multiple lines.  Have you got a better idea?
Sub SelectionScrollIntoMiddleOfView()
Dim lngLeft As Long, lngTop As Long, lngWidth As Long, lngHeight As Long
Dim lngWinTop As Long, lngWinHgt As Long
Dim lngScroll As Integer
Dim lngCenter As Long, lngWinCenter As Long
Dim lngCounter As Long
  lngWinHgt = PixelsToPoints(ActiveWindow.Height, True)
  lngWinTop = PixelsToPoints(ActiveWindow.Top, True)
  lngWinCenter = (lngWinTop + lngWinHgt) / 2
  ActiveWindow.ScrollIntoView Selection.Range
  ActiveWindow.GetPoint lngLeft, lngTop, lngWidth, lngHeight, Selection.Range
  lngCenter = (lngTop + lngHeight) / 2
  Do
    lngScroll = Sgn(lngCenter - lngWinCenter)
    ActiveWindow.SmallScroll lngScroll
    lngCounter = lngCounter + 1
    ActiveWindow.GetPoint lngLeft, lngTop, lngWidth, lngHeight, Selection.Range
    lngCenter = (lngTop + lngHeight) / 2
    If lngScroll > 0 Then
      If lngCenter > lngWinCenter Then
        If lngCenter - lngWinCenter < 10 Then Exit Do
      Else
        If lngWinCenter - lngCenter < 10 Then Exit Do
      End If
    Else
      If lngCenter > lngWinCenter Then
        If lngWinCenter - lngCenter < 10 Then Exit Do
      Else
        If lngWinCenter - lngCenter < 10 Then Exit Do
      End If
    End If
    If lngCounter > 20 Then Exit Do
  Loop
lbl_Exit:
  Exit Sub
End Sub
macropod
12-15-2020, 04:39 PM
Hi Greg,
I seriously doubt vertically centering the selection on the screen is advisable; that risks placing it immediately behind the message box. To my mind, adding the following two lines after '.Select' in my code is sufficient:
    ActiveWindow.LargeScroll Down:=1
    ActiveWindow.ScrollIntoView Selection.Range
That in invariably places the selected range just above the message box, which seems pretty close to optimal to me.
Hi Greg,
I seriously doubt vertically centering the selection on the screen is advisable; that risks placing it immediately behind the message box. To my mind, adding the following two lines after '.Select' in my code is sufficient:
    ActiveWindow.LargeScroll Down:=1
    ActiveWindow.ScrollIntoView Selection.Range
That in invariably places the selected range just above the message box, which seems pretty close to optimal to me.
Good point Paul, your solution is 'close to optimal'.
The other method does have some more control over where the text will scroll to as you will have discerned in the code. As it is I have Word on one side of the screen and so this is not an issue. If I had it filling the whole screen I would change the code to have it display more at the top, so in just the two places I have put 12 instead of 2. Another tip is to double click the gap between pages as it allows the pages to flow better
Direction = Sgn((pTop + pHeight / 2) - (wTop + wHeight / 12))  Do While Sgn((pTop + pHeight / 2) - (wTop + wHeight / 12)) = Direction And (lTop <> pTop)
Jon
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.