PDA

View Full Version : [SOLVED:] Remove surplus spaces



JPG
12-11-2020, 02:45 AM
I have a macro as per the attached .docm containing also the sample text.
It removes improper returns that have been left by OCRing documents. I am ok with the macro as is except the excess spaces. In the example it asks you to confirm the action on the 3 places highlighted. When complete you will notice that there are in some cases excess spaces between the two highlighted words. Is it possible to deal with these as part of the macro? I have made the macro to focus on just replacing the Chr(13), so that you don't get format bleed that would normally happen on a search and replace.


Public Sub FixOCR_Returns_Click()
Dim oRng As Range
Set oRng = ActiveDocument.Content
Set oRng = Selection.Range


Selection.Find.ClearFormatting
With oRng.Find
.Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"


.MatchWildcards = True
.Font.Superscript = False


Do While .Execute
oRng.Select
'Call SelectionScrollIntoMiddleOfView
Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)


Case vbYes:
oRng.MoveEndUntil Chr(13), wdBackward
oRng.MoveStartWhile Chr(13)
oRng.MoveStartUntil Chr(13)
oRng.Select
oRng.Text = Replace(oRng.Text, Chr(13), " ")


oRng.Collapse wdCollapseEnd
Case vbNo:
oRng.Collapse wdCollapseEnd
Case Else
Selection.Collapse wdCollapseStart
Exit Sub
End Select


Loop
MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub


End Sub

gmaxey
12-11-2020, 07:11 AM
Public Sub FixOCR_Returns_Click()
Dim oRng As Range
Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
.Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"

.MatchWildcards = True
.Font.Superscript = False
Do While .Execute
oRng.Select
'Call SelectionScrollIntoMiddleOfView
Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
Case vbYes:
oRng.Text = Replace(oRng.Text, Chr(13), "")
Do While InStr(oRng.Text, " ") > 0
oRng.Text = Replace(oRng.Text, " ", " ")
Loop
oRng.Collapse wdCollapseEnd
Case vbNo:
oRng.Collapse wdCollapseEnd
Case Else
Selection.Collapse wdCollapseStart
Exit Sub
End Select
Loop
MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub
End Sub

JPG
12-11-2020, 07:53 AM
Thanks for that but notice on the second and third highlighted word after the macro is run.The bold has bled through to the next word. This what I was trying to avoid with what I had done.

gmaxey
12-11-2020, 08:42 AM
Seems like a shell game with ranges. This works with your example, but I had to change your search string a little

Public Sub FixOCR_Returns_Click()
Dim oRng As Range
Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
.Text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{1})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"

.MatchWildcards = True
.Font.Superscript = False
Do While .Execute
oRng.Select
'Call SelectionScrollIntoMiddleOfView
Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)
Case vbYes:
Do While oRng.Characters.First = " " And oRng.Characters.First.Previous = " "
oRng.MoveStart wdCharacter, -1
Loop
Do Until oRng.Characters.First = " " Or oRng.Characters.First = Chr(13)
oRng.MoveStart wdCharacter, 1
Loop
Do Until oRng.Characters.Last = " " Or oRng.Characters.Last = Chr(13)
oRng.MoveEnd wdCharacter, -1
Loop
oRng.Select
oRng.Text = " "

oRng.Collapse wdCollapseEnd
Case vbNo:
oRng.Collapse wdCollapseEnd
Case Else
Selection.Collapse wdCollapseStart
Exit Sub
End Select
Loop
MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub
End Sub

JPG
12-11-2020, 09:05 AM
Thanks, that has solved that issue, no bleed or excess spaces. That will be a better result.

JPG
12-12-2020, 01:56 AM
I just encountered some really bad ocr that has multiple spaces at the end of a line and at the start of the next. This is just a tweak of my first macro but it no longer needs the (1,3) in the search.
It appears to soak up the spaces ok no matter how many there are before or after. ^13 Also does not appear to bleed formatting.
Probably don't need the green lines....No it is better to leave those in.

Private Sub CommandButton64TEST_Click()Dim oRng As Range
Set oRng = ActiveDocument.Content
If CheckBox16WholeDocumentSearch.Value = True Then
selection.EndKey Unit:=wdStory, Extend:=wdExtend
Else
Set oRng = selection.Range
End If


selection.Find.ClearFormatting
With oRng.Find


.text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{3,})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"


.MatchWildcards = True
'.Font.Bold = False
.Font.Superscript = False


Do While .Execute
oRng.Select
Call SelectionScrollIntoMiddleOfView
Select Case MsgBox("Replace " & oRng & "?", vbYesNoCancel)


Case vbYes:
oRng.MoveEndUntil Chr(13), wdBackward
oRng.MoveStartWhile Chr(13)
oRng.MoveStartUntil Chr(13)
oRng.Select
oRng.text = Replace(oRng.text, Chr(13), " ")


selection.Expand
selection.MoveStartUntil " "
selection.Range.Delete
selection.Range.InsertAfter " "


oRng.Collapse wdCollapseEnd


Case vbNo:
oRng.Collapse wdCollapseEnd
Case Else
selection.Collapse wdCollapseStart
Exit Sub
End Select


Loop
MsgBox ("No more matches found")
End With
lbl_Exit:
Exit Sub


End Sub

macropod
12-12-2020, 04:01 AM
Other than perhaps the interactivity, I'm not sure you're achieving anything more than:

Sub Demo()Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Forward = True
.Format = False
.Wrap = wdFindContinue
.MatchWildcards = True
.Replacement.Text = " "
.Text = "[ ]{1,}^13[ ]{1,}"
.Execute Replace:=wdReplaceAll
.Text = "^13[ ]{1,}"
.Execute Replace:=wdReplaceAll
.Text = "[ ]{1,}^13"
.Execute Replace:=wdReplaceAll
End With
End With
Application.ScreenUpdating = True
End Sub

JPG
12-12-2020, 04:16 AM
The interactivity is key to this particular macro, along with searching for valid cases to consider.
I need to think about this some more...
I found it was picking up legitimate things more that is should, so changing


.text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{1,3})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"
To


.text = "([A-Z0-9a-z\-,’;:'\) " & ChrW(880) & "-" & ChrW(8190) & _
"]{3,})^13([ A-Za-z'""""0-9‘(" & ChrW(880) & "-" & ChrW(8190) & "]{1,2})"

Is an improvement.

gmaxey
12-12-2020, 10:00 AM
I don't really know exactly what you are trying to do, but it seems the issue is a paragraph mark either preceded by or followed by one or more spaces. Perhaps:


Private Sub CommandButton64TEST_Click()Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean


bWholeDoc = True
Set oRng = ActiveDocument.Content
If Not bWholeDoc Then Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
.Text = Chr(13)
.Font.Superscript = False
Do While .Execute
On Error GoTo Err_GetOut
If oRng.Characters.First.Previous = " " Or oRng.Characters.Last.Next = " " Then
Set oRngShow = oRng.Duplicate
oRngShow.MoveStart wdWord, -3
oRngShow.MoveEnd wdWord, 3
oRngShow.Select
ActiveWindow.ScrollIntoView Selection.Range
Select Case MsgBox("Do you want to close up this break?", vbYesNoCancel)
Case vbYes:
Do Until oRng.Characters.First.Previous <> " "
oRng.MoveStart wdCharacter, -1
Loop
Do Until oRng.Characters.Last.Next <> " "
oRng.MoveEnd wdCharacter, 1
Loop
oRng.Text = " "
oRng.Collapse wdCollapseEnd
Case vbNo:
oRng.Collapse wdCollapseEnd
Case Else
Exit Do
End Select
End If
Loop
End With
lbl_Exit:
Selection.Collapse wdCollapseStart
oRng.Collapse wdCollapseEnd
Exit Sub
Err_GetOut:
MsgBox ("No more matches found")
Resume lbl_Exit
End Sub

macropod
12-12-2020, 12:22 PM
The interactivity is key to this particular macro, along with searching for valid cases to consider.
That's all very well (and was obviously understood), but it fails to address the issue I raised. Try:

Sub Demo()
Dim Rslt
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "[^13]{1,}"
.Replacement.Text = ""
.Forward = True
.Format = False
.Wrap = wdFindStop
.MatchWildcards = True
End With
Do While .Find.Execute
If .End < ActiveDocument.Range.End Then
Do While .Characters.Last.Next Like "[ " & vbCr & "]"
If .End = ActiveDocument.Range.End Then Exit Do
.End = .End + 1
Loop
End If
If .Start > ActiveDocument.Range.Start Then
Do While .Characters.First.Previous = " "
If .Start = ActiveDocument.Range.Start Then Exit Do
.Start = .Start - 1
Loop
End If
If .End = ActiveDocument.Range.End Then
If Len(.Text) = 1 Then Exit Sub
End If
.Select
Rslt = MsgBox("Replace this instance?", vbYesNoCancel)
Select Case Rslt
Case Is = vbYes
If .End = ActiveDocument.Range.End Then
.Text = ""
ElseIf .Start = ActiveDocument.Range.Start Then
.Text = ""
Else
.Text = " "
End If
Case Is = vbCancel: Exit Sub
End Select
If .End = ActiveDocument.Range.End Then Exit Sub
.Collapse wdCollapseEnd
Loop
End With
End Sub
In addition to dealing with simple excess spaces, the above code also deals with excess spaces interspersed with paragraph breaks.

gmaxey
12-12-2020, 02:05 PM
Paul,

I'm still not sure of the exact objective but I think the OP is trying to show some sort of span left and right of the offending spaces. I've tried to pull your code and mine together as one. ???


Sub Demo()
Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean
Dim lngIndex As Long, lngSpanLR As Long
lngSpanLR = 3
bWholeDoc = True
Set oRng = ActiveDocument.Content
If Not bWholeDoc Then Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
.Text = Chr(13) & "{1,}"
.MatchWildcards = True
Do While .Execute
With oRng
If .End < ActiveDocument.Range.End Then
Do While oRng.Characters.Last.Next Like "[ " & vbCr & "]" And .End <> ActiveDocument.Range.End
.End = .End + 1
Loop
End If
If .Start > ActiveDocument.Range.Start Then
Do While .Characters.First.Previous = " " And .Start <> ActiveDocument.Range.Start
.Start = .Start - 1
Loop
End If
Set oRngShow = .Duplicate
On Error Resume Next
For lngIndex = 1 To lngSpanLR
oRngShow.MoveStart wdWord, -1
oRngShow.MoveEnd wdWord, 1
Next lngIndex
On Error GoTo 0
Select Case True
Case .End = ActiveDocument.Range.End And Len(.Text) = 1
Case Else
oRngShow.Select
Select Case MsgBox("Remove break at this location?", vbYesNoCancel)
Case Is = vbYes
If .End = ActiveDocument.Range.End Then
.Text = ""
ElseIf .Start = ActiveDocument.Range.Start Then
.Text = ""
Else
.Text = " "
End If
Case Is = vbCancel: Exit Sub
End Select
End Select
Selection.Collapse wdCollapseEnd
.Collapse wdCollapseEnd
If .End + 1 = ActiveDocument.Range.End Then Exit Do
End With
Loop
End With
MsgBox "There are no more surplus space/breaks found.", vbOKOnly, "COMPLETE"
End Sub

macropod
12-12-2020, 09:32 PM
Hi Greg,

My approach to that would be:

Sub Demo()
Dim MsgRng As Range, Rslt As Variant
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "[^13]{1,}"
.Replacement.Text = ""
.Forward = True
.Format = False
.Wrap = wdFindStop
.MatchWildcards = True
End With
Do While .Find.Execute
If .End < ActiveDocument.Range.End Then
Do While .Characters.Last.Next Like "[ " & vbCr & "]"
If .End = ActiveDocument.Range.End Then Exit Do
.End = .End + 1
Loop
End If
If .Start > ActiveDocument.Range.Start Then
Do While .Characters.First.Previous = " "
If .Start = ActiveDocument.Range.Start Then Exit Do
.Start = .Start - 1
Loop
End If
If .End = ActiveDocument.Range.End Then
If Len(.Text) = 1 Then Exit Sub
End If
Set MsgRng = .Duplicate
With MsgRng
.MoveStart wdWord, -1
.MoveEnd wdWord, 1
.Select
End With
Rslt = MsgBox("Replace this instance?", vbYesNoCancel)
Select Case Rslt
Case Is = vbYes
If .End = ActiveDocument.Range.End Then
.Text = ""
ElseIf .Start = ActiveDocument.Range.Start Then
.Text = ""
Else
.Text = " "
End If
Case Is = vbCancel: Exit Sub
End Select
If .End = ActiveDocument.Range.End Then Exit Sub
.Collapse wdCollapseEnd
Loop
End With
End Sub

gmaxey
12-13-2020, 11:03 AM
Paul,

I looks like our methods are similar. I had just assumed that trying to move three words to the left or right would have err'd if they didn't exists thus the error handler. Also addressed the possible issue of non-breaking Chr(160) spaces:


Sub Demo()
Dim oRng As Range
Dim oRngShow As Range
Dim bWholeDoc As Boolean
bWholeDoc = True
Set oRng = ActiveDocument.Content
If Not bWholeDoc Then Set oRng = Selection.Range
Selection.Find.ClearFormatting
With oRng.Find
.Text = Chr(13) & "{1,}"
.MatchWildcards = True
Do While .Execute
With oRng
If .End < ActiveDocument.Range.End Then
Do While oRng.Characters.Last.Next Like "[" & Chr(32) + Chr(160) + vbCr & "]" And .End <> ActiveDocument.Range.End
.End = .End + 1
If .End = ActiveDocument.Range.End Then Exit Do
Loop
End If
If .Start > ActiveDocument.Range.Start Then
Do While .Characters.First.Previous Like "[" & Chr(32) + Chr(160) & "]"
.Start = .Start - 1
If .Start = ActiveDocument.Range.Start Then Exit Do
Loop
End If
Set oRngShow = .Duplicate
oRngShow.MoveStart wdWord, -3
oRngShow.MoveEnd wdWord, 3
oRngShow.Select
Select Case MsgBox("Remove break at this location?", vbYesNoCancel)
Case Is = vbYes
If .End = ActiveDocument.Range.End Or .Start = ActiveDocument.Range.Start Then
.Text = ""
Else
.Text = " "
End If
Case Is = vbCancel: Exit Sub
End Select
Selection.Collapse wdCollapseEnd
.Collapse wdCollapseEnd
If .End + 1 = ActiveDocument.Range.End Then Exit Do
End With
Loop
End With
MsgBox "There are no more surplus space/breaks found.", vbOKOnly, "COMPLETE"
lbl_Exit:
Exit Sub
End Sub

macropod
12-13-2020, 02:07 PM
I looks like our methods are similar.
Indeed they are.

the possible issue of non-breaking Chr(160) spaces
I'm not aware of any OCR output (which is what we're dealing with here) that introduces non-breaking spaces into a document. If that is a real possibility, the Chr(160) spaces aren't the only ones that might need consideration.

JPG
12-15-2020, 05:26 AM
Thank you Greg and Paul.
My original macro was made with the intent of trying to skip over the cases that normally would not be an issue, like where a paragraph ends with a full stop, or if there was bold text, due to a heading etc. The issue that I encountered was the replacement leaving double spaces and the format bleed. Both these things you have fixed and so as it is it works just fine for current task.

However the latter macro would be one I would use on a different job, that perhaps did not include footnote and page numbers that are left in the ocr text that have to be dealt with as I went along....
I have another version where I can choose to add punctuation to include at runtime taken from a textbox on my form..
So for me it is using the macro for the job, and I find each has their use and flexibility.
It is also beneficial to set bWholeDoc = False so that it will start at cursor position.

I am sure these macros will be helpful to others.


Dim bWholeDoc As Boolean bWholeDoc = False
Set oRng = ActiveDocument.Content
If Not bWholeDoc Then Set oRng = selection.Range


Also I found it beneficial to add a screen scroll on find to the macro


With oRng.Find .text = Chr(13) & "{1,}"
.MatchWildcards = True
Do While .Execute
With oRng
Call SelectionScrollIntoMiddleOfView
If .End < ActiveDocument.Range.End Then
Do While oRng.Characters.Last.Next Like "[" & Chr(32) + Chr(160) + vbCr & "]" And .End <> ActiveDocument.Range.End
.End = .End + 1
If .End = ActiveDocument.Range.End Then Exit Do
Loop
End If



Sub SelectionScrollIntoMiddleOfView()'https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-msoffice_custom-mso_2010/centre-selected-text-in-the-middle-of-the-screen/84ab25fe-9570-4b55-91bd-4b11a04bb99b?auth=1
Dim pLeft As Long
Dim pTop As Long, lTop As Long, wTop As Long
Dim pWidth As Long
Dim pHeight As Long, wHeight As Long
Dim Direction As Integer


wHeight = PixelsToPoints(ActiveWindow.Height, True)
ActiveWindow.GetPoint pLeft, wTop, pWidth, pHeight, ActiveWindow
ActiveWindow.GetPoint pLeft, pTop, pWidth, pHeight, selection.Range


Direction = Sgn((pTop + pHeight / 2) - (wTop + wHeight / 2))
Do While Sgn((pTop + pHeight / 2) - (wTop + wHeight / 2)) = Direction And (lTop <> pTop)
ActiveWindow.SmallScroll Direction, down
On Error Resume Next
lTop = pTop
ActiveWindow.GetPoint pLeft, pTop, pWidth, pHeight, selection.Range
Loop
End Sub

gmaxey
12-15-2020, 04:00 PM
Paul, if you are still following this thread, have you looked at the OP procedure to scroll the selection to the middle of the screen? Here is falls flat on its face and doesn't work at all. I've tinkered a bit with it but despite coming close, I can't get the selection dead center (plus or minus a line) of the active window. I've tested with single line of text selected and multiple lines. Have you got a better idea?


Sub SelectionScrollIntoMiddleOfView()
Dim lngLeft As Long, lngTop As Long, lngWidth As Long, lngHeight As Long
Dim lngWinTop As Long, lngWinHgt As Long
Dim lngScroll As Integer
Dim lngCenter As Long, lngWinCenter As Long
Dim lngCounter As Long
lngWinHgt = PixelsToPoints(ActiveWindow.Height, True)
lngWinTop = PixelsToPoints(ActiveWindow.Top, True)
lngWinCenter = (lngWinTop + lngWinHgt) / 2
ActiveWindow.ScrollIntoView Selection.Range
ActiveWindow.GetPoint lngLeft, lngTop, lngWidth, lngHeight, Selection.Range
lngCenter = (lngTop + lngHeight) / 2
Do
lngScroll = Sgn(lngCenter - lngWinCenter)
ActiveWindow.SmallScroll lngScroll
lngCounter = lngCounter + 1
ActiveWindow.GetPoint lngLeft, lngTop, lngWidth, lngHeight, Selection.Range
lngCenter = (lngTop + lngHeight) / 2
If lngScroll > 0 Then
If lngCenter > lngWinCenter Then
If lngCenter - lngWinCenter < 10 Then Exit Do
Else
If lngWinCenter - lngCenter < 10 Then Exit Do
End If
Else
If lngCenter > lngWinCenter Then
If lngWinCenter - lngCenter < 10 Then Exit Do
Else
If lngWinCenter - lngCenter < 10 Then Exit Do
End If
End If
If lngCounter > 20 Then Exit Do
Loop
lbl_Exit:
Exit Sub
End Sub

macropod
12-15-2020, 04:39 PM
Hi Greg,

I seriously doubt vertically centering the selection on the screen is advisable; that risks placing it immediately behind the message box. To my mind, adding the following two lines after '.Select' in my code is sufficient:

ActiveWindow.LargeScroll Down:=1
ActiveWindow.ScrollIntoView Selection.Range
That in invariably places the selected range just above the message box, which seems pretty close to optimal to me.

JPG
12-16-2020, 04:19 AM
Hi Greg,

I seriously doubt vertically centering the selection on the screen is advisable; that risks placing it immediately behind the message box. To my mind, adding the following two lines after '.Select' in my code is sufficient:

ActiveWindow.LargeScroll Down:=1
ActiveWindow.ScrollIntoView Selection.Range
That in invariably places the selected range just above the message box, which seems pretty close to optimal to me.
Good point Paul, your solution is 'close to optimal'.
The other method does have some more control over where the text will scroll to as you will have discerned in the code. As it is I have Word on one side of the screen and so this is not an issue. If I had it filling the whole screen I would change the code to have it display more at the top, so in just the two places I have put 12 instead of 2. Another tip is to double click the gap between pages as it allows the pages to flow better

Direction = Sgn((pTop + pHeight / 2) - (wTop + wHeight / 12)) Do While Sgn((pTop + pHeight / 2) - (wTop + wHeight / 12)) = Direction And (lTop <> pTop)


Jon