PDA

View Full Version : Extra paragraph marks ruin all scripts



Underwood
12-17-2016, 06:03 PM
Hello all,

Problem:
- Extra paragraph marks ruin all scripts below.

Need:
- Word selection to extend to TWO consecutive paragraph marks

17850




Sub Highlight_WORDLINE_v2()
Dim oRng As range
Set oRng = ActiveDocument.range
With oRng.Find
Do While .Execute(FindText:="edoxaban")
oRng.Start = oRng.Paragraphs(1).range.Start
oRng.End = oRng.Paragraphs(1).range.End
oRng.HighlightColorIndex = wdYellow
oRng.Collapse 0
Loop
End With
lbl_Exit:
Set oRng = Nothing
Exit Sub
End Sub

SamT
12-17-2016, 08:49 PM
Have you tried SpaceAfter?

I don't script Word, but in other Domains, a Space_After Tag would do what you need.



If I understand the Tags in the "BAD" image,
Then, If you can Search and Replace:

Replace <Para>LineFeed<Para>
with <Para>"XYZ"<Para>
Then
Replace <Para>LineFeed
With SpaceAfter
Then
Replace <Para>"XYZ"<Para>
With <Para>LineFeed<Para>

Underwood
12-17-2016, 09:07 PM
Google is showing me that "Space_After" is for python.

SamT
12-17-2016, 09:21 PM
You are too fast for me at this time of night.

I was editing my last post while you were replying.

Underwood
12-17-2016, 09:35 PM
Perhaps I could do that, but I'm unsure of how the syntax would be.
Admittedly, I am a VBA noob.

SamT
12-17-2016, 09:39 PM
Can you record Macros in Word? I know you can in Excel. I just don't know if you can search for para marks and LineFeeds, or, Carriage Returns.

gmayor
12-17-2016, 10:29 PM
Looking at your two example texts, as you are probably aware, the first comprises one paragraph, the second comprises three. There is no way for a macro to determine whether a block of text comprises one paragraph or several unless you tell the macro what constitute the start and end points in the block. Your examples are too brief to be certain, but both end with Grade then a number then a closing parenthesis. If all the blocks you want to treat in this way end with that layout, then you could use that to determine where to end the block.

See also http://word.mvps.org/FAQs/Formatting/CleanWebText.htm (note that this link will soon become unavailable)

Underwood
12-17-2016, 10:51 PM
gmayor -

Could something like this work?:

If Strings.Right(Selection.Range.Text, 1) = "^13^13"

gmayor
12-18-2016, 02:28 AM
Your example text doesn't have two consecutive paragraph breaks. It has unwanted paragraph breaks within the paragraph.

Maybe something like:


Dim orng As Range
Set orng = Selection.Range
With orng
'move the start of the range to the start of the current paragraph
.Start = orng.Paragraphs(1).Range.Start
'move the end of the range to the parenthesis. Obviously this won't be any use if you have multiple parentheses
.MoveEndUntil "("
'Move the end of the range to the end of the last paragraph in the range less the final paragraph break
.End = .Paragraphs.Last.Range.End - 1
'Remove any paragraph breaks in the range. This will remove any manual formatting from the range.
.Text = Replace(.Text, Chr(13), "")
End With

will work for the paragraph the cursor is in

mikewi
12-18-2016, 06:59 AM
This is what I wrote to get rid of the extra returns after opening a PDF in Word.


Sub removereturn()
'
' removereturn Macro
' removereturn
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([!.])^0013"
.Replacement.Text = "\1"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Paul_Hossler
12-18-2016, 08:23 AM
If that is a typical example then a brute force replace might work

Find: Letter/Digit#1 followed by paragraph followed by Letter/Digit#2
Replace: Letter/Digit#1 followed by space followed by Letter/Digit#2

It's not perfect



Option Explicit
Sub Macro2()
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "([a-zA-Z0-9])^13([a-zA-Z0-9])"
.Replacement.Text = "\1 \2"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End Sub

SamT
12-18-2016, 10:18 AM
Graham,
There are consecutive Para Breaks. He just didn't highlight the one under the paragraph he wants highlighted. I see 5 in the RH image, 3 inside, one above and one below.

That's why I suggested Post#2

gmayor
12-18-2016, 11:23 PM
Sam
I was distracted by the highlighted texts where there was an obvious problem and overlooked the double paragraph breaks, despite a reference to them. :(
My original suggestion should work, but as the document has double paragraph breaks between the text blocks, I would use a variation on Paul's code as follows.
The double paragraph breaks should also be replaced with one and the space provided by paragraph formatting in the styles used.


Sub Macro3()
Dim oRng As Range
Set oRng = ActiveDocument.Range
With oRng.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "(^13[!^13])"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End Sub

Underwood
12-19-2016, 03:01 AM
Gentlemen,

Flight cancellations and delays held me up from checking thread yesterday.
I will try all of these things, today.

Thank you VERY much for your input so far!

Underwood
12-19-2016, 10:57 PM
OK,
So I've looked into the problem earlier, and have identified the issue.
The OCR inserting more extra paragraph marks than I originally thought. (SEE ATTACHED IMAGE)
So my original thought of ending the selection before consecutive paragraph marks (^13^13) falls flat.

What I would like now is a script that extends the selection from the search word to the nearest period (.) on both sides.
Gmayor's code below is basically done if it based the selection to be expanded upon the search term and not the cursor.

Gmayor's script (fix to start from search term):

Dim orng As Range
Set orng = Selection.Range
With orng
'move the start of the range to the start of the current paragraph
.Start = orng.Paragraphs(1).Range.Start
'move the end of the range to the parenthesis. Obviously this won't be any use if you have multiple parentheses
.MoveEndUntil "("
'Move the end of the range to the end of the last paragraph in the range less the final paragraph break
.End = .Paragraphs.Last.Range.End - 1
'Remove any paragraph breaks in the range. This will remove any manual formatting from the range.
.Text = Replace(.Text, Chr(13), "")
End With


New Order of Execution:
- The first sweep (the script that goes paragraph to paragraph in OP) will get 90% of instances' paragraphs highlighted.
- The second sweep (new script) will get the remaining 10% that has been tampered by OCR by extending over extra empty lines.
- The other OCR correction scripts posted above will provide additional failsafes (thanks guys!).

Apologies for the extra thumbnail below - unsure of how to delete.

Kilroy
12-20-2016, 05:33 AM
17879

I typed out your example and ran a slight variation of Mikewi's code and added a bit to remove 2 spaces in a row and it works good. As you can see I'm still new at this and I don't know how to loop or DoUntil yet so the return removal is there twice. It's ugly but it does the job to the example.



Sub RunAll()
Call removereturn
Call removereturn2
Call Replace2spacesWith1
End Sub
Sub removereturn()
'
' removereturn Macro
' removereturn
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([!.])^0013"
.Replacement.Text = "\1 "
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub removereturn2()
'
' removereturn Macro
' removereturn
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "([!.])^0013"
.Replacement.Text = "\1 "
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub Replace2spacesWith1()
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "^32{2,}"
.Replacement.Text = " "
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
End Sub

gmaxey
12-20-2016, 07:46 AM
Kilroy,

Good to see you are learning.

A few points. You don't have to use "Call"

RunThisMacro
does the same thing as ...
Call RunThisMacro

Range is usually easier to work with than selection.

Your code revised:


Sub ScratchMacro()
'A basic Word macro coded by Greg Maxey
Dim oRng As Range
Set oRng = Selection.Range
'Kill empty paragraphs.
With oRng.Find
.Text = "^13{2,}"
.Replacement.Text = vbCr
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
Set oRng = Selection.Range
With oRng.Find
'Kill paragraphs that don't end with a period.
.Text = "([!.])^0013"
.Replacement.Text = "\1 "
.MatchWildcards = True
.Execute Replace:=wdReplaceAll
End With
lbl_Exit:
Exit Sub
End Sub

Kilroy
12-20-2016, 07:57 AM
Thanks Greg. I've received a lot of help here and learned a lot as well. I will continue to try and pay it forward when ever I can. I tried your code. Works perfect in my example as usual. I've had lots of trouble with PDF to word especially when the PDF had 2 columns which is what I suspect was the problem in this thread as well. Word just adds a return in the weirdest places in the conversion. I'm working on one now that changes all to one column and sets the margins. It's long but it works. I'm going incorporate this code into it. Thanks again.

Underwood
12-20-2016, 10:06 AM
Kilroy & gmaxey:

Script is doing well, but:
It's subtracting more paragraph marks.
So it completes the task, but with collateral damage you might say.

Still would like a selection --> next/last period (.) script.

Will check in on it later.
Thanks for your continued input!
Vacation :D

Kilroy
12-21-2016, 05:28 AM
Underwood the code that Greg wrote works great on the example you gave and does only work from the cursor forward. However if you don't want it continue from the cursor to the end you can try and replace the
.Execute Replace:=wdReplaceAll in both with statements to:

.Execute Replace:=wdReplaceOne

In the example you gave you will need to run this more than one time but at least you can control how far it goes.