Solved: Validating mixed font strings



gmaxey
08-18-2010, 06:07 PM
I have a very large manuscript with thousands of sentences that have a very specific sentence reference number format and spacing requirement.

E.g.,

End of one sentence(normal period)(normal space)(superscript space and reference number)(superscript non-breaking space)Start of next sentence.

Unfortunately over the many years and many editors there are hundreds of spacing and formatting errors throughout the manuscript. One thing that is relatively stable is that the end of sentence marks (.!?) are normal script and the reference is superscript.

There are many places where the normal sentence stop (period, question mark, exclamation point) is butted up to the superscript reference number with no spacing:

End of sentence.(superscript number)Start of next sentence.

I have worked out a process that resolves all the issues that I have found. However, the step required to find the example above and insert one space (which I resolve to one normal and one superscript space later) is very clunky, due to the mixed script and the presence of many other superscript text instances in paragraphs, such as the paragraph reference format: T23.1.2 [23] (T23.1.1) 1 Start of first sentence paragraph.

Here is the code that I am using to find these butted up instances (no space) and add one space.

Sub FixNoSpaceBefore(ByRef oRngFix As Word.Range)
    Dim oRngProcess As Word.Range
    Set oRngProcess = oRngFix.Duplicate
    With oRngProcess.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' Any stop or separator mark butted directly against a digit
        .Text = "[.\?\!,;][0-9]"
        .MatchWildcards = True
        .Wrap = wdFindStop
        While .Execute
            With oRngProcess
                ' Act only on a normal-script mark followed by a superscript digit
                If .Characters.First.Font.Superscript = False And .Characters.Last.Font.Superscript = True Then
                    .Text = .Characters.First & " " & .Characters.Last
                    .MoveStart wdCharacter, 1
                    .Font.Superscript = True
                    .Collapse wdCollapseEnd
                End If
            End With
        Wend
    End With
End Sub

The problem, of course, is that this process evaluates "all" instances of period(no space)number and manipulates the text where required, rather than finding only those instances where the period is normal script and the number is superscript.

I hope I have provided enough detail for someone to see the issue. I feel like I have been staring into this forest for so long that I might be missing some far easier or more efficient method.

Thanks.

fumei
08-19-2010, 12:09 AM
You may be in trouble...

But first, I have to ask: superscript non-breaking space

WHY do you have a non-breaking space as a superscript?

Oh. Never mind. I just thought of a reason. It may not be your reason, but in fact that is irrelevant, as ANY reason is good enough to have the situation to deal with.


Hmmmmm. I am in the midst of hiking the California Redwoods and darn it this is distracting!!!! Now I have something annoying to think about.


Hmmmm.

fumei
08-19-2010, 12:15 AM
" is very clunky "

Greg, I suspect, because of the existence of OTHER superscript NOT relevant to actionable logic (but present nonetheless, so you MUST deal with them with some logic), that you may be stuck with your kludgey and clunky solution. Although it is not all THAT clunky.

I shall stare at a 380 foot tree and think about it.

fumei
08-19-2010, 12:20 AM
Oh, and:

"Unfortunately over the many years and many editors there are hundreds of spacing and formatting errors throughout the manuscript. "

Cracked me up. Really????? I am shocked!

OK, though. I still do have to ask why you have sentences separated by non-breaking spaces. This is not a dispute. I have worked with technical documentation with some VERY odd requirements. It is curiosity.

fumei
08-19-2010, 12:46 AM
What are you passing in ByRef?

(ByRef oRngFix As Word.Range)

You are using a Duplicate, but what is the range you are passing in?

Anyway, could it possibly be easier to test for the superscripts themselves?

Sub Hmmmmmmmmmmmm()
    Dim r As Range
    Set r = ActiveDocument.Range
    With r.Find
        .Text = ""
        .Font.Superscript = True
        Do While .Execute(Forward:=True) = True
            ' if previous character is sentence punctuation
            ' note only using periods for this test
            If r.Previous.Characters(1) = "." Then
                ' add a space
                r.InsertBefore " "
            End If
            If Asc(r.Next.Characters(1)) >= 65 And _
                    Asc(r.Next.Characters(1)) <= 90 Then
                r.InsertAfter " "
            End If
            ' IMPORTANT!! there is no error trap to
            ' see if the NEXT character is also superscript
            ' I.E. not single digit superscript
            ' This is not stated, but if it is a large document
            ' this would not surprise me
            ' ...therefore it should be tested.

        Loop
    End With
End Sub

Now, if the superscripts vis-a-vis the sentences themselves are correct, with appropriate spaces (I am ignoring your superscript non-breaking space for the moment), any further processing could be handled by Sentence objects.

Just a thought.

gmaxey
08-19-2010, 05:00 AM
Gerry,

The sentence reference numbers are at the beginning of each sentence. The non-breaking superscript space prevents the sentence reference number from being orphaned at the end of a line.

T23.1.2[44] (T22.1:V6) 1 First sentence. 2 Second sentence ....

All numbers in the above are superscript.

I'll look at your code and see how it performs.

gmaxey
08-19-2010, 06:00 AM
Gerry,

Thanks for the suggestion. I am not sure if it is actually running any faster, but this works and looks better for the "no space before" issue:

Sub FixNoSpaceBefore(ByRef oRngFix As Word.Range)
    On Error GoTo Err_Handler
    With oRngFix.Find
        .Text = ""
        .Font.Superscript = True
        Do While .Execute(Forward:=True) = True
            ' Selecting each hit is only useful for watching the run; it slows the loop
            oRngFix.Select
            ' Add a space when the superscript run is butted up to a mark
            Select Case oRngFix.Previous.Characters(1)
                Case ".", "!", "?", ",", ";"
                    oRngFix.InsertBefore " "
            End Select
Err_Resume:
        Loop
    End With
    Exit Sub
Err_Handler:
    Resume Err_Resume
End Sub


I discovered something that I should (and did actually) have already known. There are so many different formatting issues in the manuscript that I was attempting to separate out and pass only document paragraphs that contained the paragraph reference to separate routines for processing. The idea being to try to prevent steps that fixed one thing from breaking several others.

I noticed in testing that "it appeared" the process was slow at the start and then started building up speed as it neared the end. The range that I was passing was a paragraph range. I realize now that in passing the first paragraph my procedure was processing it and all subsequent paragraphs. The passing procedure then passed the second and it was processed along with the following paragraphs.

I ran the code by simply passing the ActiveDocument range and of course it was completed in a fraction of time with only minor other errors.

I don't remember exactly how to prevent a Find and Replace procedure from continuing to process after it has processed the passed range. I could be and very well am wrong, but I don't think you can if you use .Execute Replace:=wdReplaceAll. It seems that if using a Do While .Execute loop you can jump out of the loop if the found range is outside the passed range.

Something like If Not oRngFind.InRange(PassedRange) Then Exit Do

I'll have to tinker with that this evening.

Thanks.

gmaxey
08-19-2010, 02:53 PM
Gerry,

I don't think that I could have been more wrong concerning what can and can not be done with find and replace. Well maybe a little.

Can you accept the excuse that I am a little rusty with F&R and was blurry eyed from staring at the manuscript text and its gazillion formatting errors?

For the benefit of others that might stumble on this thread, and to completely dispel any notion that it isn't possible to pass a specific range to an F&R procedure and limit processing to that range, here are some working examples.

I typed six paragraphs:

Test, test, test
Test, test, test
Test, test, test
Test, test, test
Test, test, test

And ran these procedures (resetting the text after each)

Option Explicit
Dim i As Long

' Replace All, limited to paragraphs 1, 3 and 5
Sub ScratchMacro()
    For i = 1 To 5 Step 2
        FindReplaceAllInDefinedRange ActiveDocument.Paragraphs(i).Range
    Next i
End Sub

Sub FindReplaceAllInDefinedRange(ByRef oRng As Word.Range)
    With oRng.Find
        .Text = "Test"
        .Replacement.Text = "Eureaka"
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Find and format each hit, limited to paragraphs 1, 3 and 5
Sub ScratchMacroII()
    For i = 1 To 5 Step 2
        FindAndProcessInDefinedRange ActiveDocument.Paragraphs(i).Range, i
    Next i
End Sub

Sub FindAndProcessInDefinedRange(ByRef oRng As Word.Range, Count As Long)
    Dim oSrchRng As Word.Range
    Set oSrchRng = oRng.Duplicate
    With oSrchRng.Find
        .Text = "Test"
        Do While .Execute
            ' Stop as soon as a hit falls outside the range that was passed in
            If oSrchRng.InRange(oRng) Then
                Select Case Count
                    Case 1
                        With oSrchRng.Characters(1).Font
                            .Superscript = True
                            .Color = wdColorBlue
                            .Size = Count + 8
                        End With
                    Case 3
                        With oSrchRng.Characters(2).Font
                            .Subscript = True
                            .Color = wdColorGreen
                            .Size = Count + 12
                        End With
                    Case 5
                        With oSrchRng.Characters(2).Font
                            .Underline = True
                            .Color = wdColorRed
                            .Size = Count + 16
                        End With
                End Select
                oSrchRng.Collapse wdCollapseEnd
            Else
                Exit Do
            End If
        Loop
    End With
End Sub

' Replace one at a time, stopping at the second hit in each paragraph
Sub ScratchMacroIII()
    For i = 1 To 5 Step 2
        FindAndProcessOneByOneInDefinedRange ActiveDocument.Paragraphs(i).Range, i
    Next i
End Sub

Sub FindAndProcessOneByOneInDefinedRange(ByRef oRng As Word.Range, Count As Long)
    Dim oSrchRng As Word.Range
    ' Local counter; it hides the module-level i inside this procedure
    Dim i As Long
    i = 0
    Set oSrchRng = oRng.Duplicate
    With oSrchRng.Find
        .Text = "Test"
        .Replacement.Text = "Eureaka"
        Do While .Execute(Replace:=wdReplaceOne)
            i = i + 1
            If i = 2 Then
                MsgBox "That's enough"
                Exit Do
            End If
            oSrchRng.Collapse wdCollapseEnd
        Loop
    End With
End Sub


Thanks for your push along a different path. It got me to dig deeper into my code where I realized that I was processing paragraphs 1 to 300, 2 to 300, 3 To 300 ... 299 To 300 instead of paragraph 1, 2, 3, 4 ... 300.

The whole thing is still a goat rope as the author continues to throw out new formatting cases, but at least it now runs smoother and faster.
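
For anyone landing here later, the two ideas in this thread (search for the superscript runs themselves, and stop once a hit leaves the passed range) can be combined roughly as follows. This is only a minimal sketch, not the code actually run against the manuscript. The procedure names FixAllParagraphs and FixNoSpaceBeforeBounded are made up for illustration, and the digit test (Like "#*") is an assumption carried over from the [0-9] wildcard in the first post. The formatting of the inserted space is left for a later pass, as described above.

```vba
Option Explicit

' Hypothetical caller: hands each paragraph to the fixer exactly once.
Sub FixAllParagraphs()
    Dim oPara As Word.Paragraph
    For Each oPara In ActiveDocument.Paragraphs
        FixNoSpaceBeforeBounded oPara.Range
    Next oPara
End Sub

' Hypothetical fixer: inserts one space between a stop/separator mark and a
' superscript run that starts with a digit, without leaving oRngFix.
Sub FixNoSpaceBeforeBounded(ByVal oRngFix As Word.Range)
    Dim oRngFind As Word.Range
    Dim oRngPrev As Word.Range
    Set oRngFind = oRngFix.Duplicate
    With oRngFind.Find
        .ClearFormatting
        .Text = ""
        .Format = True
        .Font.Superscript = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' A hit outside the passed range means this range is done.
            If Not oRngFind.InRange(oRngFix) Then Exit Do
            ' Assumption: reference numbers are the runs that begin with a digit.
            If oRngFind.Text Like "#*" Then
                Set oRngPrev = oRngFind.Previous(wdCharacter, 1)
                If Not oRngPrev Is Nothing Then
                    Select Case oRngPrev.Text
                        Case ".", "!", "?", ",", ";"
                            oRngPrev.InsertAfter " "
                    End Select
                End If
            End If
            ' Nothing left to search once the hit reaches the end of the range.
            If oRngFind.End >= oRngFix.End Then Exit Do
            oRngFind.Collapse wdCollapseEnd
        Loop
    End With
End Sub
```

The range is passed ByVal and searched through a Duplicate so that the caller's range is never redefined by the Find, and the InRange test is the same guard used in FindAndProcessInDefinedRange above.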

fumei
08-19-2010, 10:27 PM
"Thanks for your push along a different path. It got me to dig deeper into my code where I realized that I was processing paragraphs 1 to 300, 2 to 300, 3 To 300 ... 299 To 300 instead of paragraph 1, 2, 3, 4 ... 300."

Aaaaacccccccccccccccckkkkkkkkkkkkkkkk!!

No kidding!

Ran into a bear today....almost literally.