PDA

View Full Version : [SOLVED:] Turning ALL CAPS text into lower case/sentence case text with character style applied



Walentyn
02-02-2021, 01:31 PM
We produce etext files for blind/low vision folk, and the convention is to replace all caps words with lower case and a bold character style. Some words have to be in title case where they start a sentence (or are names). This can be tedious to do manually for young adult novels that are full of "shouting" text.

I have a macro which performs a series of Find and Replaces to change upper case text into lower case, with a bolding character style applied to it, a style called "Emphasis".

The last action of the macro is to find sentences whose first word now starts with a lower case letter, and turn it into an upper case letter.

I've got some code to do this:


For i = 1 To docFind.Sentences.count
If docFind.Sentences(i).Characters.First.Bold = True Then
docFind.Sentences(i).Characters(1).Case = wdUpperCase
docFind.Sentences(i).Characters(1).HighlightColorIndex = wdPink
End If
Next i


Two issues with this way of doing things:

1. For a long document, it's extremely slow. The full code loops through all sentences, more than once.
2. It's even slower when additional lines of code deal with the cases where the sentence starts with a quotation mark, and the second character is lower case.

So the question is:

Rather than loop through potentially thousands of sentences, how to write code to restrict the search to sentences whose first word is in Emphasis style (or in bold), and whose first letter is lower case?

gmayor
02-03-2021, 05:34 AM
Without access to a sample document it is difficult to make suggestions on how to address the issue, but on the face of it if you have a document full of all caps words then why not simply convert the document to sentence case? It will not address proper names, but for the rest it should be virtually instantaneous, leaving fewer replacements to worry about. You can do that from a list of keywords - see https://www.gmayor.com/document_batch_processes.htm (http://www.gmayor.com/document_batch_processes.htm)


Dim oRng As Range Set oRng = ActiveDocument.Range
oRng.Case = wdTitleSentence

Walentyn
02-03-2021, 11:48 AM
Here are some examples, to make the situation clearer. This sort of thing is littered throughout a novel-length document. The macro aims to transform the caps text into lower case bold using a named style, and restores the start of sentences to title case.

"THAT'S IT!" he cried.
"Oh, NO!" screamed Fluffy.
Um ... boys and girls ... BOYS AND GIRLS!
"Children, PLEASE STOP LAUGHING!"
STOP IT RIGHT NOW!
"WOW!" said Connor. "Science is COOL!"
"I said NO. A BIG FAT NO!"
"GGGGLLUUZZZZZZZRRRRT!"
"Tra-la-LAAAAA!" sang the hero.
"DON'T!" Fred shouted.

My macro handles a list like this well enough, but not a full length document.

macropod
02-03-2021, 03:11 PM
For example:

Sub Demo()Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "<[A-Z][A-Z' \!\?,.:;^t^l^13]{2,}"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWildcards = True
End With
Do While .Find.Execute
Do While .Characters.Last Like "[\!\?,.:;" & vbTab & vbCr & Chr(11) & "]"
.End = .End - 1
Loop
.Case = wdTitleWord
.Style = wdStyleStrong
.Collapse wdCollapseEnd
Loop
End With
Application.ScreenUpdating = True
End Sub
Word's 'Strong' character Style applies bold formatting. Rather than turning everything into lower-case, which would make sentence structures harder to follow, or to sentence case, which would lose the capitalisation for names, etc., I've chosen a compromise by applying Word's 'Title' case, which capitalises the first letter of each word. You can choose a different format if you prefer.

Walentyn
02-03-2021, 04:23 PM
That's a great advance on what I had. I changed the case to wdTitleSentence, and then added a second Find and Replace to fix the lower case sentences. (The highlights are temporary measures to show me cause and effect.) That gave me 99% of what I wanted, at great speed.

The only case that was wrongly captured that I've so far noticed is when the last word in a sentence is all caps. The capital letter of the following sentence was included.

Removing the full stop from the Find text fixed that, but introduced an exclusion: a two letter word like "NO": "I said NO. ETC ..." This was left untouched.


Sub FormatCapsStringToEmphasis_new()

Application.ScreenUpdating = False
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "<[A-Z][A-Z' \!\?:;^t^l^13]{2,}" ' I removed the full stop and comma that were here
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWildcards = True
End With
Do While .Find.Execute
Do While .Characters.Last Like "[\!\?:;.," & vbTab & vbCr & Chr(11) & "]"
.End = .End - 1
Loop
'.Case = wdTitleWord
.Case = wdTitleSentence
.Style = ActiveDocument.Styles("Emphasis")
.HighlightColorIndex = wdPink
.Collapse wdCollapseEnd
Loop
End With

Selection.HomeKey Unit:=wdStory

With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "(^13^34[a-z]{1,})" '
.Replacement.Text = "^&"
.Replacement.Highlight = True
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
Selection.Find.Execute Replace:=wdReplaceAll

Do Until .Execute = False
With Selection
.Start = .Start + 2
.Range.Case = wdTitleWord
.Range.HighlightColorIndex = wdYellow
.Range.Style = ActiveDocument.Styles("Emphasis")
.Collapse wdCollapseEnd
End With
Loop
End With
Selection.HomeKey Unit:=wdStory
Application.ScreenUpdating = True
End Sub

macropod
02-03-2021, 05:16 PM
Try:

Sub FormatCapsStringToEmphasis()
Application.ScreenUpdating = False
Dim Rng As Range, i As Long
With ActiveDocument
Set Rng = .Range(0, 0)
Set Rng = Rng.GoTo(What:=wdGoToPage, Name:=4)
Rng.End = .Range.End
With Rng
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "<[A-Z][A-Z '\!\?.,:;]{2,}"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWildcards = True
End With
Do While .Find.Execute
If .Paragraphs.Count > 1 Then .End = .Paragraphs.First.Range.End
If .End = .Paragraphs.First.Range.End Then .End = .End - 1
.MoveEndWhile "[ \!\?.,:;]", wdBackward
If .Characters.Last.Next Like "[a-z]" Then .End = .End - 1
.MoveEndWhile "[ \!\?.,:;]", wdBackward: i = 0
Select Case Split(.Text, " ")(UBound(Split(.Text, " ")))
Case "I", "A", "OK", "O.K", "ADD", "A.D.D", "ADHD", "A.D.H.D", "UFO", "U.F.O", "US", "U.S", "USA", "U.S.A"
i = Len(Split(.Text, " ")(UBound(Split(.Text, " ")))) + 1
.End = .End - i
End Select
If Len(.Text) > 1 Then
.Style = wdStyleEmphasis
.HighlightColorIndex = wdPink
.Case = wdTitleSentence
If .Characters.First.Previous Like "[""" & vbCr & Chr(11) & "]" Then .Words.First.Case = wdTitleWord
Select Case LCase(Split(.Text, " ")(UBound(Split(.Text, " "))))
Case "alabama", "ohio"
.Words.Last.Case = wdTitleWord
End Select
End If
.End = .End + i
.Collapse wdCollapseEnd
Loop
End With
End With
Application.ScreenUpdating = True
End Sub

Walentyn
02-03-2021, 06:18 PM
That gives another unwanted consequence, which I can resolve by removing the square brackets around [^13""""] but this leaves untouched the first of two issues mentioned above, and loses the restoration of sentence case. But don't want to take up any more of your time. The most comprehensive version is still the two part sub posted above.

macropod
02-03-2021, 06:31 PM
Perhaps you could attach a document to a post demonstrating the content you're having issues with.

Regardless, the code you added in post #5 is markedly less efficient than the code I've posted.

Walentyn
02-04-2021, 12:24 PM
I'm attaching a demo file. One page 3 of the file, labelled "Page 1", heading called "Dummy text:" I've added a bunch of fake text as examples. Others are scattered throughout.

What we call Prelim pages, comprising Publishing Info, Cover Information, Contents etc, will be excluded from the range of this macro.

macropod
02-04-2021, 09:36 PM
See updated code in post #6. You'll note it now:
1. starts searching at page 4
2. has a series of upper-case strings that, if they're the last (or only) word in the matched string, don't get processed (you can add more)
3. has a series of lower-case strings that, if they're the last (or only) word in the matched string, get converted to title case (you can add more)

Obviously, this isn't going to cater for every possibility, but it'll get you a whole lot closer to your goal than you have been. After running the macro in post #6, you may want to run another that Find/Replace that re-capitalises names, etc. in the document.

Walentyn
02-08-2021, 11:59 AM
Gosh, what can I say ... genius! A quite different beast than the previous version(s), and very fast. Much to learn from it too. Thanks so much for taking the time to figure this out! It's a great help.