PDA

View Full Version : Solved: Search/Replace mixed Regular and Italic



Paul_Hossler
04-04-2009, 03:52 AM
Many times when reformatting a scanned book or article (depends on quality and typeface I think), the OCR process will take 2 words and combine them into a single word with the second one being formatted in italics:

"this is regular italics but ...." will OCR as

"this is regularitalics but ...."

I tried all kinds of search and replace with formatting and wild cards, etc. but couldn't get it to only look for italics in part of the search term.

I think I'll have to brute force through the text, one word at a time.

Can some one get me started with a macro that will:

1. Find each word in regular and italics (I think .Find will be faster than looping through the text one word at a time)

2. Splt the .Found word by adding the missing space

3. Making the second word Regular

Would save a LOT of work using Spell check, and manually fixing

Paul

macropod
04-04-2009, 03:36 PM
Hi Paul,

Using wildcards for Find/Replace, you should be able to achieve what you're after with:
. '(*>)' as the 'Find' text, with italic character formatting, and
. ' \1' as the 'Replacement' text (not the space before the \), with regular character formatting.

Paul_Hossler
04-04-2009, 06:15 PM
Yep -- that works

I was trying it the hard way and making it a whole lot more complicated.

Thanks

Paul

Edit----

Uh-Oh -- found a hopefully small problem

If there is a complete word that is italic, the pattern matchs that also. Some things (e.g. Book titles) should remain in italic.

It's only when one word is a mixture of regular and italics that I need need to break the joined words into two words, and make the second one regular

e.g. "regularitalic" should become "regular italic"

Do-able??

Paul_Hossler
04-05-2009, 06:55 AM
Could not figure out how to "Un-solve" a thread:dunno and I was afraid my follow up question would be missed by the people who could help

Original:



Many times when reformatting a scanned book or article (depends on quality and typeface I think), the OCR process will take 2 words and combine them into a single word with the second one being formatted in italics:

"this is regular italics but ...." will OCR as

"this is regularitalics but ...."

I tried all kinds of search and replace with formatting and wild cards, etc. but couldn't get it to only look for italics in part of the search term.

I think I'll have to brute force through the text, one word at a time.

Can some one get me started with a macro that will:

1. Find each word in regular and italics (I think .Find will be faster than looping through the text one word at a time)

2. Splt the .Found word by adding the missing space

3. Making the second word Regular

Would save a LOT of work using Spell check, and manually fixing

Paul



Macropod's answer



Using wildcards for Find/Replace, you should be able to achieve what you're after with:
. '(*>)' as the 'Find' text, with italic character formatting, and
. ' \1' as the 'Replacement' text (not the space before the \), with regular character formatting.


Found a hopefully small problem. Added the S&R to my 'cleanup' macro, and it does correctly handle the example text.

If there is a complete word that is italic, the pattern matchs that also. Some things (e.g. Book titles) should remain in italic.

It's only when one word is a mixture of regular and italics that I need need to break the joined words into two words, and make the second one regular

e.g. "regularitalic" should become "regular italic", but

"regular italic" should remain "regular italic",

Thanks again, and sorry for any confusion and the "Requirements Creep"

Paulk

lucas
04-05-2009, 07:57 AM
Paul, you can use the thread tools at the top of the page to edit your thread title.

"Solved" removed from title and threads merged to avoid confusion.

Paul, I can't seem to get this to work on mult finds to replace the font but it works with the messagbox. Maybe a starting point for your problem.

Dim aRange As Range
Set aRange = ActiveDocument.Range
With aRange.Find
.Font.Italic = True
Do
.Execute
If .Found Then
MsgBox aRange
' aRange.Font.Italic = False
End If
Loop While .Found
End With

Paul_Hossler
04-05-2009, 08:22 AM
Paul, you can use the thread tools at the top of the page to edit your thread title.

Does removing the 'Solved:' open it back up again? I was looking for a check box under 'Tools' to uncheck.



"Solved" removed from title and threads merged to avoid confusion.
Thanks, I was afraid that no one would bother to look at a "Solved:" thread



Maybe a starting point for your problem.

Thanks for that also -- the stumbling block seems to be the fact that what looks like a single MS Word word is a mixture of regular and italics, and it's only the italics part that requires processing

Paul

lucas
04-05-2009, 09:18 AM
Paul, see if this works for your need. I'm not much of a Word person but this seems to work:

Option Explicit
Sub FindItalicsRemoveAndAddSpace()
Dim aDoc As Document
Dim AllRng As Range
Dim SrchRng As Range
Dim Response As Integer
Set aDoc = ActiveDocument
Set AllRng = aDoc.Range
Set SrchRng = AllRng.Duplicate
Do
With SrchRng.Find
.ClearFormatting
.Font.Italic = True
.Execute
End With
If SrchRng.Find.Found Then
SrchRng.MoveEnd wdWord, 1
'Ask for response
Response = MsgBox("Remove the italics? " & vbCr & _
SrchRng, vbYesNo, "Please Respond")
Select Case Response
Case 6
'They clicked yes.
SrchRng.Font.Italic = False
SrchRng.Text = Replace(SrchRng.Text, SrchRng.Text, " " & SrchRng.Text)
End Select

SrchRng.Start = SrchRng.End + 1
SrchRng.End = AllRng.End
End If
Loop Until Not SrchRng.Find.Found
End Sub

Paul_Hossler
04-05-2009, 10:56 AM
lucas -- your's works fine, or at least until I made just a small change:rotlaugh:

I tried to automate the reformat by testing the len of the .Found text against thelen of the whole word that the text is in

This was goodmen would correctly become good men (3 < 7)

while A Book Title would remain A Book Title (12 not < 12)

I was able to actually create a new Range for the Entire Word, and the len's seem to be correct, but (as I'm still trying to fathom the Word Range object, Selection, etc.) I seem to have the entire word in SrchRng, and so

goodmen becomes <extra space>goodmen

i.e. the space is added at the beginning, and not where the italiczed part begins.

Could you take a look and let me know what I'm doing wrong?


Sub FindItalicsRemoveAndAddSpace_1()
Dim aDoc As Document
Dim AllRng As Range
Dim SrchRng As Range
Dim WordRng As Range
Dim Response As Integer, iFound As Long, iWord As Long

Set aDoc = ActiveDocument
Set AllRng = aDoc.Range
Set SrchRng = AllRng.Duplicate

Do
With SrchRng.Find
.ClearFormatting
.Font.Italic = True
.Execute
End With

If SrchRng.Find.Found Then
SrchRng.MoveEnd wdWord, 1
iFound = Len(SrchRng)

Set WordRng = SrchRng
WordRng.StartOf Unit:=wdWord, Extend:=wdExtend
iWord = Len(WordRng)


'got stuck trying to automate it
'if len (SrchRng) < len (WordRnd) then
' replace SrchRng.Text, SrchRng.Text, " " & SrchRng.Text)
'but SrchRng becomes the whole word, and the Replace doesn't do it correctly

If iFound < iWord Then
SrchRng.Font.Italic = False
SrchRng.Text = Replace(SrchRng.Text, SrchRng.Text, " " & SrchRng.Text)
End If


SrchRng.Start = SrchRng.End + 1
SrchRng.End = AllRng.End
End If
Loop Until Not SrchRng.Find.Found
End Sub


Thanks

Paul

Paul_Hossler
04-06-2009, 06:33 PM
Success -- thanks to lucas and macropod :beerchug:

And I learned something to boot.

Just had to make a small change


Set WordRng = SrchRng.Duplicate



Paul