PDA

View Full Version : Loop through doc, delete specific characters



Zack Barresse
11-17-2006, 10:30 AM
Okay, I'm looking to make my life a little easier, I do not know if it is possible or not. I have a document I am 'refurbishing' and I am applying styles to. I can't apply the styles at the moment because there are many additional returns and tabs in the document. Where bullets and numberings should have been, somebody manually tabbed over and typed "A. ", etc. I want to delete all of this stuff and make it 'right'. I normally wouldn't want to do this, but we will be adding/subtracting from these documents in the future, so I want them easier to work with, user friendly and just plain done right.

I have the document, one of them (there are 5 total right now, more to follow), and can upload it if desired. I can also upload a small sample of it (the current document I'm on is 90 some odd pages). I wanted to know if there was a way I could loop through the document and delete certain characters. Things I want deleted:

Carrige returns with no data on that line
Tab characters
Manual page breaks
Any bullets or numberings (they're all manually entered)

I know these may not be possible. I don't know how you would tell of the bullets and numberings. Maybe check for "A. ", "B. ", "1. " or "2. "?? Something like that (check) should suffice - I would think.

Please let me know if you want a sample file, or a sample of text posted to this thread, or even if I'm fooling myself. I don't know the limits and capabilities of Word's OM. :dunno

Zack Barresse
11-17-2006, 11:16 AM
My other recent post has to deal with a specific issue on these documents. A sample of the documents is on this thread, one before editing and the other after editing. Here is the link: http://www.vbaexpress.com/forum/showthread.php?t=10182

mdmackillop
11-17-2006, 11:31 AM
Hi Zack,
I've done a lot of this over the years
Have a look at Edit/Replace/More/Special for a list of the special characters.
You typically want to replace multiple para marks so use
Find ^p^p
Replace ^p
or variation thereof
also
^t^t with ^t
^t^p with ^p for tabs with nothing after them.

For letter bullets try
^p^$.^t with ^p

for number bullets, best start with multiple digits and work down
^p^#^#.^t with ^p
^p^#.^t with ^p

and so on

mdmackillop
11-17-2006, 11:33 AM
BTW,
Save it frequently for when it goes wrong! You don't want to start again.

Zack Barresse
11-17-2006, 11:45 AM
I have no idea what you are saying, but I'll look through the Find/Replace and try it out. I'll let you know the results here shortly. :yes

Zack Barresse
11-17-2006, 11:53 AM
Wow! That took care of 75% of the document!!! I LOVE that feature!!! :D

Thanks Malcolm!!

mdmackillop
11-17-2006, 11:53 AM
Try this on your sample. Not perfect, but a start

Sub Macro1()
'
' Macro1 Macro
' Macro recorded 17/11/06 by malcolm
'
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^$.^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
ActiveWindow.ActivePane.VerticalPercentScrolled = 11
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p^t^$.^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^t^#^#.^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^t^#.^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^t^t"
.Replacement.Text = "^t"
.Forward = True
.Wrap = wdFindContinue
.Format = False
End With

Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p^t^#. "
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^t^# "
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^t^p"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^$. "
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p "
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
.Text = "^p^? ^t"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

fumei
11-17-2006, 11:54 AM
Zack, post a sample file. Here is a start though.

Carrige returns with no data on that line:
Manual page breaks.
Sub StartCleanUp()
Dim oPara As Word.Paragraph
' remove empty paragraphs
For Each oPara In ActiveDocument.Paragraphs
Select Case Len(oPara.Range.Text)
Case 1 ' just the paragraph mark
oPara.Range.Delete
Case 2 ' if empty paragraph preceded by page break
If Left(oPara.Range.Text, 1) = Chr(12) Then
oPara.Range.Delete
End If
End Select
Next
' remove manual page breaks
Selection.HomeKey Unit:=wdStory
With Selection.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "^m"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Execute Replace:=wdReplaceAll
End With
End Sub

mdmackillop
11-17-2006, 11:59 AM
Gerry,
Zack has a sample in his (only?) other Word thread.

Zack Barresse
11-17-2006, 12:06 PM
Yes, sorry about the confusion here. I thought they (my two posts) were separate issues and posted them separately. In fact, I still have the other problem. Getting a solution to that would speed up this 'document conversion' very quickly. I've already trimmed 3 days of work with what has been covered in this thread.

Gerry: the link in post #2 is to the other thread where the file URL is located.

Thank you very much both of you! I'm learning a lot here, and seeing how useful some of the rudimentary - and native - features of Word can be. :yes

mdmackillop
11-17-2006, 12:22 PM
Hey Zack,
I love the minimalist approach to website design.

mdmackillop
11-17-2006, 03:55 PM
Hi Zack,
I thought I'd have a go at tidying this code up and make it a bit more "user friendly". Due to text changes, it seems best to run it twice.

Option Explicit
Sub CleanUp()
Dim Repeats As Long
Dim FaR
Dim i As Long
ReDim FaR(2)

'Using wildcards
'Set repeated items value
Repeats = 10
'Multiple spaces
FaR(0) = " "
'Multiple tabs
FaR(1) = "^t"
For i = 0 To 1
With selection.Find
.Text = FaR(i) & "{2," & Repeats & "}"
.Replacement.Text = FaR(i)
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = True
End With
selection.Find.Execute Replace:=wdReplaceAll
Next

'No wildcards
ReDim FaR(12, 1)
'Multiple Para
FaR(1, 0) = "^p^p"
FaR(1, 1) = "^p"
'Tab Para
FaR(2, 0) = "^t^p"
FaR(2, 1) = "^p"
'Start of line space
FaR(3, 0) = "^p "
FaR(3, 1) = "^p"
'End of line space
FaR(4, 0) = " ^p "
FaR(4, 1) = "^p"
'Numbering with period
FaR(5, 0) = "^p^?^?.^t"
FaR(5, 1) = "^p"
'Numbering with period
FaR(6, 0) = "^p^?.^t"
FaR(6, 1) = "^p"
'Start of line tab
FaR(7, 0) = "^p^t"
FaR(7, 1) = "^p"
'Start of line number
FaR(8, 0) = "^p^# "
FaR(8, 1) = "^p"
'Start of line number with period
FaR(9, 0) = "^p^#. "
FaR(9, 1) = "^p"
'Start of line letter
FaR(10, 0) = "^p^$ "
FaR(10, 1) = "^p"
'Start of line letter with period
FaR(11, 0) = "^p^$. "
FaR(11, 1) = "^p"
'Tab Para
FaR(12, 0) = "^t^p"
FaR(12, 1) = "^p"
For i = 1 To 12
selection.HomeKey Unit:=wdStory
With selection.Find
.Text = FaR(i, 0)
.Replacement.Text = FaR(i, 1)
.Forward = True
.Wrap = wdFindContinue
.MatchWildcards = False
End With
Do Until selection.Find.Execute = False
selection.Find.Execute Replace:=wdReplaceAll
Loop
Next
End Sub