Remove text between two points



Zack Barresse
05-09-2012, 11:32 AM
Hello!

I've searched but haven't found anything quite like what I'm looking for. Let me explain my document structure, then I'll explain what I'm trying to do. I want to do this with VBA so I don't have to do it by hand.

I have about 35 files, all structured the same, all in the same folder. I'll do this to every one of the files. They range from 3 to 15 or so pages.

Structure is as follows:



TITLE<P>
<P>
Intro line<P>
<P>
1. Text here for about a paragraph.<P>
<P>
A. Text<P>
B. Text<P>
C. Text<P>
D. Text<P>
<P>
Answer: X<P>
Rationale:<P>
Text goes here for about a paragraph<P>
<P>
2. ...


As you can see, this is a test. It's my answer key, though, and I have no student version. What I want to do is take out the Answer and Rationale portions, leaving just the questions. Basically, take out everything that starts with "Answer:" and runs through the paragraph after "Rationale:". In the example above, each <P> represents a paragraph mark.

The numbers and letters are in numbering format.

I was thinking about doing a find/replace, but I don't know how to get the paragraph after I'm looking for something.

Can I do this with a find/replace or VBA easily?

Frosty
05-09-2012, 04:06 PM
I'm not the expert on how to do the most efficient wildcard searches, but I think a wildcard find searching for (without quotes) "Answer:*^13*^13*^13*^13" will find your Answer: paragraph, and then the next three paragraphs.

Replace with nothing, and I think you've got a replace all that doesn't need VBA.

Frosty
05-09-2012, 04:07 PM
If you need to do some analytics (i.e., it's not always 3 paragraphs, but the range you want to delete is definitely terminated by a paragraph which is auto-numbered), then you'd need (I think) VBA, but you can do a *lot* with wildcard searching.
http://www.word.mvps.org/FAQs/General/UsingWildcards.htm
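
For the "terminated by an auto-numbered paragraph" case, here is a minimal VBA sketch. It assumes that "Answer:" always starts its own paragraph and that every question is an auto-numbered list paragraph; the procedure name is made up for illustration. It deletes from the "Answer:" paragraph up to, but not including, the next numbered paragraph.

```vba
Sub DeleteAnswersUpToNextNumberedParagraph()
    ' Sketch only. Assumes "Answer:" begins its own paragraph and that
    ' each question starts with an auto-numbered (list) paragraph.
    Dim rng As Range
    Dim block As Range
    Dim para As Paragraph

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "Answer:"
        .MatchCase = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            If rng.Start = rng.Paragraphs(1).Range.Start Then
                ' Start with the "Answer:" paragraph, then grow the block
                ' one paragraph at a time until a list paragraph is reached.
                Set block = rng.Paragraphs(1).Range
                Set para = rng.Paragraphs(1).Next
                Do While Not para Is Nothing
                    If para.Range.ListFormat.ListType <> wdListNoNumbering Then Exit Do
                    block.End = para.Range.End
                    Set para = para.Next
                Loop
                block.Delete
            Else
                ' "Answer:" appeared mid-paragraph; skip past it.
                rng.Collapse wdCollapseEnd
            End If
        Loop
    End With
End Sub
```

If the last question has no numbered paragraph after it, this removes everything from its "Answer:" to the end of the document, so try it on a copy first.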

Tinbendr
05-09-2012, 04:17 PM
x

Zack Barresse
05-09-2012, 05:04 PM
Hi, thanks Frosty. Unfortunately I haven't been able to get the wildcard search to work. I tried that initially, but no luck. I'm not sure if wildcards should be checked or not; I'm not proficient enough with Word to know. What I can get to work is to use "Answer: [A-D]?Rationale:?" as the search criteria, but I need the following paragraph as well. I didn't see how to get an entire paragraph using wildcards.

Tinbendr
05-09-2012, 07:02 PM
Try this.
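Sub Test()
    ' Wildcard replace: delete everything from "Answer" up to the next
    ' typed question number (1-3 digits followed by a period), keeping
    ' the number itself via the \2 backreference.
    With ActiveDocument.Range
        .Find.Execute FindText:="(Answer*)([0-9]{1,3}.)", ReplaceWith:="\2", _
            MatchWildcards:=True, Forward:=True, Wrap:=wdFindStop, _
            Format:=False, Replace:=wdReplaceAll
    End With
End Sub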

Zack Barresse
05-10-2012, 10:53 AM
Doesn't seem to do anything. Still trying to follow the find/replace characters.

Zack Barresse
05-10-2012, 10:58 AM
Here is a sample of the file. Two questions.

Frosty
05-10-2012, 11:38 AM
Zack,

Try this code... this is just from a recorded macro... but I think you probably know how to clean it up.

Sub Macro14()
    ' Recorded macro: wildcard replace-all that deletes the "Answer:"
    ' paragraph and the three paragraphs that follow it.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "Answer:*^13*^13*^13*^13"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

I ran this on your example.docx, and it removed what I think you want to have removed.

However, it's a "dumb" macro... it's entirely dependent on you having exactly the same structure (i.e., the code "*^13" in a wildcard search will find an entire paragraph, no matter what it is).

Does this help point you in the right direction?

I'm always nervous about deleting stuff, so I might add in some validation to make sure I'm not deleting "good" data... which could happen if you don't have the requisite blank paragraph as the last of the four... i.e., you have
Answer: D (this is paragraph 1, and is found with "Answer:*^13")
Rationale: (this is paragraph 2, and is found with "*^13")
D is the right answer because yada yada (this is paragraph 3, and is also found with "*^13")
And then paragraph 4 is simply an empty paragraph (also found with "*^13")

The wildcard link I gave you earlier explains the various codes, but the asterisk is "any number of characters" and "^13" (in a wildcard search) represents the paragraph mark (make sure you show paragraph marks to understand this).

So the only terminating character you have is the paragraph marks...

There are several other ways to approach this... but I'll wait for more questions from you to give you those different approaches...

Frosty
05-10-2012, 11:49 AM
And as an FYI... Tinbendr's code did not work for me either. That is a more complicated wildcard search; the link I posted explains how it works (or, in this case, why it doesn't work for you). But I don't think you need to wade through those options. I believe he's simply trying to do a more sophisticated wildcard search where, instead of repeating "*^13" 3 times for 3 paragraphs, he's using the "number of occurrences" strategy to replace "*^13*^13*^13" with an alternate form, which I think would be written as something like "(*^13){3}".

I believe one of the Pauls (Hossler or Macropod) is a guru at really sophisticated wildcard searching, so they may have a way of making the wildcard search itself smart enough to only delete that last paragraph if it is blank, and leave it alone if you don't happen to have the blank paragraph... but I'm not that good. I'd probably just code it by expanding the found range of a simpler search... but you may not need that sophistication.
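
As a rough illustration of "expanding the found range of a simpler search", here is a minimal sketch. It assumes the block is always an "Answer:" paragraph, a "Rationale:" paragraph and one paragraph of rationale text, optionally followed by a blank paragraph. The procedure name is hypothetical. The trailing paragraph is only deleted when it really is empty.

```vba
Sub DeleteAnswerBlocksCheckingBlank()
    ' Sketch only. Assumes: "Answer: X", "Rationale:", one paragraph of
    ' rationale text, then (optionally) one empty paragraph.
    Dim rng As Range
    Dim block As Range
    Dim nextPara As Paragraph

    Set rng = ActiveDocument.Content
    With rng.Find
        .ClearFormatting
        .Text = "Answer:"
        .MatchCase = True
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        Do While .Execute
            ' Whole "Answer:" paragraph plus the next two paragraphs.
            Set block = rng.Paragraphs(1).Range
            block.MoveEnd Unit:=wdParagraph, Count:=2

            ' Include the following paragraph only if it holds nothing
            ' but its paragraph mark.
            Set nextPara = block.Paragraphs.Last.Next
            If Not nextPara Is Nothing Then
                If Len(nextPara.Range.Text) = 1 Then
                    block.End = nextPara.Range.End
                End If
            End If
            block.Delete
        Loop
    End With
End Sub
```

The blank-paragraph check is the validation step: a non-empty paragraph after the rationale is left alone instead of being swallowed by a fourth "*^13".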

Tinbendr
05-10-2012, 12:09 PM
I should have known that the numbers were list paragraphs. :doh: My example looked for the next 1-3 digit number and a period.

I just took your example and duplicated it 5x and manually numbered them. It worked fine on that document.

But, Frosty has your answer.

Frosty
05-10-2012, 12:27 PM
Ahh, that would have been pretty neat. That's why you had the \2 in there, to preserve the second group of the found text.

Zack: Tinbendr's approach is conceptually more solid than mine, because his instinct was to have an affirmative terminator of the range you're about to delete (a paragraph starting with a number).

Don't mean to belabour the point, I'm sure you're even more interested in not deleting "good" data than we are ;)

Zack Barresse
05-10-2012, 02:22 PM
Okay, so the code worked great on the first file. I added some code to save it as a new name in the format I want. Awesome. Success!
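
For anyone wanting to run this over a whole folder, here is a minimal sketch of a batch wrapper. Both folder paths are hypothetical, the output folder is assumed to exist already, and the wildcard pattern is the one from Frosty's recorded macro. Saving to a separate folder keeps the loop from picking up its own output.

```vba
Sub MakeStudentVersions()
    ' Sketch only. Both folders are hypothetical; OUT_FOLDER must exist.
    Const SRC_FOLDER As String = "C:\Tests\AnswerKeys\"
    Const OUT_FOLDER As String = "C:\Tests\Student\"
    Dim docName As String
    Dim doc As Document

    docName = Dir(SRC_FOLDER & "*.docx")
    Do While Len(docName) > 0
        Set doc = Documents.Open(FileName:=SRC_FOLDER & docName)

        ' Same wildcard replace as the recorded macro, run on the whole document.
        With doc.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "Answer:*^13*^13*^13*^13"
            .Replacement.Text = ""
            .MatchWildcards = True
            .Forward = True
            .Wrap = wdFindStop
            .Execute Replace:=wdReplaceAll
        End With

        ' Save the stripped copy under the same name in the output folder.
        doc.SaveAs2 FileName:=OUT_FOLDER & docName, FileFormat:=wdFormatXMLDocument
        doc.Close SaveChanges:=wdDoNotSaveChanges

        docName = Dir   ' next matching file, or "" when done
    Loop
End Sub
```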

And then, the next file I opened was formatted differently. I opened a few of them and they were the same, but apparently some are different. Ugh. Sorry about this wrench, didn't see it coming. Instead of paragraph returns, it's a funky character, what I'm assuming is a line feed: it's an arrow at a 90 degree turn pointing to the left. I've seen it a few times before in copy/pasting old data, I'm assuming it's another ASCII character formatted by Word.

Another thing I saw was that it isn't always a numbered list, sometimes the data is just typed in. Looks like whoever put these together just copy/pasted, probably from the web I'm assuming. This isn't so bad because the code still works. It's just those funky returns that don't.

I can certainly live with this. If anyone knows what characters those are I might try adding it, but if I only have to do this manually for a few files it's not so bad.

Thanks everyone, for your patience and expertise. :)

Frosty
05-10-2012, 02:31 PM
You're right, it's a line feed.

Those are called "soft returns" and are a way of forcing a line break without creating a new paragraph... it's a vertical tab (vbVerticalTab, Chr(11)) instead of a carriage return (vbCr, Chr(13)). You can type it by pressing SHIFT+ENTER.

You could, potentially, do a find/replace all on "^l" with "^p" (with the "Use wildcards" checkbox *not* selected), and that would turn the line breaks into the paragraph marks that the wildcard replace-all search expects.
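
In VBA, that conversion could look something like this minimal sketch (the procedure name is made up; it uses the non-wildcard codes "^l" and "^p"):

```vba
Sub ConvertLineBreaksToParagraphs()
    ' Replace every manual line break (^l) with a paragraph mark (^p).
    ' MatchWildcards must be False for these two codes to be recognised.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "^l"
        .Replacement.Text = "^p"
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Run it before the wildcard replace so that "*^13" sees real paragraphs.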

Apart from that... it could get really complicated. How I would approach it would really depend on whether this is a one-off clean-up (in which case you might think about turning Track Changes on before you run your replace-all code, then using the Review tab to accept all changes if the document looks good), or whether you need to do some coding to do something analogous to "Delete everything between 'Answer:' and some form of a number, whether that is hard text or a formatted paragraph with autonumbering".
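
A minimal sketch of the track-changes idea, assuming the same four-paragraph pattern as the recorded macro (the procedure name is hypothetical):

```vba
Sub RemoveAnswersWithTrackChanges()
    ' Sketch only: run the replace-all with revision tracking switched on
    ' so every deletion can be reviewed before it becomes permanent.
    Dim wasTracking As Boolean

    wasTracking = ActiveDocument.TrackRevisions
    ActiveDocument.TrackRevisions = True

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "Answer:*^13*^13*^13*^13"
        .Replacement.Text = ""
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With

    ' Restore whatever tracking state the document had before.
    ActiveDocument.TrackRevisions = wasTracking
End Sub
```

Once the result looks right, ActiveDocument.Revisions.AcceptAll makes the deletions permanent, and ActiveDocument.Revisions.RejectAll puts everything back.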

BoatwrenchV8
05-10-2012, 04:03 PM
Are these answer keys that are being cleaned up and turned into tests going to be printed out and handed to students, or sent to them electronically? If they are to be printed, you could use the search and replace solutions discussed above to change the style of the found text to a new custom style. If the new custom style had the font formatted as hidden, the text would only display on screen, and print, when the "All Characters" toggle was turned on. That way, you would still have your answer key and your test. One document to keep track of instead of two. Just an idea.

Frosty
05-10-2012, 05:04 PM
And dovetailing off of Boatwrench's excellent suggestion, you might investigate setting up a template with a couple of styles that would let you efficiently do this in the future.

And since (I think) you're an excel guy... you could easily start in Excel, work up a structure, and then use us to spit that structure into two formats: a) test questions and b) questions and answer key (if transmitting electronically).

There's probably even a way to do that with some kind of mailmerge.

Zack Barresse
05-10-2012, 05:37 PM
It is for printing, yes. I really like that idea of having hidden styles. I'm not familiar with it, yes I'm an Excel guy. :) These are more of a one-up documents, but I could potentially use the code in the future if I receive files in the same format.

How do I make a style have hidden text?

BoatwrenchV8
05-10-2012, 07:11 PM
This is not the only way to do this and I am sure it is not the best way but it will work.

Once you have created the style, test it out: display all characters by clicking the "All Characters" toggle button (it looks like a paragraph mark) in the Paragraph section of the Home ribbon, select some answer text, and apply your new style to the selected text. Toggle all characters off and the answer text should disappear.
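
The same thing can be sketched in VBA. The style name "HiddenAnswer" is made up, and the pattern assumes the three-paragraph Answer/Rationale block from the sample. The macro creates a paragraph style with a hidden font (if it doesn't already exist) and applies it to each block instead of deleting it.

```vba
Sub HideAnswersWithStyle()
    ' Sketch only. "HiddenAnswer" is a hypothetical style name.
    Const STYLE_NAME As String = "HiddenAnswer"
    Dim sty As Style

    ' Reuse the style if it already exists, otherwise create it.
    On Error Resume Next
    Set sty = ActiveDocument.Styles(STYLE_NAME)
    On Error GoTo 0
    If sty Is Nothing Then
        Set sty = ActiveDocument.Styles.Add(Name:=STYLE_NAME, Type:=wdStyleTypeParagraph)
    End If
    sty.Font.Hidden = True

    ' Keep the found text (^&) but restyle it with the hidden style.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "Answer:*^13*^13*^13"
        .Replacement.Text = "^&"
        .Replacement.Style = ActiveDocument.Styles(STYLE_NAME)
        .Format = True
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

As far as I know, whether hidden text prints is controlled separately by Word's "Print hidden text" option (Options.PrintHiddenText in VBA), so check that setting before printing the student copy.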