PDA

View Full Version : copy contents from one doc to another...



emsa
12-11-2009, 08:26 AM
Hello everybody,

I would greatly appreciate any advice or better still some code that would extract certain headings and paragraphs from a document and place them in a new document.

After numerous attempts at recording macros (my current skill level regarding vba) and trying to stitch them together, I’ve just about given up. I’m not even sure if this task can be achieved. :dunno

To be as short and precise as possible, I would like to achieve the following goal:

Search the document for paragraphs/sentences, which end with (shortcoming)., (observation). or (finding)., select and copy them to a new file along with their associated headings.

Example:

Heading 6.1 Criterion A.1 (copy to new document)

Sentence XXXXXXXXXXXXXXXXXXXXXXXXXXX. skip

Paragraph XXXXXXXXXXXXXXXXXXXXXXXXXXX. ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. skip

Paragraph XXXXXXXXXXXXXXXXXXXXXXXXXXX. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. ZZZZZZZZZZZZZZZZZZZZZZZZZ (shortcoming). (Copy to new document)

Paragraph XXXXXXXXXXXXXXXXXXXXXXXXXXX. XXXXXXXXXXXXXXXXXXXXXXXXXX. ZZZZZZZZZZZZZZZZZZZZZZZ (observation). (Copy to new document)

Sentence XXXXXXXXXXXXXXXXXXXXXXXXXXX (finding). (Copy to new document)

Heading 6.1 Criterion A.2 (copy to new document)

XXXXXXXXXXXXXXXXXXXXXXXXXXX. skip

XXXXXXXXXXXXXXXXXXXXXXXXXXX. skip

XXXXXXXXXXXXXXXXXXXXXXXXXXX (shortcoming). (Copy to new document)

XXXXXXXXXXXXXXXXXXXXXXXXXXX (observation). (Copy to new document)

XXXXXXXXXXXXXXXXXXXXXXXXXXX (finding). (Copy to new document)

Not all headings in the document contain text with findings, observations or shortcomings. I can’t think how it is possible to link the headings and their associated shortcomings, observations and findings.

Any assistance would be a real relief to me. :help

Kind regards,

EMSA

Dave
12-11-2009, 09:58 AM
Seems like this might help. Dave
http://www.vbaexpress.com/kb/getarticle.php?kb_id=861
edit: Perhaps not? Paragraphs are 1 line only for this code to work.

fumei
12-11-2009, 10:44 AM
1. Define "certain", as in "would extract certain headings".

You have:

Heading 6.1 Criterion A.1 (copy to new document)

Is this using a explicit style (say Heading 2)? Is it possible that Heading 6.2 (assuming there is one) would NOT be copied even if it contains (shortcoming), (observation)?

2. Once the heading is identified, yes it is possible to extract (copy) out paragraphs to a new document. However, if you want to extract them out in the order they are in the original, this is much more complicated.

You will need to search the string of each paragraph to see if it contains one of your search strings - (shortcoming), (observation), (finding).

Say you search for (shortcoming). The easy way would be:

a) search for "(shortcoming)"
b) extract the paragraph that contains it.
c) continue the search for "(shortcoming)"
d) extract the paragraph that contains it.
e) proceed through the document doing a) to d)
f) search for "(observation)"
g) repeat a) to d) using "(observation)"

However, the result of course would be:

Paragraph XXXXXXXXXXXXXXXXXXXXXXXXXXX. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX. ZZZZZZZZZZZZZZZZZZZZZZZZZ (shortcoming). (Copy to new document)

XXXXXXXXXXXXXXXXXXXXXXXXXXX (shortcoming). (Copy to new document)

Paragraph XXXXXXXXXXXXXXXXXXXXXXXXXXX. XXXXXXXXXXXXXXXXXXXXXXXXXX. ZZZZZZZZZZZZZZZZZZZZZZZ (observation). (Copy to new document)

XXXXXXXXXXXXXXXXXXXXXXXXXXX (observation). (Copy to new document)


Probably not what you want.

IF (shortcoming), (observation), (finding)are ALWAYS together like that - three paragraphs sequentially - then it becomes easier. The logic then becomes:

a) search for "(shortcoming)"
b) extract the paragraph that contains it, and the next two
c) continue the search for "(shortcoming)"
d) extract the paragraph that contains it, and the next two.

etc.

emsa
12-11-2009, 11:53 AM
Hi Fumei,

Firstly, thanks for your prompt reply. In response to your questions:

· Yes they use an explicit style. The headings use built-in “Heading 2” style.

· If any of the headings contain paragraphs that are not observations, findings or shortcomings, then they can be skipped. However, it would be very nice if the heading was still copied across and a standard text inserted “no observations, shortcomings or findings noted for this criterion”.

· Unfortunately, the headings must link to the associated observations, findings or shortcomings. So yes they must be extracted in the order they are in the original. I thought this would be complicated. However, they are in order--shortcoming, observation and finding.

Regards,

EMSA

fumei
12-11-2009, 01:29 PM
Take a look at the attached file. Click "Extract Stuff" on the top toolbar. It uses:Option Explicit

Sub ExtractStuff()
Dim r As Range
Dim r2 As Range
Dim ThisDoc As Document
Dim CopyToDoc As Document
Set ThisDoc = ActiveDocument
Set r = ThisDoc.Range
Set CopyToDoc = Documents.Add

With r.Find
.ClearFormatting
.Style = "MyHeading"
.Text = ""
Do While .Execute = True
' r.Text is the heading text
' copy to CopyTo doc
CopyToDoc.Range.InsertAfter r.Text

' set second range to end of r to
' end of document and find shortcoming
Set r2 = ThisDoc.Range( _
Start:=r.End, End:=ThisDoc.Range.End)
With r2.Find
.ClearFormatting
.Text = "(shortcoming)"
.Execute
End With

' expand to contain paragraph
With r2
.Expand Unit:=wdParagraph
.End = r2.Next(Unit:=wdParagraph, Count:=2) _
.End - 2
End With
With CopyToDoc.Range
.InsertAfter r2.Text
.InsertParagraphAfter
End With
r.Collapse 0
r.Start = r2.End
Set r2 = Nothing
Loop
End With
End Sub


A few points.

1. Notice that the extracted contents are NOT formatted the same.

2. There are NO "empty" paragraphs to make space between paragraphs. Hopefully you are using Styles fully and do not have any of these. The code - as stated - grabs the paragraph containing (shortcomings) and the next TWO. If one of those is an "empty" paragraph...tough luck.

3.
· If any of the headings contain paragraphs that are not observations, findings or shortcomings, then they can be skipped. However, it would be very nice if the heading was still copied across and a standard text inserted “no observations, shortcomings or findings noted for this criterion”.This example does not have this situation. This is MUCH MUCH more difficult to do. Why? Because it adds a complicated piece of logic. You need to suspend one search to do another.

emsa
12-11-2009, 02:59 PM
Hi Fumei,

The example works a treat thanks. However, I should of mentioned that there could be many shortcomings and many observations or just observations. Although they will always be listed in order, it could be that only 6 findings are listed for a particular heading. Sorry for not describing my problem fully to you. I've attached the format of the document with your macro showing an example of the text. I hope you will see what I mean.

TonyJollans
12-12-2009, 06:12 AM
It seems to me that wanting headings as well as selected paragraphs makes it a difficult job to do with Find. The sample document looks well formed (as it would be called in xml), that is to say properly organised, numbered, styled, etc., so it would be possible to run through the paragraphs looking for Body Text and extracting what you want, including headings. However, numbered headings will be renumbered as part of any extraction process (including using Find) unless special steps are taken. So, would you want the numbers retaining?

emsa
12-12-2009, 12:02 PM
Hi Tony,

I suppose it would help if I gave a little background information on the subject. The document in question is in fact an inspection report based on a workgroup template. A selection of both custom and built-in styles are used to format the text. There is very little direct formatting applied except for underlining and italicing.

As you can see the inspection results are documented under one partciular section, section 6. After the results, any shortcomings, observations or findings are entered specific to the criterion, which is in fact identified by the heading.

The reports are between 25 and 75 pages in size and a number of people only require the so called juicy bits, i.e. the shortcomings, observations and findings. So, I thought in my wisdom I'll create a summary report containing only the shortcomings, observations and findings related to each criterion.

After recording many macros and sweeping the internet for bits of code to stitch them all together, I soon realised that I was out of my depth.

Fumei's sample code really cheered me up yesterday, but I failed to explain the task fully, which must really "£$$%% you guys off a lot. There can be any number of shortcomings, observations and findings under a particular criterion heading.

My ideal scenario would be to scan the report and copy the headings and their associated shortcomings, observations and findings along with the formatting. However, I suppose I could run another macro that formatted the new summary report.

Thanks for your post.

TonyJollans
12-12-2009, 02:06 PM
It isn't the formatting, per se, it's the numbering that bothers me.

Suppose you have:

6.1 Heading the first
.1 sub-heading #1
a para of text
6.2 Heading the second
.1 sub-heading #1
a para of text
another para
.2 sub-heading #2
a para ending in (shortcoming)

copied, this could result in something like:

3.1 Heading
.1 sub-heading #2
a para ending in (shortcoming)

This happens because the numbers are references, not text, and with half the text gone, the references (to the underlying sequence) could change. The text of the heading would be correct, but the numbers wrong, so checking back to the original document might be difficult. Would this matter?

emsa
12-13-2009, 03:41 AM
Yes you are right it would be meaningless, in fact copying across the data into a new document (based on normal.dot) would start heading numbering from 1 and subsequent paragraph numbers from 1.1. So, no this will not matter; thinking about it now I'd prefer somehow if it was also possible to bring across the information from the title page of the report to identify the new summary document. This information is located in a text box positioned in the middle of the page.

Kind regards,

emsa

TonyJollans
12-13-2009, 09:48 AM
Thank you.

I think in this case it would probably be best to take a copy of the document and delete the bits you don't want, rather than try to extract the bits you do want.

Here's some basic code to start with:

Sub tjExtractStuff()

Dim Para As Word.Paragraph
Dim ndxPara As Long

Dim posLParen As Long

For ndxPara = ActiveDocument.Paragraphs.Count To 1 Step -1
Set Para = ActiveDocument.Paragraphs(ndxPara)
With Para
If Para.OutlineLevel = wdOutlineLevelBodyText Then
With .Range
posLParen = InStrRev(.Text, "(")
If posLParen > 0 Then
Select Case Mid$(.Text, posLParen)
Case "(shortcoming)" & vbCr, _
"(observation)" & vbCr, _
"(finding)" & vbCr
Case Else
.Delete
End Select
Else
.Delete
End If
End With
Else
If .Next Is Nothing Then
.Range.Delete
ElseIf .Next.OutlineLevel <= .OutlineLevel Then
.Range.Delete
End If
End If
End With
Next ndxPara

End Sub

This runs rather slowly and, ultimately, ends with an error when it reaches the TOC (it works from the end backwards), but, by then, it has done its stuff. Some extra processing to deal with the TOC, and probably the summary tables should speed it up, but I'm not entirely sure what you will have in all circumstances.

macropod
12-13-2009, 04:53 PM
Hi folks,

How about something like:
Sub ExtractStuff()
Dim oPara As Paragraph
Dim i As Integer
Dim strKeep, strTmp As String
strKeep = "(shortcoming),(observation),(finding)"
Application.ScreenUpdating = False
With ActiveDocument
For Each oPara In .Paragraphs
With oPara
If .OutlineLevel = wdOutlineLevelBodyText Then
For i = 1 To UBound(Split(strKeep, ","))
strTmp = Split(strKeep, ",")(i)
If InStrRev(.Range.Text, strTmp) - Len(.Range.Text) + Len(strTmp) = 0 Then
i = 0
Exit For
End If
Next
If i <> 0 Then .Range.Delete
End If
End With
Next
End With
Application.ScreenUpdating = True
End Sub

geekgirlau
12-13-2009, 06:14 PM
Building on macropod's concept a little - this one keeps the headings


Sub ExtractStuff()
Dim oPara As Paragraph
Dim oParaHdg As Paragraph
Dim iHeading As Integer
Dim i As Integer
Dim strKeep, strTmp As String


strKeep = "(shortcoming),(observation),(finding)"
Application.ScreenUpdating = False

With ActiveDocument
For Each oPara In .Paragraphs
With oPara
' check whether the heading is required
If .OutlineLevel = wdOutlineLevel2 Then
' were any paragraphs found for that heading?
If iHeading = 0 Then
oParaHdg.Range.Delete
End If

' set the next heading paragraph
Set oParaHdg = oPara
iHeading = 0

ElseIf .OutlineLevel = wdOutlineLevelBodyText Then
For i = 0 To UBound(Split(strKeep, ","))
strTmp = Split(strKeep, ",")(i)

If InStrRev(.Range.Text, strTmp) - Len(.Range.Text) + Len(strTmp) = 0 Then
' flag to retain the heading
iHeading = iHeading + 1
i = 0
Exit For
End If
Next

If i <> 0 Then
.Range.Delete
End If
End If
End With
Next
End With

Application.ScreenUpdating = True
End Sub

emsa
12-14-2009, 05:32 AM
Many many thanks for responding. I tried running the macro on the actual report, see attachment. It stops at .Range.Delete when I then look back at the report, it has deleted only upto the table contents heading.

Kind regards

TonyJollans
12-14-2009, 08:36 AM
I did say it would error at the TOC. I don't know how best to address the issue until I know a bit more about your requirements.

It's probably easiest to delete the TOC, but what about the early part of the document - the introduction and summary tables - might they be wanted in part?

emsa
12-14-2009, 11:57 AM
Hello Tony,

The TOC won't be necessary, neither will the introduction, summary tables, appendices and acknowledgements. However, the title from the front cover would be ideal.

The scenario for me would be: person selects "run summary report" macro from the toolbar. The macro pulls all the shortcoming, observation and finding paragraphs along with their assoiated headings along with the report title details. This data would be presented via a strip down version of the original report (SaveAs) or a new document. Ideally, It would be nice to include the whole thing in the macro, i.e. SaveAs or NewDocument.

Thanks for your time.

emsa
12-14-2009, 12:06 PM
Hello Tony,

The TOC will not be required, neither will the introduction, summary tables, appendices or acknowledgements. It would be nice to include the title from the front page.

The ideal scenario is to run the macro from a toolbar which creates the new report in one. The report could be presented as a strip down version of the original (SaveAs) or a new document.

Many thanks.

fumei
12-14-2009, 02:39 PM
Try this. Here is the code.
Option Explicit

Sub DeletionRoutine(r As Range)
Dim j As Long
j = r.Words.Count
Select Case r.Words(j - 1)
Case ")"
Select Case r.Words(j - 2)
Case "shortcoming", "observation", "finding"
' do nothing so these are retained
Case Else
r.Delete
End Select
Case Else
' everything else is deleted
r.Delete
End Select
End Sub

Sub ExtractStuff()
Dim oPara As Paragraph
Dim ExtractToPath As String

ExtractToPath = ActiveDocument.Path & "\"
ActiveDocument.Sections(2).Range.Delete
ActiveDocument.Bookmarks("StartingStuff").Range.Delete
ActiveDocument.Bookmarks("EndingStuff").Range.Delete

For Each oPara In ActiveDocument.Paragraphs
Select Case oPara.Style
' Heading 2 is retained as it is NOT tested
Case "Heading 3"
Call DeletionRoutine(oPara.Range)
Case "Body Text 2"
Call DeletionRoutine(oPara.Range)
End Select
Next
ActiveDocument.SaveAs FileName:=ExtractToPath & _
Left(ActiveDocument.Name, Len(ActiveDocument.Name) - 4) & _
"_Summary.doc"
End Sub


NOTES:

I added a Section Break after the ToC, thus it is wholely in Section 2. It is removed by:
ActiveDocument.Sections(2).Range.Delete
The title page is retained, as it is in Section 1.

I added two bookmarks StartingStuff and EndingStuff. These bookmark the chunks at the start and the end - duh - that are to be removed. The ToC, the Basic Facts, Acknowledgement, Appendix, blah blah. These are removed with:
ActiveDocument.Bookmarks("StartingStuff").Range.Delete
ActiveDocument.Bookmarks("EndingStuff").Range.Delete


It was still a little tricky as you have Heading 3 some times WITH text to be retained, some times without.

.1 text

.1 text (observation)

Thus I had to test it, as well as Body Text 2.

Therefore I wrote a testing separate procedure DeletionRoutine which has a range object as a input parameter. After the deletion of the bookmarks and Section 2, each paragraph is tested:
For Each oPara In ActiveDocument.Paragraphs
for its Style.

Only Heading 3 and Body Text 2 are tested. If the paragraph is either one, its range is passed to DeletionRoutine.

There, the first test is to see the second-to-last "word" is the end bracket. The last word is - of course - the paragraph mark.

If it IS ")" then test if the third-to-last word is "shortcoming", "observation", "finding". If that is the case, do nothing. If it is not, then delete the paragraph.

Thus:
whatever) will be deleted.
shortcoming) will not be deleted.

If the second-to-last word is NOT ")", then delete the paragraph.

fumei
12-14-2009, 02:43 PM
As Tony mentioned, the new SavedAs document will be numbered from 1.

Dave
12-15-2009, 12:35 AM
Nice stuff Fumei! Thanks to you and the others thread participants for their code and detailed explanation. I like your use of "chunks" :) It aptly describes a Word situation. The code seems kind of document specific but yet very generic in its use of styles. Dave

emsa
12-15-2009, 09:52 AM
Yipppppeeeeeeee brilliant guys:cloud9: ; thanks a lot Fumei :bow: and to all who contributed. It works like a dream, I even dared add a bit of code and it worked. I forgot a few things like a piece of standard text that follows each Heading 2 that is not required, namely the style “Criterion/Article”. I added the code:

Case "Criterion/Article"
Call DeletionRoutine(oPara.Range)

And it seem to work. I also needed to include two other headings, namely “Ship Heading” and “Heading Underline” these appear in the new document without changing things and the various shortcomings, observations and findings are still displayed for these headings naturally. However, if there are no shortcomings, observations and findings for any “Heading 2”, “Ship Heading” or “Heading Underline” these headings are still copied across into the new document. Would it be possible to somehow not include these headings? I bet it’s easier said than done!

I've attached the new report layout that produces everything perfectly, but includes headings that are not required.

Once again, THANKS for your time.

fumei
12-16-2009, 11:29 AM
Basically, and as I have stated numerous times, the coding parts is not really difficult, it is the logic, the thinking, that is hard.

If you want to include (or exclude) "chunks", then:

1. define the chunks and bookmark them
2. use ONLY properly set up Styles.

I am not quite following your requirements, but it does not matter. I believe if you want to extract chunks (of whatever definitions) into a new document, it is likely that it is possible.

emsa
12-17-2009, 11:47 AM
Hello Fumei

Thanks for getting back to me and sticking with it. Sorry for the confusion and not giving all the details up front. I think the examples provided below will hopefully help the cause.

There are two styles that I forgot to mention, namely “Ship Heading” and “Heading Underline” that can be applied to text when needed as non-numbered sub headings under a Criterion “Heading 2”. Unlike the “Heading 2” style, which is already applied to boilerplate headings in the template, i.e. 6.1 Criterion A1, 6.2 Criterion A2 etc., “Ship Heading” and “Heading Underline” styles may or may not appear under a Criterion “Heading 2” and can have any title, for example, a report could produce the following scenarios (my comments are in brackets):

Example 1:

6.1 Criterion A2 (this heading needs to be extracted because a shortcoming and observation have been documented)

Some text. (not required)
Some text (shortcoming). (extract)
Some text (observation). (extract)

Example 2:

6.1 Criterion A.2 (this heading does not need to be extracted because no shortcomings, observations or findings were documented)

Some text. (not required)
Some text. (not required)
Some text. (not required)

Example 3:

6.1 Criterion A.2 (this heading needs to be extracted because an observation, finding and shortcoming have been documented)

Some text. (not required)
Some text. (not required)

An underline sub-heading (this heading needs to be extracted because the observation and finding relate to it specifically)

Some text. (not required)
Some text (observation). (extract)
Some text (finding). (extract)

Ship Heading (this heading needs to be extracted because the shortcoming relates to it specifically)

Some text. (not required)
Some text. (not required)
Some text (shortcoming). (extract)

Example 4:

6.1 Criterion A.2 (this heading needs to be extracted because a finding has been documented)

Some text. (not required)
Some text. (not required)

An underline sub-heading (this heading does not need to be extracted because no shortcomings, observations or findings relating to it were documented)

Some text. (not required)
Some text. (not required)

Ship Heading (this heading needs to be extracted because the finding relates to it specifically)

Some text. (not required)
Some text (finding). (extract)

Example 5:

6.1 Criterion A.2 (this heading does not need to be extracted because no shortcomings, observations or findings were documented)

Some text. (not required)
Some text. (not required)

An underline sub-heading (this heading does not need to be extracted because no shortcomings, observations or findings were documented)

Some text. (not required)
Some text. (not required)

Ship Heading (this heading does not need to be extracted because no shortcomings, observations or findings relating to it were documented)

Some text. (not required)
Some text. (not required)

Example 6:

6.1 Criterion A.2 (this heading needs to be extracted because a finding has been documented)

Some text (finding). (extract)
Some text. (not required)

An underline sub-heading (this heading does not need to be extracted because no shortcomings, observations or findings were documented)

Some text. (not required)
Some text. (not required)

Ship Heading (this heading does not need to be extracted because no shortcomings, observations or findings were documented)

Some text. (not required)
Some text. (not required)

I hope this has made things clearer and that you are still willing and able to assist. Many thanks again Fumei for your time.

fumei
12-21-2009, 12:22 PM
Like i said, yes it is possible. Just extend the logic to include the further testing. I must add that to do this is a quite complex set of testing. Quite complex. But definitely do-able.

emsa
12-23-2009, 02:51 PM
Hello Fumei,

As your code for extracting data works perfect and the fact that I would never be able to modify it successfully, it was easier to play around with the data once in the new document. So with some further help, the follwing code is used to remove unwanted headings.

For Each oPara In ActiveDocument.Paragraphs
If Not oPara.Next Is Nothing Then
If oPara.Style = "Heading Underline" Then
If oPara.Next.Style = "Heading 2" Then
oPara.Range.Delete
End If
End If
End If
Next oPara
For Each oPara In ActiveDocument.Paragraphs
If Not oPara.Next Is Nothing Then
If oPara.Style = "Heading 2" Then
If oPara.Next.Style = "Heading 2" Then
oPara.Range.Delete
End If
End If
End If
Next oPara
For Each oPara In ActiveDocument.Paragraphs
If Not oPara.Next Is Nothing Then
If oPara.Style = "Heading 2" Then
If oPara.Next.Style = "Heading 1" Then
oPara.Range.Delete
End If
End If
End If
Next oPara

I also forgot that there are tables in some reports. So the following code is used:

For Each aTable In ActiveDocument.Tables
aTable.Delete
Next aTable

Many thanks.