Split Word Document depending on the reference mentioned on the page.



ashleyuk1984
09-24-2014, 01:53 AM
Hi,
I'm more of an Excel VBA person, and Word is a completely different ball game for me.
I hope you can help.

I run reports from our system. I usually have to run each job file one by one and save that document, then start again with the next file. This can be very time consuming. I do this for every file processed in that particular month. Sometimes we have 300+ files, and it can take a few hours to complete.

Our system has the ability to generate reports in 'bulk' (for a whole month, for instance)... The problem is that it generates one big docx file; it doesn't split the jobs into different files.

So I was thinking that maybe a VBA solution might be good for this.

This is what I would like the code to do.
I want to split the big docx file into multiple files, based on a condition (The reference of the file).

Let's say, for instance, my docx file is 20 pages.

Pages one and two contain the same "reference". I want those two pages to be saved to a separate file.

Pages three and four contain DIFFERENT references, so I only want page three to be saved to a separate file.

Pages four, five and six contain the SAME reference, so I want these pages saved to a separate file.

I hope you understand the requirement.

I have made an image to help illustrate what I would like.
The letters drawn onto the image represent the reference as an "example". My real references are usually like "EKK3434214".

Is this task possible?

Please make the code easy to read as I may need to modify it slightly to suit my needs 100%.

Thank You.

http://ultraimg.com/images/Capturef5eaf.png

gmaxey
09-24-2014, 05:36 AM
Not exactly what you are looking for (because you have not defined a common delimiter) but this may help:
http://gregmaxey.mvps.org/word_tip_pages/document_splitter.html

gmayor
09-24-2014, 06:39 AM
The difficulty in producing a macro to do this is exacerbated by the fact that you haven't indicated where EXACTLY on the page the reference number is located and what form it takes e.g. a form field, a content control, plain text etc. It would also help to understand how the pages are separated: By section page break, manual page break, or text flow.

You have quoted a reference number - EKK3434214 - which is the sort of reference number that changes throughout your document. So HOW does it change? Is it always three letters (the same three?) followed by seven digits? If the macro cannot find it, the document is not going to be split automatically.

There are no 'pages' in a Word document. There is only text flowed into virtual 'pages'. It is a variable feast, so a macro has to know where the 'pages' start and stop in order to set ranges that can be saved as documents.
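To illustrate the point about virtual pages, here is a minimal sketch of how a macro can ask Word where a page currently starts and stops. It relies on Word's predefined "\page" bookmark; PageRange and ShowPageBoundaries are hypothetical names used only for this illustration, and the result depends on the current layout (printer, fonts, view), so treat it as a sketch rather than a definitive method.

```vba
' Sketch only: returns the range that Word currently lays out as page PageNo.
' PageRange is a hypothetical helper name, not something from this thread.
Function PageRange(ByVal PageNo As Long) As Range
    ' Move the selection to the start of the requested page...
    Selection.GoTo What:=wdGoToPage, Which:=wdGoToAbsolute, Count:=PageNo
    ' ...then read the predefined "\page" bookmark, which spans the page
    ' that contains the selection.
    Set PageRange = ActiveDocument.Bookmarks("\page").Range
End Function

Sub ShowPageBoundaries()
    Dim rng As Range
    Set rng = PageRange(3)
    ' Character positions change whenever the text reflows.
    Debug.Print "Page 3 runs from character " & rng.Start & " to " & rng.End
End Sub
```

Because the boundaries are recalculated whenever the text reflows, a range obtained this way is only valid until the document is edited or repaginated.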

ashleyuk1984
09-24-2014, 07:24 AM
I have placed a common delimiter onto my reports. The word "Finished" appears at the end of each completed report.
Thank you for the freeware program. It works but has quite a few bugs.
It places headers and footers, and also generates blank pages for some reason. I'm not quite sure why. I don't know a lot about Word VBA, so I'm unable to debug it confidently. It also doesn't save the files with the reference number as the filename (this would be extra handy!).

I have found a piece of code and modified it in order to obtain the "Reference Number", so that this can be used after splitting the file.


Sub GetReference()
    Dim x As Long
    Dim FrameCount As Long
    Dim Reference As String

    FrameCount = ActiveDocument.Frames.Count
    For x = 1 To FrameCount
        Reference = ActiveDocument.Frames(x).Range.Text
        If Left(Reference, 4) = "EKK3" Then
            Debug.Print Reference 'To be changed to the Save As command
        End If
    Next x
End Sub

Now that I have supplied a common delimiter, "Finished", is it possible to program something?
I would also like to have the above code included within the loop so that it saves the split files.
Thank you for your help on this.



Thank you for replying gmayor, I will try to help you as best as I can.
With the above code that I have supplied, it shouldn't matter where the reference number on the page is, as the loop will find it.
I'm not entirely sure which type of separation the pages have taken, but I presume it's section page break? I'm not sure though... Hopefully with the use of the delimiter that I have placed at the end of each complete report, this should help.
It's always the same three letters, yes. It increments by 1 each time; however, if a job was previously cancelled, the report for the cancelled job won't be created, so the increment would be 2 in that scenario. Again, with the help of the code above, hopefully that problem will be solved.

Would it help if I created another delimiter at the START of the report??
Something like "Start" .... report .... "Finished". Would this help?
Thank you for taking the time to reply
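As a rough illustration of the delimiter idea described above, the following sketch splits the active document after each occurrence of the word "Finished" and names each piece after the first frame in it whose text starts with "EKK". Everything here is an assumption for illustration: SplitAtFinished and ReferenceIn are hypothetical names, the bulk file is assumed to be saved already (so that its folder can be reused), and copying via FormattedText does not carry headers or footers across.

```vba
' Sketch only: split the active document after every "Finished" delimiter.
Sub SplitAtFinished()
    Dim src As Document, part As Document
    Dim rngFind As Range, rngReport As Range
    Dim startPos As Long, n As Long
    Dim ref As String

    Set src = ActiveDocument
    Set rngFind = src.Content
    startPos = rngFind.Start

    With rngFind.Find
        .ClearFormatting
        .Text = "Finished"
        .MatchCase = True
        .MatchWholeWord = True
        .Forward = True
        .Wrap = wdFindStop          ' stop at the end of the document
    End With

    Do While rngFind.Find.Execute
        ' rngFind now covers the delimiter that was just found
        Set rngReport = src.Range(startPos, rngFind.End)
        n = n + 1
        ref = ReferenceIn(rngReport)
        If ref = "" Then ref = "Report_" & Format(n, "000")   ' fallback name

        Set part = Documents.Add
        part.Content.FormattedText = rngReport.FormattedText
        part.SaveAs2 FileName:=src.Path & "\" & ref & ".docx", _
                     FileFormat:=wdFormatXMLDocument
        part.Close SaveChanges:=wdDoNotSaveChanges

        startPos = rngFind.End          ' next report starts after the delimiter
        rngFind.Collapse wdCollapseEnd  ' continue searching from here
    Loop
End Sub

' Hypothetical helper: first frame text in rng that starts with "EKK".
Function ReferenceIn(ByVal rng As Range) As String
    Dim f As Frame
    For Each f In rng.Frames
        If Left(f.Range.Text, 3) = "EKK" Then
            ReferenceIn = Replace(f.Range.Text, vbCr, "")  ' drop paragraph marks
            Exit Function
        End If
    Next f
End Function
```

If each report is followed by a page or section break, that break will end up at the top of the next piece, so some trimming of the range start may be needed.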

gmayor
09-24-2014, 07:57 AM
Can you attach a sample of the document, so we can see what we are working with? By all means remove any personal or sensitive data from the document.
It is all very well being able to find the reference, but it still requires that to be set in context with reference to the 'pages' you want to save. From the code you have posted it seems there are frames involved. That might be helpful, if, for example, you wanted the 'page' to be from the frame with the reference number to the frame before the frame where the reference number changes.

ashleyuk1984
09-25-2014, 04:08 AM
Hi,
I've decided to take a slightly different approach to the problem, and came up with a solution.

However, the macro takes a very long time to run. Well over an hour.

To cut the story short, I have decided to search for the reference number using the above technique, and then store the page numbers.
Once I have the number of pages, and the reference, I EXPORT the pages to a PDF file.

My original bulk file that I want to split has about 500 references... So that means 500 PDFs need to be generated.
To start off with, the macro runs very fast: the first 40 or so references generate in a few minutes. But as it gets close to 100, it's taking over a minute to generate just one PDF file. The macro has ground to a halt.

Any ideas what I could do to help the macro maintain a constant speed?

This is the code that I have so far.
It's not the best, but it's the best I could do with my limited knowledge.


Sub GetReference()
    Dim x As Long
    Dim FrameCount As Long
    Dim StoreFrame As Long
    Dim PageNumber As Long
    Dim SILReference As String

    FrameCount = ActiveDocument.Frames.Count

    'Get first reference, and first page number
    For x = 1 To FrameCount
        If Left(ActiveDocument.Frames(x).Range.Text, 3) = "BOU" Or Left(ActiveDocument.Frames(x).Range.Text, 3) = "YEO" Then
            SILReference = ActiveDocument.Frames(x).Range.Text
            PageNumber = ActiveDocument.Frames(x).Range.Information(wdActiveEndPageNumber)
            StoreFrame = x
            Exit For
        End If
    Next x

    'Export the previous reference's pages each time the reference changes on a new page.
    'Note: the pages of the last reference are never exported, because nothing follows it.
    For x = StoreFrame To FrameCount
        If Left(ActiveDocument.Frames(x).Range.Text, 3) = "BOU" Or Left(ActiveDocument.Frames(x).Range.Text, 3) = "YEO" Then
            If ActiveDocument.Frames(x).Range.Information(wdActiveEndPageNumber) <> PageNumber Then
                If ActiveDocument.Frames(x).Range.Text <> SILReference Then

                    ActiveDocument.ExportAsFixedFormat OutputFileName:= _
                        "C:\Users\Tayloras\Desktop\SIL REFS\" & SILReference, ExportFormat:=wdExportFormatPDF, _
                        OpenAfterExport:=False, OptimizeFor:=wdExportOptimizeForPrint, Range:= _
                        wdExportFromTo, From:=PageNumber, To:=ActiveDocument.Frames(x).Range.Information(wdActiveEndPageNumber) - 1, Item:=wdExportDocumentContent, _
                        IncludeDocProps:=True, KeepIRM:=True, CreateBookmarks:= _
                        wdExportCreateNoBookmarks, DocStructureTags:=True, BitmapMissingFonts:= _
                        True, UseISO19005_1:=False

                    SILReference = ActiveDocument.Frames(x).Range.Text
                    PageNumber = ActiveDocument.Frames(x).Range.Information(wdActiveEndPageNumber)

                End If
            End If

        End If
    Next x

End Sub



.......

I've interrupted the macro, and I'm now stepping through it.
It's taking about a whole second to step through each of these lines... At the beginning of the document, this wasn't a problem. So I'm guessing that it's trying to search for it? ... and the further into the document it gets, the slower the macro will run?
A second isn't a lot of time... but collectively it's a massive amount of time.


If Left(ActiveDocument.Frames(x).Range.Text, 3) = "BOU" Or Left(ActiveDocument.Frames(x).Range.Text, 3) = "YEO" Then
If ActiveDocument.Frames(x).Range.Information(wdActiveEndPageNumber) <> PageNumber Then
If ActiveDocument.Frames(x).Range.Text <> SILReference Then

Is there anything that I could disable? Something similar to "screenupdating = false"?

Can you think of anything that I could try to fix the delay?
Would... moving the cursor to the page help? I'm not sure what that command would be though? - In Excel terms, it would be Range("A1").Select or Sheets(2).Select

Thanks

.......

Ok, so I've just tested my theory of the macro increasingly getting slower the deeper into the main file it gets.
It is definitely gradually slowing down.

I placed this into the code:

Application.StatusBar = "Processing frame " & x & " out of " & FrameCount

The x variable (current frame), shoots up from 0 to 1000 in no time at all, and then it starts to get slower, and slower, and slower.
I really hope there is a solution to this - hopefully by just disabling something :(
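One plausible cause of the slowdown, not verified against this particular document, is that every ActiveDocument.Frames(x) lookup by index and every Information(wdActiveEndPageNumber) call costs more the deeper into the document it lands. The sketch below is one way to reduce that work under that assumption: it walks the frames with For Each, reads each frame's text once, asks for a page number only when the reference changes, and switches off screen updating while it runs. ExportByReference, ExportPages and OUT_DIR are hypothetical names. Unlike the original, it strips the paragraph mark from the reference and also exports the final reference after the loop.

```vba
' Sketch only: place in a standard module. OUT_DIR is the folder from the post above.
Private Const OUT_DIR As String = "C:\Users\Tayloras\Desktop\SIL REFS\"

Sub ExportByReference()
    Dim f As Frame
    Dim txt As String, curRef As String
    Dim curPage As Long, newPage As Long

    Application.ScreenUpdating = False

    For Each f In ActiveDocument.Frames          ' no indexed Frames(x) lookups
        txt = Replace(f.Range.Text, vbCr, "")    ' read the text once per frame
        If Left(txt, 3) = "BOU" Or Left(txt, 3) = "YEO" Then
            If txt <> curRef Then
                ' Only ask for the page number when the reference has changed
                newPage = f.Range.Information(wdActiveEndPageNumber)
                If curRef = "" Then
                    curRef = txt
                    curPage = newPage
                ElseIf newPage <> curPage Then
                    ExportPages curRef, curPage, newPage - 1
                    curRef = txt
                    curPage = newPage
                End If
            End If
        End If
    Next f

    ' Export the last reference, which runs to the end of the document
    If curRef <> "" Then
        ExportPages curRef, curPage, ActiveDocument.ComputeStatistics(wdStatisticPages)
    End If

    Application.ScreenUpdating = True
End Sub

Private Sub ExportPages(ByVal ref As String, ByVal firstPage As Long, ByVal lastPage As Long)
    ActiveDocument.ExportAsFixedFormat _
        OutputFileName:=OUT_DIR & ref & ".pdf", _
        ExportFormat:=wdExportFormatPDF, _
        OpenAfterExport:=False, _
        Range:=wdExportFromTo, From:=firstPage, To:=lastPage
End Sub
```

If this is still slow, the PDF export itself may be the bottleneck, in which case timing the loop with the ExportPages calls commented out would show where the time goes.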

snb
09-25-2014, 05:25 AM
I run reports from our system. I usually have to run each job file one by one and save that document. Then start again with the next file.

Post an example of one of those files (is it txt, xlsx, docx ?).
Instead of splitting a Word document, it's easier to modify separate files (not even mentioning PDFs).