Thread: Split Word Document

  1. #1

    Split Word Document

    Just found this forum and thought I would see if I could get a little assistance.
    I will try to summarize what I am doing.

    Let's say I have a Word document that is 3 pages long.

    Pages 1 and 2 belong together, and page 3 should be a single page on its own.
    So via VBA I want to split this document into two documents, splitting on the invoice number found on each page. There is one invoice number per page.

    Here is how I decide when to split: on each page there is the word "INVOICE", and to the right of it is the invoice number. If page two has the same invoice number as page one, then I want to append that page to the new document as well. If the invoice number appears on only one page, then I just create a one-page document. Either way, the document is named after the invoice number.

    I have some VB/VBA worked up that helps me find my keyword "INVOICE".

    I'm really just looking for a little direction on how I would split the document, and, when I get to the second page, how I append it to the previous new document.

    Hopefully that is clear as mud.

    Using Office 2003, and I will integrate it into VB.NET.

    Thanks
    VB:
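    ' Assumes Imports Word = Microsoft.Office.Interop.Word and a TextBox1 holding the file path.
    ' The Sub name is a placeholder; the original post showed only the closing End Sub.
    Private Sub FindInvoices()
        Dim objWord As New Word.Application
        Dim objDoc As Word.Document
        objDoc = objWord.Documents.Open(TextBox1.Text)

        With objWord.Selection.Find
            .Text = "INVOICE"
            .Replacement.Text = ""
            .Forward = True
            ' Stop at the end of the document rather than wrapping back to the top
            .Wrap = Word.WdFindWrap.wdFindStop
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        Do While objWord.Selection.Find.Execute
            ' Skip past the keyword, then extend the selection over the invoice number
            objWord.Selection.MoveRight(Unit:=Word.WdUnits.wdCharacter, Count:=2)
            objWord.Selection.MoveRight(Unit:=Word.WdUnits.wdWord, Count:=4, Extend:=Word.WdMovementType.wdExtend)
            objWord.Selection.Copy()

            ' Show which page the match is on, then jump to the next page
            MsgBox(objWord.Selection.Information(Word.WdInformation.wdActiveEndPageNumber).ToString())
            objWord.Selection.GoToNext(Word.WdGoToItem.wdGoToPage)
        Loop
    End Sub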
    
    

  2. #2
    Right off the bat, use Range.

    However, back up a bit. Please write out your logic requirements more carefully. On one hand:

    "Pages 1 and 2 belong together"

    I am not clear on what "belong together" actually means. It seems that sometimes they may NOT belong together.

    "Here is how i decide when i want to split, on the page there is a word "INVOICE""

    Not sure what "the" page means. Page 1? Page 2? Page 3?

    ONLY Page 1?

    You mention finding INVOICE again on Page 2 , and possibly "appending" it to the "previous" new document. Hmmm.

    This should not be too difficult. Can you post a sample document? Do a few more posts here, as you need to have five posts logged, before you can attach a file.

  3. #3
    Thanks for the response. Let me try to be a little more precise about my requirements. I quit smoking a couple of weeks ago and my brain just isn't fluent lately.

    1. I will start out with a large Word document of probably around 100 pages. Basically this document is a compilation of customer invoices. Some of them span one page, while others may run to more than one page.

    Goal: I need to split this large Word document into smaller documents, separating out the individual invoices. Once they are broken out, each document will be named after its invoice number.

    OK, I will start another response with more info.

  4. #4
    How to know whether an invoice spans more than one page:
    On each page there is an invoice number, which is unique per invoice. My goal was to use this invoice number as my trigger for when to split into a new Word document.

    For example, if page 1 has invoice number 12345678,
    I then go to the next page, and it also has invoice number 12345678.
    I then go to the next page and it has invoice number 3223434. Then I know I have started a new invoice, and pages 1 and 2 should be merged together as a two-page document named 12345678.doc.

    I will find my invoice number by searching for the word INVOICE on each page; the invoice number is to the right of it.

    Hopefully this makes a little more sense.

    Basically I want to split a large document into multiple documents, but I have to handle the fact that some invoices will split into one page and others may be two or three pages, depending on the INVOICE number.

    It's late now. I can try to write up a sample that doesn't have sensitive information tomorrow.

    Thanks
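
    The grouping rule described in this post can be sketched independently of Word. The snippet below is only an illustration in Python (chosen so it can run without Word installed), and it assumes the invoice number of each page has already been extracted into a list in page order. The function name group_pages_by_invoice is hypothetical and not part of any code in this thread.

```python
from itertools import groupby


def group_pages_by_invoice(page_invoices):
    """Group consecutive pages that share an invoice number.

    page_invoices: invoice numbers in page order (index 0 is page 1).
    Returns a list of (invoice_number, [page_numbers]) tuples.
    """
    numbered = enumerate(page_invoices, start=1)
    groups = []
    # groupby only merges adjacent items, which matches "same number on the next page"
    for invoice, run in groupby(numbered, key=lambda item: item[1]):
        groups.append((invoice, [page for page, _ in run]))
    return groups


# Pages 1 and 2 share an invoice number; page 3 starts a new invoice.
print(group_pages_by_invoice(["12345678", "12345678", "3223434"]))
```

    Each resulting group corresponds to one output document, named after the invoice number of the group.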

  5. #5
    Well, back to work! I will see if I can find a sample file here in a few!

    I like weekends better

  6. #6
    OK, here is a sample document. This one is only 5 pages, but it could easily be 100 pages.

    To split this document I could go a few ways.

    1. I could key off "CONTINUED ON", which appears at the bottom of a page if there is a second page.

    2. INVOICE NUMBER. The invoice number is located to the right of this label and changes from one invoice to the next.

    3. VIN number. The VIN is unique per invoice.

    OK, like I said, I'm just looking for a little guidance on the best way to split this document based on the information above.
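
    As an illustration of option 1, here is a rough Python sketch (not Word code) that closes an invoice on the first page that does not contain the "CONTINUED ON" marker. It assumes the text of each page is already available as a string, in page order. The function name split_on_continuation is made up for this example.

```python
def split_on_continuation(pages, marker="CONTINUED ON"):
    """Split a list of page texts into invoices.

    A page containing the marker is treated as continuing onto the next page;
    a page without it ends the current invoice.
    """
    invoices = []
    current = []
    for text in pages:
        current.append(text)
        if marker not in text:
            invoices.append(current)
            current = []
    if current:
        # The last page announced a continuation that never arrived; keep it anyway.
        invoices.append(current)
    return invoices


pages = ["invoice A ... CONTINUED ON NEXT PAGE", "invoice A, page 2", "invoice B"]
for number, invoice in enumerate(split_on_continuation(pages), start=1):
    print(number, len(invoice), "page(s)")
```

    This approach never needs to read the invoice number to decide where to split, but the number is still needed afterwards to name each file.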

  7. #7
    OK, here is the document.

    Really appreciate anyone's help!

  8. #8
    Well, I am a lot closer!

    I am now creating separate documents, and each document is being named after its invoice number.

    But if an invoice has more than one page, I am probably overwriting the earlier page.

    So now I just need to append instead of save.

    Not the cleanest code, and possibly not the most efficient, but here it is.

    VB:
    Private Sub ParseWordDoc(ByVal Filename As String, ByVal NewFileName As String)
        Dim WordApp As New Word.Application
        Dim BaseDoc As Word.Document
        Dim DestDoc As Word.Document
        Dim intNumberOfPages As Integer
        Dim intNumberOfChars As String
        Dim intPage As Integer
        Dim newname As String = ""   ' invoice number, used as the file name

         'Word Constants
        Const wdGoToPage = 1
        Const wdGoToNext = 2
        Const wdStory = 6
        Const wdExtend = 1
        Const wdCharacter = 1
        Const wdWord = 2
        Const wdFindStop = 0

         'Show WordApp
        WordApp.ShowMe()

         'Load Base Document
        BaseDoc = WordApp.Documents.Open(Filename)
        BaseDoc.Repaginate()

         'Loop through pages
        intNumberOfPages = BaseDoc.BuiltInDocumentProperties("Number of Pages").value
        intNumberOfChars = BaseDoc.BuiltInDocumentProperties("Number of Characters").value

        For intPage = 1 To intNumberOfPages
             'Move to the end of the first remaining page
            If intPage = intNumberOfPages Then
                WordApp.Selection.EndKey(wdStory)
            Else
                 'Jump to the start of the next page, then step back one character
                WordApp.Selection.GoTo(wdGoToPage, wdGoToNext)
                Application.DoEvents()

                WordApp.Selection.MoveLeft(Unit:=wdCharacter, Count:=1)
            End If

            Application.DoEvents()

             'Extend the selection back to the start of the document and copy it
            WordApp.Selection.HomeKey(wdStory, wdExtend)
            Application.DoEvents()

            WordApp.Selection.Copy()
            Application.DoEvents()

             'Create New Document
            DestDoc = WordApp.Documents.Add
            DestDoc.Activate()
            WordApp.Selection.Paste()

             'Find the invoice number in the pasted page
            With WordApp.Selection.Find
                .Text = "VEHICLE INVOICE"
                .Replacement.Text = ""
                .Forward = True
                 'Stop at the end instead of wrapping, so the loop below can finish
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = False
                .MatchWholeWord = False
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
            End With
            Do While WordApp.Selection.Find.Execute
                WordApp.Selection.MoveRight(Unit:=wdCharacter, Count:=2)
                WordApp.Selection.MoveRight(Unit:=wdWord, Count:=1, Extend:=wdExtend)
                 'Trim stray spaces so the text is usable as a file name
                newname = WordApp.Selection.Text.Trim()
            Loop

             'NewFileName is used as the destination folder.
             'A later page with the same invoice number overwrites this file.
            DestDoc.SaveAs(NewFileName & "\" & newname & ".doc")
            DestDoc.Close()
            DestDoc = Nothing

             'Delete the page just saved from the base document
            WordApp.Selection.GoTo(wdGoToPage, wdGoToNext)
            Application.DoEvents()

            WordApp.Selection.HomeKey(wdStory, wdExtend)
            Application.DoEvents()

            WordApp.Selection.Delete()
            Application.DoEvents()
        Next

        BaseDoc.Close(False)
        BaseDoc = Nothing

        WordApp.Quit()
        WordApp = Nothing
    End Sub
    
    

  9. #9
    Ai caramba!!!!! That is one messy chunk of data/text.

    OK, you have this text:

    2008 EXPRESS 3500 159 IN WB CUTAWAY TEST MOTORS CORPORATION
    50U SUMMIT WHITE /V8G & SUBSIDIARIES
    93G MEDIUM PEWTER RENAISSANCE CENTER
    ORDER NO. MMMJ2T/TSC STOCK NO. DETROIT MI 48243-1114
    VIN 1GB JG31 K5 81171080 VEHICLE INVOICE 1AD18146653


    Of this, you want everything associated with 1AD18146653, correct? But what about that stuff before it?

    What is going on with all the DoEvents?
    What are you doing with the number of characters (intNumberOfChars)? You never use it.

    But most importantly, what about that stuff before VEHICLE INVOICE?
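
    For reference, pulling the number that follows VEHICLE INVOICE out of a line like the one above can be sketched with a regular expression. This is a Python illustration only, not Word code, and it assumes the invoice number is the first run of non-space characters after the label. The names INVOICE_PATTERN and find_invoice_number are invented for the example.

```python
import re

# Label, some whitespace, then the invoice number as one run of non-space characters.
INVOICE_PATTERN = re.compile(r"VEHICLE INVOICE\s+(\S+)")


def find_invoice_number(page_text):
    """Return the invoice number following 'VEHICLE INVOICE', or None if absent."""
    match = INVOICE_PATTERN.search(page_text)
    return match.group(1) if match else None


line = "VIN 1GB JG31 K5 81171080 VEHICLE INVOICE 1AD18146653"
print(find_invoice_number(line))  # prints 1AD18146653
```

    Everything before the label, such as the VIN and the address text, is ignored by the pattern but stays on the page.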

  10. #10
    You're right, intNumberOfChars is junk and should have been cleaned up. It will be removed in future versions.

    Some of this code I copied from examples I found online (for example the DoEvents calls), and it worked for what I needed.

    As for the stuff before VEHICLE INVOICE, it is being copied.

    Basically the code copies everything on the page first.

    VB:
    WordApp.Selection.GoTo(wdGoToPage, 2) 
    Application.DoEvents() 
     
    WordApp.Selection.MoveLeft(Unit:=wdCharacter, Count:=1) 
    End If 
     
    Application.DoEvents() 
     
    WordApp.Selection.HomeKey(wdStory, wdExtend) 
    Application.DoEvents() 
     
    WordApp.Selection.Copy() 
    Application.DoEvents() 
     
     'Create New Document
    DestDoc = WordApp.Documents.Add 
    DestDoc.Activate() 
    WordApp.Selection.Paste()

    It then pastes the contents into a new document.
    Before I save it, I find the invoice number.

    My plan was that after I save it, I go to the next page and copy its contents. If the INVOICE NUMBER is the same, I would insert a page break and then paste page 2 after it.
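
    The append-instead-of-overwrite plan can be sketched as follows. This is a Python illustration working on plain text rather than on Word documents. It assumes each page arrives as an (invoice number, page text) pair in page order and uses a form-feed character to stand in for the inserted page break. The function name build_documents is hypothetical.

```python
def build_documents(pages_with_invoice, page_break="\f"):
    """Collect page texts into one document per invoice number.

    pages_with_invoice: iterable of (invoice_number, page_text) in page order.
    Returns a dict mapping a file name such as '12345678.doc' to the combined text.
    """
    documents = {}
    for invoice, text in pages_with_invoice:
        name = f"{invoice.strip()}.doc"
        if name in documents:
            # Same invoice seen before: append after a page break instead of replacing.
            documents[name] += page_break + text
        else:
            documents[name] = text
    return documents


pages = [("12345678", "page 1"), ("12345678", "page 2"), ("3223434", "page 3")]
for name, body in build_documents(pages).items():
    print(name, body.count("\f") + 1, "page(s)")
```

    Saving happens once per invoice after all pages are collected, so an earlier page is never overwritten by a later one.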

  11. #11
    Quote Originally Posted by fumei
    Ai caramba!!!!! That is one messy chunk of data/text.

    OK, you have this text:

    2008 EXPRESS 3500 159 IN WB CUTAWAY TEST MOTORS CORPORATION
    50U SUMMIT WHITE /V8G & SUBSIDIARIES
    93G MEDIUM PEWTER RENAISSANCE CENTER
    ORDER NO. MMMJ2T/TSC STOCK NO. DETROIT MI 48243-1114
    VIN 1GB JG31 K5 81171080 VEHICLE INVOICE 1AD18146653


    Of this, you want everything associated with 1AD18146653, correct? But what about that stuff before it?
    I want everything on that page, that is, to copy the whole page.
    If the next page has the same invoice number, then copy that whole page too and append it to the first.

    I am probably not being clear enough. It makes sense inside my head, though.

  12. #12
    Oh dear, here is that "page" thing again. Sigh.

    I have to go and do some real (paid) work. It can be done, but...shudder....

    Are you SURE, sure sure sure sure that every single "page" is there because of an actual page break? This is critical. NOWHERE does text extend enough to make another page without an explicit PageBreak.

    You must understand that "whole page", in one sense, is a totally meaningless term with Word. If each "page" is NOT created by a real, explicit, page break...then what you see as a "page" can be very very different from what I (or anyone else) will "see" as a "page".

    With the same document, what my Word will "say" is a "whole page" can, and often IS, different from what your Word says is a "whole page".
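
    To show why explicit breaks matter: in the text of a Word range, a manual page break appears as a form-feed character (Chr(12)), so pages that end in explicit breaks can be separated without relying on layout at all. The Python sketch below assumes the document text has already been read into a string with those characters intact. The names PAGE_BREAK and split_pages are made up for the example.

```python
PAGE_BREAK = "\f"  # form feed, the same character as Chr(12) in VB


def split_pages(document_text):
    """Split document text into pages at explicit page breaks."""
    pages = document_text.split(PAGE_BREAK)
    if pages and not pages[-1].strip():
        # Drop the empty tail left behind by a trailing page break.
        pages.pop()
    return pages


text = "invoice A, page 1\finvoice A, page 2\finvoice B\f"
print(len(split_pages(text)))  # prints 3
```

    Pages produced only by text overflowing onto the next sheet contain no such character, so this approach would not see them as separate pages.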

  13. #13
    Ummm yes there are explicit page breaks.

    My term of "page" may be off if someone was to change the font or something.

    But with the default font, Courier New at size 8, a page is a page.

    Sorry to bug you, I was just banging my head. I will figure it out in time.

  14. #14
    "My term of "page" may be off if someone was to change the font or something."

    "Has to"?

    Do you, or do you not, need to think about that? If you do, then that must be considered.

    Would anyone change format before the split? Once split into individual docs, then who cares? But if it could be changed BEFORE you do the split, then you care.

  15. #15
    Melvin74,

    Did you ever figure this out? I need to do the same thing, and I get compilation errors when running your code.
