Consulting

Results 1 to 15 of 15

Thread: Splitting large document

  1. #1
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location

    Splitting large document

    Good afternoon,

    First time poster, and pretty much a newbie using VBA. I have been set a challenge to split a large document into lots of smaller documents by specific parameters.

    This is what ive found so far..


    Sub SplitNotes(delim As String, strFilename As String)
    Dim doc As Document
    Dim arrNotes
    Dim I As Long
    Dim X As Long
    Dim Response As Integer
    arrNotes = Split(ActiveDocument.Range, delim)
    Response = MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections.Do you wish to proceed?", 4)
    If Response = 7 Then Exit Sub
    For I = LBound(arrNotes) To UBound(arrNotes)
    If Trim(arrNotes(I)) <> "" Then
    X = X + 1
    Set doc = Documents.Add
    doc.Range = arrNotes(I)
    doc.SaveAs ThisDocument.Path & "\" & strFilename & Format(X, "000")
    doc.Close True
    End If
    Next I
    End Sub
    Sub test()
    'delimiter & filename
    SplitNotes "///", "Notes "
    End Sub

    Now, I can see where I need to put the 'delimiter' information. For the document ill be using ',,,,,' after each block of text that has to be split. The issue ive got is, the document is about 2500 pages long, and its going to take me a fair while to sit and split each thing with my delimiter. Is there a quicker way? The documents vary in length so this is making it a little more tricky!

    The other question is, within each specific document there is a title and a description. I want each document to be named with the title and description values, from each specific document.

    Whilst i'd be grateful for the answer, an explanation of how you did it, would also be great! Wanting to learn!

    And yes, ive tried google.


    Thanks folks!


  2. #2
    Microsoft Word MVP 2003-2009 VBAX Guru gmaxey's Avatar
    Joined
    Sep 2005
    Posts
    3,340
    Location
    Maybe the document is already delimited by some other means. See: http://gregmaxey.com/word_tip_pages/..._splitter.html

    If not then you will have to delimit it. Perhaps you could use find and replace to insert the delimiter
    Greg

    Visit my website: http://gregmaxey.com

  3. #3
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    Your splitter looks glorious Greg! I shall have a peruse and see if it is delimit'd in anyway! Not that i can clearly see! but find and replace could be my friend!

    Any idea on the naming of the documents and whether it can be done?
    Last edited by leecable; 07-06-2017 at 10:22 AM.

  4. #4
    Whether you can name the document from the contents depends on whether there is something common to each document that can be readily identified.
    Was the large document created from mail merge, as that would make splitting and naming much simpler as mail merge defines each 'document' with a section break, and there is some commonality of layout to aid naming.
    Graham Mayor - MS MVP (Word) 2002-2019
    Visit my web site for more programming tips and ready made processes
    http://www.gmayor.com

  5. #5
    VBAX Tutor
    Joined
    Jul 2016
    Posts
    266
    Location
    This code I pieced together seems to work. It separates based on page breaks. It asks for each file a "save as". I'm sure the professionals here could streamline it.

    Sub Split1()
    Dim arr() As Variant
    Dim i As Byte
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    arr = Array("^b", "^m", "^n")
      For i = LBound(arr) To UBound(arr)
        With Selection.Find
          .Text = arr(i)
          .Replacement.Text = "//kilroy//"
          .Forward = True
          .Wrap = wdFindContinue
        End With
      Selection.Find.Execute Replace:=wdReplaceAll
      Next
    Call Split2
    End Sub
    Sub Split2()
        SplitNotes "//kilroy//"
    End Sub
    Sub SplitNotes(strDelimiter As String)
        Dim oDoc As Document
        Dim arrNotes() As String, arrTitles() As String
        Dim lngIndex As Long, lngNum As Long
        Dim strTitle As String
        Dim oRng As Word.Range
        arrNotes = Split(ActiveDocument.Range, strDelimiter)
        For lngIndex = 0 To UBound(arrNotes)
            ReDim Preserve arrTitles(lngIndex)
            arrTitles(lngIndex) = Right(arrNotes(lngIndex), Len(arrNotes(lngIndex)) - InStrRev(arrNotes(lngIndex), " "))
        Next lngIndex
         
        If MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections. Do you wish to proceed?", vbYesNo) = vbNo Then Exit Sub
        For lngIndex = LBound(arrNotes) To UBound(arrNotes)
            If Trim(arrNotes(lngIndex)) <> "" Then
                Set oDoc = Documents.Add
                oDoc.Range = arrNotes(lngIndex)
                On Error Resume Next
                strTitle = arrTitles(lngIndex)
                If Err.Number = 0 Then
                    Set oRng = ActiveDocument.Range
                    oRng.Collapse wdCollapseEnd
                    oRng.MoveStart wdWord, -1
                    oRng.Delete
                    oDoc.SaveAs ThisDocument.Path & "\" & strTitle
                Else
                    lngNum = lngNum + 1
                    oDoc.SaveAs ThisDocument.Path & "\" & "Misc " & Format(lngNum, "000")
                End If
                oDoc.Close True
            End If
        Next lngIndex
    End Sub
    Last edited by Kilroy; 07-07-2017 at 10:57 AM.

  6. #6
    VBAX Tutor
    Joined
    Jul 2016
    Posts
    266
    Location
    Guys I have 2 Questions:

    1. How could I modify the above code to just automatically save the documents created instead of going through the "save as" dialogue? Is keeping them open an option?
    2. Close the original document without saving?

  7. #7
    The code does not display the saveas diallog. It uses the saveas VBA function to save the document
    To close the original document name the document e.g as oSource then close oSource e.g. as follows
    I have not tested your code!

    Sub SplitNotes(strDelimiter As String)
        Dim oDoc As Document
        Dim oSource As Document
        Dim arrNotes() As String, arrTitles() As String
        Dim lngIndex As Long, lngNum As Long
        Dim strTitle As String
        Dim oRng As Word.Range
        
        Set oSource = ActiveDocument
        arrNotes = Split(oSource.Range, strDelimiter)
        For lngIndex = 0 To UBound(arrNotes)
            ReDim Preserve arrTitles(lngIndex)
            arrTitles(lngIndex) = Right(arrNotes(lngIndex), Len(arrNotes(lngIndex)) - InStrRev(arrNotes(lngIndex), " "))
        Next lngIndex
         
        If MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections. Do you wish to proceed?", vbYesNo) = vbNo Then Exit Sub
        For lngIndex = LBound(arrNotes) To UBound(arrNotes)
            If Trim(arrNotes(lngIndex)) <> "" Then
                Set oDoc = Documents.Add
                oDoc.Range = arrNotes(lngIndex)
                On Error Resume Next
                strTitle = arrTitles(lngIndex)
                If Err.Number = 0 Then
                    Set oRng = oSource.Range
                    oRng.Collapse wdCollapseEnd
                    oRng.MoveStart wdWord, -1
                    oRng.Delete
                    oDoc.SaveAs ThisDocument.Path & "\" & strTitle
                Else
                    lngNum = lngNum + 1
                    oDoc.SaveAs ThisDocument.Path & "\" & "Misc " & Format(lngNum, "000")
                End If
                oDoc.Close 0
            End If
        Next lngIndex
        oSource.Close 0
    End Sub
    Graham Mayor - MS MVP (Word) 2002-2019
    Visit my web site for more programming tips and ready made processes
    http://www.gmayor.com

  8. #8
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    Quote Originally Posted by gmayor View Post
    Whether you can name the document from the contents depends on whether there is something common to each document that can be readily identified.
    Was the large document created from mail merge, as that would make splitting and naming much simpler as mail merge defines each 'document' with a section break, and there is some commonality of layout to aid naming.
    Managed to find and replace to insert a delimiter at the start of each new document, which is sorted. There is something common about all of the documents, they all contain a section of text called 'Title:' Which is identical in every document. Obviously the text after the heading is different in each. So, I would ideally like that to be the title of each saved document.

    So I want it to split at the part where I have inserted the delimiter, and now pull the name of the title as the saved file name. Its got to be possible.


    I have had a look into the code of your splitter -

    With oDoc
    .UpdateStylesOnOpen = False
    NewDocPageSetUp oDoc
    oDoc.SaveAs FileName:=pPath & pFileName & Format(i, "000") & pExt, FileFormat:=lngType
    .Close wdDoNotSaveChanges
    End With

    So I am guessing its to do with this part.

  9. #9
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    I have also managed to find this delightful piece of code

    Sub GetRenameFiles() 
        Dim fd As FileDialog 
        Dim strFolder As String 
        Dim strFile As String 
        Dim aDoc As Document 
        Dim rngNewName As Range 
        Set fd = Application.FileDialog(msoFileDialogFolderPicker) 
        With fd 
            .Title = "Select the folder that contains the files." 
            If .Show = -1 Then 
                strFolder = .SelectedItems(1) & "\" 
            Else 
                MsgBox "You did not select a folder." 
                Exit Sub 
            End If 
        End With 
        strFile = Dir$(strFolder & "*.doc*") 
        While strFile <> "" 
            Set aDoc = Documents.Open(strFolder & strFile) 
            Set rngNewName = aDoc.Paragraphs(1).Range 
            rngNewName.MoveEnd wdCharacter, -1 
            aDoc.SaveAs2 rngNewName.Text 
            aDoc.Close 
            strFile = Dir$() 
        Wend 
    End Sub
    Which will basically name them the first line. Can I navigate to a specific line? As the Title is always on the same line on each page?

  10. #10
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    Cancel that, after scrolling down the document, it turns out that 'Title:' is not always in the same place within each individual document.

    I'm also having issues with gregs splitter, in that its giving me 86 parts...of a document that's nearly 10000 pages long, this isn't correct

  11. #11
    VBAX Tutor
    Joined
    Jul 2016
    Posts
    266
    Location
    What is separating your documents? Give us a sample of a few pages. delete the content if need be. Just need the beginning and end of a few sections.

  12. #12
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    Ive got the documents splits, its just now the Save issue. I now have 800 files starting with notes 001, and so on.

  13. #13
    VBAX Tutor
    Joined
    Jul 2016
    Posts
    266
    Location
    Ok that's great. What is the code you used? I would be interested in seeing it.

  14. #14
    VBAX Regular
    Joined
    Jul 2017
    Posts
    19
    Location
    So I used the original code I found,

    Sub SplitNotes(delim As String, strFilename As String)
    Dim doc As Document
    Dim arrNotes
    Dim I As Long
    Dim X As Long
    Dim Response As Integer
    arrNotes = Split(ActiveDocument.Range, delim)
    Response = MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections.Do you wish to proceed?", 4)
    If Response = 7 Then Exit Sub
    For I = LBound(arrNotes) To UBound(arrNotes)
    If Trim(arrNotes(I)) <> "" Then
    X = X + 1
    Set doc = Documents.Add
    doc.Range = arrNotes(I)
    doc.SaveAs ThisDocument.Path & "\" & strFilename & Format(X, "000")
    doc.Close True
    End If
    Next I
    End Sub
    Sub test()
    'delimiter & filename
    SplitNotes "///", "Notes "
    End Sub

    Used the find and replace tool within word to insert my delimiter (delimit) into the document at the appropriate places. Which was swift. This then worked fine in splitting it. The issue is, I now have 800+ documents names Notes001.

  15. #15
    VBAX Tutor
    Joined
    Jul 2016
    Posts
    266
    Location
    what was the common separator for each doc?How did you put in the delimit?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •