Consulting

Results 1 to 7 of 7

Thread: Parsing Chapters within a word Doc and export to to excel

  1. #1
    VBAX Regular
    Joined
    Mar 2012
    Posts
    15
    Location

    Parsing Chapters within a word Doc and export to to excel

    Hi....I thought I had posted my issue here, but as I look through the forum for a solution to my problem but I can't seem to find my post.

    So I'm going to re-post or post for the first time in the hope of finding a solution.

    I am a third year software design student on work placement. I have been given a task to parse 100's of word documents with the following requirements:

    1. Create a Multi list box of chapters, populated by a config file
    2. Parse the word document based on selection - (extract whole chapter)
    3. Export chapter to an excel spread sheet

    I have a number of obstacles here...1. I'm very new to vba (2 weeks) 2. chapter names appear regularly throughout the document, I have to differentiate by heading style. 3. The documents are 20,000+ words long so what I've done thus far is extremely slow. I am working out of excel vba.

    I have posted what I've done so far below. This allows me to select the multilist box, and search for the selected items. It is successful in it's task, however, I need to select all text and tables within that chapter and copy it over to an excel spreadsheet. I can copy to the worksheets within the workbook I'm working out of. So here's my request, I have managed within 2-3 weeks to make some progress, however, from here I seem to be drawing a blank and I've a progress meeting next week and I'm stumped.

    So could someone please show me how to parse the content from the chapters..I would be so so so so grateful

    Oh, there are already bookmarks within the doc and there are hyperlinks, the hyperlinks are in the index, so if you hold ctrl and click it brings you directly to the chapter

    Any help would be really really appreciated!!



    HTML Code:
    '====================================================================
    '               POPULATING LIST BOX WITH DATA IN
    '                      CONFIG WORKSHEET
    '=====================================================================
    Private Sub UserForm_Initialize()
    
    ListBox1.ListFillRange = "Config!A1:A45"
    
    End Sub
    
    '======================================================================
    '                   PROCESSING LISTBOX SELECTION
    '======================================================================
    
    Public Sub Parse_Click()
    
    '======================================================================
    '                       DECLARING VARIABLES
    '======================================================================
    
      Dim i As Long
      Dim C As New Collection
      Dim Path As String
    
      With ListBox1
        For i = 0 To .ListCount - 1
          'Add all selected items to a collection
          If .Selected(i) Then C.Add .List(i)
        Next
      End With
    
      'Nothing selected, nothing to do
      If C.Count = 0 Then Exit Sub
    
      With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select Folder to Process and Click OK"
        .AllowMultiSelect = False
        .InitialView = msoFileDialogViewList
        If .Show <> -1 Then Exit Sub
    
        Path = .SelectedItems(1)
        If Right(Path, 1) <> "\" Then Path = Path + "\"
        'Remove any "
        Path = Replace(Path, """", "")
      End With
    
      If Dir$(Path & "*.doc") = "" Then
        MsgBox "No files found"
        Exit Sub
      End If
      
      On Error GoTo Errorhandler
      ParseDoc Path, C
      Exit Sub
      
    Errorhandler:
      MsgBox "Error " & Err.Number & ": " & Err.Description
    End Sub
    
    '======================================================================
    '                   PARSING WORD DOC FOR
    '                       SELECTED ITEMS
    '======================================================================
    
    Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
      Dim objExcel As Object 'Excel.Application
      Dim ExcelBook As Object 'Excel.Workbook
      Dim WasOpen As Boolean
      Dim oDoc As Document
      Dim oPara As Paragraph
      Dim strFilename As String
      Dim Item
      Dim Rng As Range
      Dim objWord As Word.Application
      Set objWord = New Word.Application
      objWord.Visible = True
        
      'Setting Location of Excel Spread for Parsed Details
      Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"
    
      'Set objWord = New Word.Application
      On Error Resume Next
      WasOpen = True
      Set objExcel = GetObject(, "Excel.Application")
      If objExcel Is Nothing Then
        Set objExcel = CreateObject("Excel.Application")
        If objExcel Is Nothing Then _
          Err.Raise 1000, "ParseDoc", "Excel is not accessible"
        objExcel.Visible = True
        WasOpen = False
      End If
      
      Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
      If ExcelBook Is Nothing Then
        If WasOpen Then objExcel.Quit
        Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
      End If
      On Error GoTo 0
    
      WordBasic.DisableAutoMacros 1
      strFilename = Dir$(strPath & "*.doc")
      While Len(strFilename) <> 0
        Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)
    
            For Each oPara In oDoc.Paragraphs
              For Each Item In Items
                If InStr(1, oPara.Range, Item) > 0 Then
                 If InStr(1, oPara.Style, "H2") > 0 Then
                  oPara.Range.Select
                  MsgBox "You have found the string!"
                  GoTo CloseDoc
                 End If
                End If
              Next
            Next
            
    CloseDoc:
        oDoc.Close wdDoNotSaveChanges
        strFilename = Dir$()
        
      Wend
      WordBasic.DisableAutoMacros 0
      objWord.Quit
      'ExcelBook.Close
      'If WasOpen Then objExcel.Quit
    End Sub
    

  2. #2
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,710
    Location
    Please do not cross-post.

  3. #3
    VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,273
    Location
    Cross-posted at: http://www.tek-tips.com/viewthread.cfm?qid=1677919
    For cross-posting etiquette, please read: http://www.excelguru.ca/content.php?184
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  4. #4
    VBAX Regular
    Joined
    Mar 2012
    Posts
    15
    Location
    Hi

    I do apologize, I didn't realize there etiquette when it came to using technical forums, my assumption was you post to the relevant forums and see what sort of guidance you get. So my apologies for any breach of rules, I wasn't aware of them.

    So what happens now?? It appears I can't post to tek tips to apologize...and I didn't get any sort of assistance on this web site. Am I black listed due to my cross posting ?? I really hope not as I'm a student on placement and I really need assistance...
    Last edited by DJ-DOO; 03-16-2012 at 01:46 AM.

  5. #5
    VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,273
    Location
    AFAIK, you haven't been 'blacklisted' at Tek-Tips, where I've posted some code that you might find helpful.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  6. #6
    VBAX Regular
    Joined
    Mar 2012
    Posts
    15
    Location
    Thank you for your reply macropod. I can't post on tek-tips to thank you. That was quite helpful, however, I've since had to change my approach, the documents I'm working on have a TOC and have hyperlinks to hidden bookmarks. I think the most efficient way to do this is based on selection I go to the relevant bookmark, set the range from this bookmark to the next, then I can select that range, copy and paste.

    That is my logic, unfortunately, I don't know how to implement it

    If you do feel inclined to have a look at this problem with me, I've posted a sample word doc on which I'm working on....


    Thanking you in advance
    Attached Files Attached Files

  7. #7
    VBAX Guru macropod's Avatar
    Joined
    Jul 2008
    Posts
    4,273
    Location
    I can't see what the TOC-related bookmarks have to do with extracting the data. What you posted was a need to extract data based on Headings. All a TOC does is to provide a link to those headings.

    FWIW, you can't rely on TOC bookmarks for anything long-term. That's because they're ephemeral and are liable to change IDs anytime you update the TOC.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •