Parsing Chapters within a Word Doc and Export to Excel



DJ-DOO
03-15-2012, 04:43 AM
Hi. I thought I had already posted my issue here, but looking through the forum for a solution to my problem, I can't seem to find my post.

So I'm going to re-post or post for the first time in the hope of finding a solution.

I am a third-year software design student on work placement. I have been given a task to parse hundreds of Word documents, with the following requirements:

1. Create a multi-select list box of chapters, populated from a config file
2. Parse the Word document based on the selection (extract the whole chapter)
3. Export the chapter to an Excel spreadsheet

I have a number of obstacles here:

1. I'm very new to VBA (2 weeks).
2. Chapter names appear regularly throughout the document, so I have to differentiate by heading style.
3. The documents are 20,000+ words long, so what I've done thus far is extremely slow.

I am working out of Excel VBA.
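One way the speed and heading-style obstacles are often tackled together is to let Word's Find do the search, restricted to the heading style, instead of testing every paragraph. The following is only a sketch under assumptions: the function name FindHeading is hypothetical, the style is assumed to be named exactly "H2" (the posted code only checks that the style name contains "H2"), and the project is assumed to have a reference to the Microsoft Word Object Library.

```vba
' Sketch only: locate a chapter heading with Find rather than looping
' over every paragraph. FindHeading is a hypothetical helper name and
' "H2" is an assumed style name; adjust both to the real documents.
Function FindHeading(ByVal oDoc As Word.Document, _
                     ByVal ChapterName As String, _
                     Optional ByVal StyleName As String = "H2") As Word.Range
    Dim Rng As Word.Range
    Set Rng = oDoc.Content

    With Rng.Find
        .ClearFormatting
        .Text = ChapterName
        ' Restrict matches to text in this style. Word raises a
        ' run-time error if the document has no style with this name.
        .Style = StyleName
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        .MatchCase = False
        .MatchWildcards = False
        If .Execute Then
            ' After a successful Execute, Rng is redefined to the match
            Set FindHeading = Rng.Paragraphs(1).Range
        End If
    End With
End Function
```

The function returns Nothing when no heading matches, so a caller would test the result with Is Nothing before using it.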

I have posted what I've done so far below. It lets me select items in the multi-select list box and search for them. It is successful in its task; however, I need to select all text and tables within that chapter and copy them over to an Excel spreadsheet. I can copy to the worksheets within the workbook I'm working out of. So here's my request: I have managed to make some progress in 2-3 weeks, but from here I seem to be drawing a blank, and I've a progress meeting next week and I'm stumped.
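A minimal sketch of the missing step, selecting a whole chapter and copying it across, might look like the following. It relies on Word's predefined "\HeadingLevel" bookmark, which covers a heading plus its subordinate headings and text. The assumptions are that the heading paragraph has an outline level (built-in heading styles do; a custom style such as the "H2" above may need one set), that the code runs in Excel with a reference to the Microsoft Word Object Library, and that ExportChapter is a hypothetical name.

```vba
' Sketch only: copy one chapter (heading, text and tables) to a worksheet.
' HeadingRng is assumed to lie inside the chapter's heading paragraph.
Sub ExportChapter(ByVal HeadingRng As Word.Range, ByVal ws As Excel.Worksheet)
    Dim ChapterRng As Word.Range
    Dim NextRow As Long

    ' "\HeadingLevel" is a predefined Word bookmark: the heading that
    ' contains the range, plus any subordinate headings and text
    Set ChapterRng = HeadingRng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")

    ' First free row in column A (row 2 on an empty sheet)
    NextRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row + 1

    ' Go through the clipboard so that tables arrive as cells
    ChapterRng.Copy
    ws.Paste Destination:=ws.Cells(NextRow, 1)
End Sub
```

Pasting through the clipboard keeps the example short; writing each paragraph's text into cells directly would avoid the clipboard but would need separate handling for tables.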

So could someone please show me how to parse the content from the chapters? I would be so grateful.

Oh, there are already bookmarks within the doc, and there are hyperlinks in the index, so if you hold Ctrl and click one it brings you directly to the chapter.

Any help would be really really appreciated!!





'====================================================================
' POPULATING LIST BOX WITH DATA IN
' CONFIG WORKSHEET
'=====================================================================
Private Sub UserForm_Initialize()
    'Fill the list box from the chapter names on the Config worksheet
    ListBox1.ListFillRange = "Config!A1:A45"
End Sub

'======================================================================
' PROCESSING LISTBOX SELECTION
'======================================================================

Public Sub Parse_Click()

    '======================================================================
    ' DECLARING VARIABLES
    '======================================================================

    Dim i As Long
    Dim C As New Collection
    Dim Path As String

    With ListBox1
        For i = 0 To .ListCount - 1
            'Add all selected items to a collection
            If .Selected(i) Then C.Add .List(i)
        Next
    End With

    'Nothing selected, nothing to do
    If C.Count = 0 Then Exit Sub

    With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select Folder to Process and Click OK"
        .AllowMultiSelect = False
        .InitialView = msoFileDialogViewList
        If .Show <> -1 Then Exit Sub

        Path = .SelectedItems(1)
        If Right(Path, 1) <> "\" Then Path = Path & "\"
        'Remove any "
        Path = Replace(Path, """", "")
    End With

    If Dir$(Path & "*.doc") = "" Then
        MsgBox "No files found"
        Exit Sub
    End If

    On Error GoTo Errorhandler
    ParseDoc Path, C
    Exit Sub

Errorhandler:
    MsgBox "Error " & Err.Number & ": " & Err.Description
End Sub

'======================================================================
' PARSING WORD DOC FOR
' SELECTED ITEMS
'======================================================================

Public Sub ParseDoc(ByVal strPath As String, ByVal Items As Collection)
    'Requires a reference to the Microsoft Word Object Library
    Dim objExcel As Object 'Excel.Application
    Dim ExcelBook As Object 'Excel.Workbook
    Dim WasOpen As Boolean
    Dim objWord As Word.Application
    Dim oDoc As Word.Document
    Dim oPara As Word.Paragraph
    Dim strFilename As String
    Dim Item As Variant

    'Setting Location of Excel Spread for Parsed Details
    Const WorkBookName As String = "C:\Users\edoogar\Documents\ParseProject\ParseDetails.xls"

    'Attach to a running Excel instance, or start one if none is running
    On Error Resume Next
    WasOpen = True
    Set objExcel = GetObject(, "Excel.Application")
    If objExcel Is Nothing Then
        Set objExcel = CreateObject("Excel.Application")
        If Not objExcel Is Nothing Then objExcel.Visible = True
        WasOpen = False
    End If
    If Not objExcel Is Nothing Then
        Set ExcelBook = objExcel.Workbooks.Open(Filename:=WorkBookName)
    End If
    On Error GoTo 0

    'Raise errors only after On Error Resume Next is switched off,
    'otherwise they are silently ignored
    If objExcel Is Nothing Then
        Err.Raise 1000, "ParseDoc", "Excel is not accessible"
    End If
    If ExcelBook Is Nothing Then
        'Only quit Excel if this routine started it
        If Not WasOpen Then objExcel.Quit
        Err.Raise 1001, "ParseDoc", "Can not open " & WorkBookName
    End If

    Set objWord = New Word.Application
    objWord.Visible = True

    'WordBasic must be reached through the Word instance when running from Excel
    objWord.WordBasic.DisableAutoMacros 1
    strFilename = Dir$(strPath & "*.doc")
    While Len(strFilename) <> 0
        Set oDoc = objWord.Documents.Open(Filename:=strPath & strFilename, AddToRecentFiles:=False)

        For Each oPara In oDoc.Paragraphs
            For Each Item In Items
                'Match the chapter name, then confirm the paragraph uses the heading style
                If InStr(1, oPara.Range.Text, Item) > 0 Then
                    If InStr(1, oPara.Style, "H2") > 0 Then
                        oPara.Range.Select
                        MsgBox "You have found the string!"
                        GoTo CloseDoc
                    End If
                End If
            Next
        Next

CloseDoc:
        oDoc.Close wdDoNotSaveChanges
        strFilename = Dir$()
    Wend
    objWord.WordBasic.DisableAutoMacros 0
    objWord.Quit
    'ExcelBook.Close
    'If Not WasOpen Then objExcel.Quit
End Sub

fumei
03-15-2012, 05:28 PM
Please do not cross-post.

macropod
03-15-2012, 11:40 PM
Cross-posted at: http://www.tek-tips.com/viewthread.cfm?qid=1677919
For cross-posting etiquette, please read: http://www.excelguru.ca/content.php?184

DJ-DOO
03-16-2012, 01:28 AM
Hi

I do apologize. I didn't realize there was an etiquette when it came to using technical forums; my assumption was that you post to the relevant forums and see what sort of guidance you get. So my apologies for any breach of the rules, I wasn't aware of them.

So what happens now? It appears I can't post to Tek-Tips to apologize, and I didn't get any sort of assistance on this web site. Am I blacklisted due to my cross-posting? I really hope not, as I'm a student on placement and I really need assistance.

macropod
03-16-2012, 07:34 PM
AFAIK, you haven't been 'blacklisted' at Tek-Tips, where I've posted some code that you might find helpful.

DJ-DOO
03-21-2012, 03:03 AM
Thank you for your reply, macropod. I can't post on Tek-Tips to thank you. That was quite helpful; however, I've since had to change my approach. The documents I'm working on have a TOC with hyperlinks to hidden bookmarks. I think the most efficient way to do this is, based on the selection, to go to the relevant bookmark and set the range from this bookmark to the next; then I can select that range, copy and paste.

That is my logic; unfortunately, I don't know how to implement it.
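A rough sketch of that bookmark-to-bookmark logic, under stated assumptions, could look like this. The function name RangeToNextBookmark is hypothetical, BookmarkName stands for whichever hidden bookmark the TOC entry points to (for illustration, something like "_Toc123456"), and a reference to the Microsoft Word Object Library is assumed.

```vba
' Sketch only: return the range from one bookmark up to the start of
' the next bookmark that follows it in the document.
Function RangeToNextBookmark(ByVal oDoc As Word.Document, _
                             ByVal BookmarkName As String) As Word.Range
    Dim bm As Word.Bookmark
    Dim StartPos As Long
    Dim EndPos As Long

    ' Hidden bookmarks are left out of the collection unless this is set
    oDoc.Bookmarks.ShowHidden = True
    If Not oDoc.Bookmarks.Exists(BookmarkName) Then Exit Function

    StartPos = oDoc.Bookmarks(BookmarkName).Range.Start
    EndPos = oDoc.Content.End          ' default: run to the end of the document

    ' Keep the closest bookmark start that lies after the chosen bookmark
    For Each bm In oDoc.Bookmarks
        If bm.Start > StartPos And bm.Start < EndPos Then EndPos = bm.Start
    Next bm

    Set RangeToNextBookmark = oDoc.Range(StartPos, EndPos)
End Function
```

As written, any bookmark inside the chapter ends the range early, so a real version would probably filter on the bookmark name (for example, only names beginning with "_Toc") before comparing positions.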

If you do feel inclined to have a look at this problem with me, I've posted a sample Word doc that I'm working on.


Thanking you in advance

macropod
03-21-2012, 02:38 PM
I can't see what the TOC-related bookmarks have to do with extracting the data. What you posted was a need to extract data based on Headings. All a TOC does is to provide a link to those headings.

FWIW, you can't rely on TOC bookmarks for anything long-term. That's because they're ephemeral and are liable to change IDs anytime you update the TOC.