PDA

View Full Version : Extract Word Tables into Excel



zedordead
04-17-2013, 05:00 AM
Hi,

I have several .rtf documents that contain thousands of identical tables. I've written something in excel VBA which is able to extract the data from each of these tables out into an excel sheet which works perfectly if I tidy up the word document before running.

The problem is that the relevant tables don't start until several pages through the document (variable), and these leading pages contain tables that are not relevant and don't need extracted. The relevant tables start after heading(14) and end before heading(15).

Is there a way to start the extract from a particular heading or use the heading to find the first and last table index number?

Apologies for my limited knowledge of word objects


Option Explicit

Sub ImportTables()

Dim wdDoc As Object
Dim wdFileName As Variant
Dim tableNo As Integer 'table number in Word
Dim resultRow As Long
Dim tableStart As Integer
Dim tableTot As Integer

On Error Resume Next

ActiveSheet.Range("A2:O1048576").ClearContents

wdFileName = Application.GetOpenFilename("Word files (*.rtf),*.rtf", , _
"Browse for file containing table to be imported")

If wdFileName = False Then Exit Sub '(user cancelled import file browser)

Set wdDoc = GetObject(wdFileName) 'open Word file

With wdDoc
tableNo = wdDoc.Tables.Count
tableTot = wdDoc.Tables.Count
If tableNo = 0 Then
MsgBox "This document contains no tables", _
vbExclamation, "Import Word Table"
End If

Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=1 '4

resultRow = 2

For tableStart = 1 To tableTot
With .Tables(tableStart)
'copy cell contents from Word table cells to Excel cells
Cells(resultRow, 1) = wdFileName 'FILENAME
Cells(resultRow, 2) = WorksheetFunction.Clean(.cell(1, 1).Range.Text)
Cells(resultRow, 3) = WorksheetFunction.Clean(.cell(1, 2).Range.Text)
Cells(resultRow, 4) = WorksheetFunction.Clean(.cell(3, 1).Range.Text)
Cells(resultRow, 5) = WorksheetFunction.Clean(.cell(3, 2).Range.Text)
Cells(resultRow, 6) = WorksheetFunction.Clean(.cell(3, 3).Range.Text)
Cells(resultRow, 7) = WorksheetFunction.Clean(.cell(3, 4).Range.Text)
Cells(resultRow, 8) = WorksheetFunction.Clean(.cell(3, 5).Range.Text)
Cells(resultRow, 9) = WorksheetFunction.Clean(.cell(5, 1).Range.Text)
Cells(resultRow, 10) = WorksheetFunction.Clean(.cell(5, 2).Range.Text)
Cells(resultRow, 11) = WorksheetFunction.Clean(.cell(5, 3).Range.Text)
Cells(resultRow, 12) = WorksheetFunction.Clean(.cell(5, 4).Range.Text)
Cells(resultRow, 13) = WorksheetFunction.Clean(.cell(7, 1).Range.Text)
Cells(resultRow, 14) = WorksheetFunction.Clean(.cell(7, 2).Range.Text)
Cells(resultRow, 15) = WorksheetFunction.Clean(.cell(7, 3).Range.Text)
End With
resultRow = resultRow + 1
Next tableStart
End With

End Sub

Mavila
04-17-2013, 06:40 AM
Can you refer to the table, such as activedocument.tables(x).range.select (where "x" is the first table you are interested in)?

Or execute a "Find" on some particular text that is at the start of the first table you are interested in?

zedordead
04-17-2013, 06:54 AM
I thought of doing something similar, but x would not be constant between documents. I was hoping to be able to define x as the first table following heading 14, but I'm not sure. Would something like this work:


.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=14



Can you refer to the table, such as activedocument.tables(x).range.select (where "x" is the first table you are interested in)?

Or execute a "Find" on some particular text that is at the start of the first table you are interested in?

Mavila
04-17-2013, 07:30 AM
How about something along these lines (Find.Style = "Heading 1"):

Selection.Find.ClearFormatting
Selection.Find.Style = ActiveDocument.Styles("Heading 1")
With Selection.Find
.Text = ""
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute

You could loop through this 14 times?

I recorded the above code.

zedordead
04-17-2013, 09:34 AM
Is it possible to set a range from the start of word document until the heading and then count the number of tables inside that?

I could then use this number of tables + 1 as the starting point in the loop.

fumei
04-17-2013, 05:31 PM
Yes, but you still have to find the 14th iteration of the heading style...assuming your heading styles have been accurately done.

Do rtf files even have styles? I forget. The best way, if it was a .doc file, would be to bookmark the table itself. You could then very easily get to it.

zedordead
04-18-2013, 04:20 AM
Yes, but you still have to find the 14th iteration of the heading style...assuming your heading styles have been accurately done.

Do rtf files even have styles? I forget. The best way, if it was a .doc file, would be to bookmark the table itself. You could then very easily get to it.


Hi fumei, yes rtf have styles, in this case they are all "Heading 1" style and are consistent.

Thanks

macropod
04-22-2013, 04:24 AM
Do the tables of interest have some identifying content (eg heading content) that the other tables don't have?

fumei
04-23-2013, 11:45 PM
You have two choices.

1. the table has something that SPECIFICALLY identifies it. I still think that if you can "tidy up" the documents, then bookmark that specific table. Once bookmarked, it can be move anywhere in a document and you can still refer to it directly, BY NAME.

OR, if there is something specific in the content, then search for it.

Either way, you need something that makes it unique - like a bookmark

2. iterate the number of known times the heading style occurs before the table. Get there, and work with the next table.

Something unique and specific....or slog your way one style at a time.

Question: is this a one-off action, or something you are going to repeat

zedordead
04-24-2013, 02:04 AM
Thanks both for the replies - funmei, I'll need to repeat this many times across many documents, so would rather have something more solid in place.

There is a piece of text that I could use as a bookmark: "Details of messages passed". Following this, there is a count of the number of tables and then the tables begin.

How do I insert a bookmark and begin with the first value of tableStart?

macropod
04-24-2013, 02:20 AM
Given that you have a specific text string (Details of messages passed), you don't need a bookmark. And since, you also have the table count, the programming becomes much easier (assuming the tables don't have other, unwanted tables, intervening). Can you attach a document to a post with some representative data (delete anything sensitive), showing both the "Details of messages passed" text and the following table count - formatted as they appear in the actual documents you're working with? You do this via the paperclip symbol on the 'Go Advanced' tab.

Are you processing just one document at a time, or a whole folder of documents?

As for the output, is a specific Excel file involved, or does the macro create its own? If the macro creates its own output file(s), is that one file per document, or all documents into the same file? If the data all go into one workbook, how is each document's data differentited? Are the data to be formatted in any sense? If so, you will need to provide the details. Does each table go onto the same worksheet, or separate worksheets? If on the same worksheet, are they to go across, by columns or down, by rows and what separates them?

As you can see, there are numerous issues that need to be addressed - and you need to be very specific as to what the requirements are.

zedordead
04-24-2013, 09:25 AM
Hi Macropod, thanks for the help!

I've attached a de-sensitised sample (in doc not rtf) with 5 tables as requested. I'll be processing through 100s of documents within subdirectories, but I can do the recursive dir part already.

The output isn't into a specific excel, so it's okay to create a new document. All the data outputs will go into a single file, with the filename as the unique identifier. If possible, the format would need to match the word tables, but this is not essential.

The table records I need to copy and the locations are:


Cells(resultRow, 1) = wdFileName 'FILENAME
Cells(resultRow, 2) = WorksheetFunction.Clean(.cell(1, 1).Range.Text)
Cells(resultRow, 3) = WorksheetFunction.Clean(.cell(1, 2).Range.Text)
Cells(resultRow, 4) = WorksheetFunction.Clean(.cell(3, 1).Range.Text)
Cells(resultRow, 5) = WorksheetFunction.Clean(.cell(3, 2).Range.Text)
Cells(resultRow, 6) = WorksheetFunction.Clean(.cell(3, 3).Range.Text)
Cells(resultRow, 7) = WorksheetFunction.Clean(.cell(3, 4).Range.Text)
Cells(resultRow, 8) = WorksheetFunction.Clean(.cell(3, 5).Range.Text)
Cells(resultRow, 9) = WorksheetFunction.Clean(.cell(5, 1).Range.Text)
Cells(resultRow, 10) = WorksheetFunction.Clean(.cell(5, 2).Range.Text)
Cells(resultRow, 11) = WorksheetFunction.Clean(.cell(5, 3).Range.Text)
Cells(resultRow, 12) = WorksheetFunction.Clean(.cell(5, 4).Range.Text)

macropod
04-24-2013, 04:03 PM
The code to process a single document is as follows. You will need to replace the 'Set Doc = ActiveDocument' with a reference to the document being processed (eg, by incorporating the code block below 'Process the document' in a loop that draws the document names from your folder recursion).
Sub DataXfer()
Application.ScreenUpdating = True
Dim xlApp As Object, xlWkBk As Object, xlWkSht As Object, xlRow As Long
Dim bStrt As Boolean, Doc As Document, TblRng As Range, CelRng As Range, i As Long, j As Long
' Test whether Excel is already running.
On Error Resume Next
bStrt = False ' Flag to record if we start Excel, so we can close it later.
Set xlApp = GetObject(, "Excel.Application")
'Start Excel if it isn't running
If xlApp Is Nothing Then
Set xlApp = CreateObject("Excel.Application")
If xlApp Is Nothing Then
MsgBox "Can't start Excel.", vbExclamation
Exit Sub
End If
' Record that we've started Excel.
bStrt = True
End If
On Error GoTo 0
With xlApp
'Hide our Excel session
If bStrt = True Then .Visible = False
' create the workbook.
Set xlWkBk = .Workbooks.Add
Set xlWkSht = xlWkBk.Sheets(1)
End With
'Process the document
Set Doc = ActiveDocument
With Doc.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "Details of messages passed."
.Execute
End With
If .Find.Found = True Then
'Populate the workbook
With xlWkSht
xlRow = .Range("A" & .Cells.SpecialCells(11).Row) + 1 '11=xlCellTypeLastCell
.Range("A" & xlRow).Value = Doc.FullName
End With
Set TblRng = .Duplicate.Paragraphs.Last.Range
With TblRng
.Collapse wdCollapseEnd
.End = Doc.Range.End
i = Trim(.Words.First.Text)
For j = 1 To .Tables.Count
If j > j Then Exit For
With .Tables(j)
Set CelRng = .Cell(1, 1).Range
With CelRng
.End = .End - 1
xlWkSht.Range("B" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(1, 2).Range
With CelRng
.End = .End - 1
xlWkSht.Range("C" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(3, 1).Range
With CelRng
.End = .End - 1
xlWkSht.Range("D" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(3, 2).Range
With CelRng
.End = .End - 1
xlWkSht.Range("E" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(3, 3).Range
With CelRng
.End = .End - 1
xlWkSht.Range("F" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(3, 4).Range
With CelRng
.End = .End - 1
xlWkSht.Range("G" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(3, 5).Range
With CelRng
.End = .End - 1
xlWkSht.Range("H" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(5, 1).Range
With CelRng
.End = .End - 1
xlWkSht.Range("I" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(5, 2).Range
With CelRng
.End = .End - 1
xlWkSht.Range("J" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(5, 3).Range
With CelRng
.End = .End - 1
xlWkSht.Range("K" & xlRow).Value = Trim(Left(.Text, 2))
End With
Set CelRng = .Cell(5, 4).Range
With CelRng
.End = .End - 1
xlWkSht.Range("L" & xlRow).Value = Trim(Left(.Text, 2))
End With
End With
Next
End With
End If
End With
'Display the workbook
xlApp.Visible = True
' Release Excel object memory
Set xlWkBk = Nothing: Set xlApp = Nothing
Application.ScreenUpdating = True
End Sub

zedordead
04-29-2013, 03:16 AM
Thanks for the help,

I'm getting an "Type mismatch" error on the following line:

TblRng = .Duplicate.Paragraphs.Last.Range

I've tried hard coding it as Doc.Duplicate.Paragraphs.Last.Range but it's still giving me "Object doesn't support this property or method".

Any ideas?

macropod
04-29-2013, 05:57 AM
I am unable to reproduce that error. Are you running the macro from Word?

I did notice a couple of other issues, however.
Change:
If j > j Then Exit For
to:
If j > i Then Exit For
and insert:
xlRow = xlRow + 1
after:
xlWkSht.Range("L" & xlRow).Value = Trim(Left(.Text, 2))
End With
End With

zedordead
04-29-2013, 06:01 AM
Yeah I spotted that also and added it in already. I was running it from excel but have reverted to using Word and it seems to work fine. Thanks!