PDA

View Full Version : Loop thru a directory of RTF files, generate a listing



BoatwrenchV8
04-13-2012, 02:17 PM
Hello all,

This macro works, I would like help on making it better.

I have about 150 RTF data files that have been handed me. Each of the filenames have the table number and a short cryptic string explaining what the table is and are saved as an RTF file. Looking at the filename will not give you a clue as to what data the file contains.

Each file contains a table in the header portion of the document with standard / static company data and information, the table number and what specific data the table is showing. The body of the documement contains the data for the study, I am not interested in that data right now or for this operation. The footer of the document contains data as well, but I am not interested in that data right not or for this operation.

My solution to create a listing from the data contained in the table of the header. The macro below was cobbled together from 2 sources and modified myself. It does work and gets the job done BUT I know there are better ways to go about what I am doing. One part I do not understand how it works is the Application.FileDialog(msoFileDialogFolderPicker) part. Beyond that, I am sure objects and ranges can be used to make this macro more efficient. Please advise.

Code is below and a zip file that contains 3 sample source files and an output file are attached.

Thanks,
Rich



Sub ListAllDataTableHeadersV1() '(sPath As String)

Dim sPath As String
' Dim WdDoc As Document
Dim sFile As String
Dim DataSourceDoc As Document 'Data source file to extract data from
Dim WdDocOutputFile As Document 'Output file containing all extracted data
Dim fd As FileDialog 'Declare a variable as a FileDialog object.
Dim vrtSelectedItem As Variant

'Create an output file
Set WdDocOutputFile = New Word.Document

'This part came from:
' http://msdn.microsoft.com/en-us/library/aa432103(v=office.12).aspx
'//////
'
'Create a FileDialog object as a File Picker dialog box.
Set fd = Application.FileDialog(msoFileDialogFolderPicker)
'Declare a variable to contain the path
'of each selected item. Even though the path is aString,
'the variable must be a Variant because For Each...Next
'routines only work with Variants and Objects.

'Use a With...End With block to reference the FileDialog object.
With fd
'Use the Show method to display the File Picker dialog box and return the user's action.
'The user pressed the button.
If .Show = -1 Then
'Step through each string in the FileDialogSelectedItems collection.
For Each vrtSelectedItem In .SelectedItems
'vrtSelectedItem is aString that contains the path of each selected item.
'You can use any file I/O functions that you want to work with this path.
'This example displays the path in a message box.
'MsgBox "The path is: " & vrtSelectedItem
sPath = vrtSelectedItem + "\"
'MsgBox "The path is now: " & sPath
Next vrtSelectedItem
'The user pressed Cancel.
Else
MsgBox "Cancel selected, Macro will terminate."
Set fd = Nothing
Exit Sub
End If
End With
'Set the object variable to Nothing.
Set fd = Nothing

'//////



WdDocOutputFile.Activate

'This part came from posting #3 by lucas at
' http://www.vbaexpress.com/forum/showthread.php?t=13680
' and was modified by me.


'Change the file extension to what you want to find.
'
'I would like to put a file name extension picker here.
' instead of having it hard coded.
'
' sFile = Dir(sPath & "*.doc")
sFile = Dir(sPath & "*.rtf")

'Loop through all .doc files in that path
Do While sFile <> ""

Set DataSourceDoc = Application.Documents.Open(sPath & sFile)

DataSourceDoc.Activate

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader

'Select and Copy the header in the Data Source
Selection.Tables(1).Select

'Replace the line below with selection.copy instead of
' this line to run my hijacked version of copy.
'Application.Run MacroName:="Normal.HighlighterAppVer1.EditCopy"
Selection.copy 'Standard copy and not my hijacked version

'Close the Data Source
DataSourceDoc.Close wdDoNotSaveChanges

WdDocOutputFile.Activate
Selection.Paste
WdDocOutputFile.Tables(1).Select
Selection.Rows.ConvertToText Separator:=wdSeparateByTabs, NestedTables:= _
True

'clean up the text from the conversion of the table to text

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "(Omega)(*)(Table)"
.Replacement.Text = "\3"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll

With Selection.Find
.Text = "([a-z])(*)(^13)(*)([a-z])"
.Replacement.Text = "\1\2 \4\5"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With

Selection.Find.Execute Replace:=wdReplaceAll

Selection.EndKey Unit:=wdStory
Selection.TypeParagraph

sFile = Dir

Loop

'Save the generated output file.

WdDocOutputFile.Save
WdDocOutputFile.Activate
End Sub

BoatwrenchV8
04-13-2012, 05:50 PM
Will restudy old thread (i have the sample files somewhere that were attached) located at http://www.vbaexpress.com/forum/showthread.php?p=169454#post169454. Am sure a lot of it can be applied to this posting.

BoatwrenchV8
04-15-2012, 07:38 PM
An improvement but still have a way to go.


Sub CreateListOfTablesNumbersAndDesc()
Dim sPath As String
Dim WdDoc As Document
Dim sFile As String
Dim sSourcePath As String
Dim sTargetPath As String
Dim TableNumber As String
Dim Extension As String
Dim WdDocOutputFile As Document

Set WdDocOutputFile = New Word.Document

'Get source path and file extension data, quit if user cancelled.

sSourcePath = GetFilePathName("SOURCE")
If sSourcePath = "USER_CANCELLED" Then Exit Sub

Extension = FileExtension()

sFile = Dir(sSourcePath & "*." & Extension)

'Loop through all .rtf files in that path
Do While sFile <> ""
Set WdDoc = Application.Documents.Open(sSourcePath & sFile)

If ActiveWindow.View.SplitSpecial <> wdPaneNone Then
ActiveWindow.Panes(2).Close
End If
If ActiveWindow.ActivePane.View.Type = wdNormalView Or ActiveWindow. _
ActivePane.View.Type = wdOutlineView Then
ActiveWindow.ActivePane.View.Type = wdPrintView
End If
ActiveWindow.ActivePane.View.SeekView = wdSeekCurrentPageHeader
If Selection.Tables.Count < 1 Then
MsgBox "There isn't a table in the header of the document. Stopping macro."
Exit Sub
End If

'Select and Copy the header in the Data Source
Selection.Tables(1).Select

'Replace the line below with selection.copy instead of
' this line to run my hijacked version of copy.
Application.Run MacroName:="Normal.HighlighterAppVer1.EditCopy"

'Close the Data Source
WdDoc.Close wdDoNotSaveChanges

WdDocOutputFile.Activate
Selection.Paste
WdDocOutputFile.Tables(1).Select
Selection.Rows.ConvertToText Separator:=wdSeparateByTabs, NestedTables:= _
True

'clean up the text from the conversion of the table to text

Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "(Omega)(*)(Table)"
.Replacement.Text = "\3"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll

With Selection.Find
.Text = "([a-z])(*)(^13)(*)([a-z])"
.Replacement.Text = "\1\2 \4\5"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With

Selection.Find.Execute Replace:=wdReplaceAll

Selection.EndKey Unit:=wdStory
Selection.TypeParagraph

sFile = Dir

Loop

'Save the generated output file.

WdDocOutputFile.SaveAs2
WdDocOutputFile.Activate
End Sub



Function GetFilePathName(SOURCEorTARGET As String)

Dim fd As FileDialog
Dim ThePathSelected As String
Dim PathCorrect As String

Set fd = Application.FileDialog(msoFileDialogFolderPicker)

Do
With fd
'Use the Show method to display the File
'Picker dialog box and return the user's action.

If .Show = -1 Then 'The user pressed the OK button.
'Step through each string in the FileDialogSelectedItems collection.
ThePathSelected = fd.SelectedItems(1)

PathCorrect = MsgBox("The selected " + SOURCEorTARGET + " path is: " + _
Chr(13) + Chr(13) + ThePathSelected + Chr(13) + Chr(13) + _
"Is this correct?", vbYesNoCancel + vbQuestion)

'Cancel was selected in the yes/no/cancel dialog box
If PathCorrect = vbCancel Then
MsgBox "Cancel was selected, Macro will terminate."
GetFilePathName = "USER_CANCELLED"
Set fd = Nothing
Exit Function
End If
GetFilePathName = ThePathSelected + "\"

Else
'Cancel was selected in the folder dialog
MsgBox "Cancel was selected, Macro will terminate."
GetFilePathName = "USER_CANCELLED"
Set fd = Nothing
Exit Function
End If
End With
Loop While PathCorrect = vbNo
'Set the object variable to Nothing.
Set fd = Nothing
End Function



Function FileExtension(Optional SomethingInTheFutureMaybe As String) As String

Dim sFileEXT As String
Dim bGoodToGo As Boolean

Do
sFileEXT = InputBox("Enter the 3 digit letter file extension for the file(s) to be processed", "File extension")

sFileEXT = Trim(LCase(sFileEXT))
If Len(sFileEXT) = 3 Then
bGoodToGo = sFileEXT Like "[a-z][a-z][a-z]"
End If

Loop Until bGoodToGo = True

FileExtension = sFileEXT

End Function

BoatwrenchV8
04-15-2012, 07:50 PM
I think a major part of the solution is MacroPod's code, only poblem is it will crash if the table in header or footer has rows with a different column count. If the table in the header has 6 rows total and the first 2 rows have 2 columns (2 cells per row), and the third to the 6 th are just one column (one cell), the macro will crash at row 3 because it is looking for a column that does not exist.

Hmmm...

Sub Demo()
Dim oTbl As Table
Dim RngCel As Range
Dim TblTxt As String
Dim i As Integer
Dim j As Integer
With ActiveDocument.Sections(1)
If .Headers(wdHeaderFooterPrimary).Exists Then
For Each oTbl In .Headers(wdHeaderFooterPrimary).Range.Tables
With oTbl
For i = 1 To .Rows.Count
TblTxt = TblTxt & "|"
For j = 1 To .Columns.Count
Set RngCel = .Cell(i, j).Range
RngCel.End = RngCel.End - 1
TblTxt = TblTxt & RngCel.Text & "|"
Next j
TblTxt = TblTxt & vbCrLf
Next i
End With
Next oTbl
End If
If .Footers(wdHeaderFooterPrimary).Exists Then
TblTxt = TblTxt & "-" & vbCrLf
For Each oTbl In .Footers(wdHeaderFooterPrimary).Range.Tables
With oTbl
For i = 1 To .Rows.Count
TblTxt = TblTxt & "|"
For j = 1 To .Columns.Count
Set RngCel = .Cell(i, j).Range
RngCel.End = RngCel.End - 1
TblTxt = TblTxt & RngCel.Text & "|"
Next j
TblTxt = TblTxt & vbCrLf
Next i
End With
Next oTbl
End If
MsgBox TblTxt
End With
End Sub

fumei
04-15-2012, 11:08 PM
I think a major part of the solution is MacroPod's code, only poblem is it will crash if the table in header or footer has rows with a different column count. If the table in the header has 6 rows total and the first 2 rows have 2 columns (2 cells per row), and the third to the 6 th are just one column (one cell), the macro will crash at row 3 because it is looking for a column that does not exist.Correct. VBA does not work on merged cells in tables.

Solution?

DO NOT USE $%#@*&^ merged cells!

I notice you are still using Selection and View. Bleech.

Also, did you read my post regarding using .Exists? There can NEVER be a False to the If Primary .Exists. It always exists. You can get a False for DifferntFirstPage and OddEven, but not Primary.

BoatwrenchV8
04-16-2012, 04:39 PM
DO NOT USE $%#@*&^ merged cells!

That is what is handed to me and I cannot control that. It would make the whole situation much easier if I could specify no merged cells in anything. Problem is, they are in everything I work on - data and final product.


...did you read my post regarding using .Exists? There can NEVER be a False to the If Primary .Exists. It always exists. You can get a False for DifferntFirstPage and OddEven, but not Primary.
Yes, I did read it. Remembered it from your original posting, now I have to research why and how.


I notice you are still using Selection and View. Bleech.


Yeah, I know, I know. Selection and view are what I am used to right now. Range and objects are in the realm of Voodoo and Black Magic to me RIGHT NOW. But I am working on it! Got any hints??

Frosty
04-19-2012, 12:49 PM
BoatwrenchV8: you can look up the help topic "working with ranges" as well as do google searches on Object Oriented Programming (aka OOP). But they are both very big topics and would require many many replies to help you fully learn about them. So that's the hint part.

For this specifically:

1. Merged cells in Word tables are notoriously tricky to code for. I would somewhat disagree with Fumei saying VBA doesn't "work" on merged cells, and rather say that it is "unreliable." This is because the concept of "Rows" and "Columns" becomes VERY nebulous. Again, this could be very long thread just trying to unpack these statements fully. The "right" way to work around this is not to use the .Rows or .Columns objects at all. Instead, cycle through the .Cells collection to get the data you want.

2. Your demo code is cycling through each cell in a row, getting the data, separating with a |, and then "ending" the row with a vblf. Here is an alternate way of doing that...

'----------------------------------------------------------------------------------------------
'Public sub to display table data
'----------------------------------------------------------------------------------------------
Public Sub ShowTableData()
Dim oTable As Table

Set oTable = ActiveDocument.Sections(1).Headers(wdHeaderFooterPrimary).Range.Tables(1)

MsgBox fGetTableData(oTable)
End Sub
'----------------------------------------------------------------------------------------------
'Function to get a parsed out string/variant of the table text
'(likely to break on large tables, since even a Variant can only hold so much data)
'----------------------------------------------------------------------------------------------
Public Function fGetTableData(oTable As Table) As Variant
Dim sRet As String
Dim oCell As Cell
Dim iCol As Integer
Dim iRow As Integer
Dim sCellText As String

For Each oCell In oTable.Range.Cells
'get the text
sCellText = oCell.Range.Text
'remove the cell end characters
sCellText = Replace(sCellText, Chr(13) & Chr(7), "")
'get info about our cell
iCol = oCell.ColumnIndex
iRow = oCell.RowIndex
'don't bother showing empty cell data?
If sCellText <> "" Then
'if we're not at the last cell of the table
If Not oCell.Next Is Nothing Then
'if the next cell is in the same row index, just separate with a pipe
If oCell.RowIndex = oCell.Next.RowIndex Then
sRet = sRet & sCellText & "|"
'otherwise, put in a line break
Else
sRet = sRet & sCellText & vbLf
End If
'last cell
Else
sRet = sRet & sCellText
End If
End If
'increment our counter
iRow = oCell.RowIndex
Next
fGetTableData = sRet
End Function

fumei
04-19-2012, 02:08 PM
Sorry, I should have been more specific. VBA does not work (at all) with vertical merged cells, but you can kludge things to work with horizontally merged cells.

Frosty
04-19-2012, 02:49 PM
Fumei,

The code I wrote will work on vertically merged cells as well.

Frosty
04-19-2012, 03:07 PM
Quick update: Removed an extra line of code which didn't need to be there. Verified this works on both Word 2010 and Word 2003, but none of this is particularly version specific (unless there is something quirky about Word XP which Fumei can identify), this approach *should* always "work" (in that it will give you all the data in your table, separating "rows" of info by a line feed character, and non-blank cells of data by a "|" character), but with tables with a lot of funky merged cells, you will still have a garbage-in-garbage-out problem in that the .ColumnIndex and .RowIndex properties are still not reliable depending on how the table is constructed (and, additionally, depending on how you have selected the table).

In short-- Tables are very tricky, and relying on positional data in a non-standard table absolutely will cause coding issues.

'----------------------------------------------------------------------------------------------
'Public sub to display table data
'----------------------------------------------------------------------------------------------
Public Sub ShowTableData()
Dim oTable As Table

Set oTable = ActiveDocument.Tables(1)

MsgBox fGetTableData(oTable)
End Sub
'----------------------------------------------------------------------------------------------
'Function to get a parsed out string/variant of the table text
'(likely to break on large tables, since even a Variant can only hold so much data)
'----------------------------------------------------------------------------------------------
Public Function fGetTableData(oTable As Table) As Variant
Dim sRet As String
Dim oCell As Cell
Dim iCol As Integer
Dim iRow As Integer
Dim sCellText As String

For Each oCell In oTable.Range.Cells
'get the text
sCellText = oCell.Range.Text
'remove the cell end characters
sCellText = Replace(sCellText, Chr(13) & Chr(7), "")
'get info about our cell
iCol = oCell.ColumnIndex
iRow = oCell.RowIndex
'don't bother showing empty cell data?
If sCellText <> "" Then
'if we're not at the last cell of the table
If Not oCell.Next Is Nothing Then
'if the next cell is in the same row index, just separate with a pipe
If oCell.RowIndex = oCell.Next.RowIndex Then
sRet = sRet & sCellText & "|"
'otherwise, put in a line break
Else
sRet = sRet & sCellText & vbLf
End If
'last cell
Else
sRet = sRet & sCellText
End If
End If
Next
fGetTableData = sRet
End Function

fumei
04-19-2012, 08:19 PM
Ah, I finally see what you are doing (after finally actually running it!). You make a single string.

So, Boatwrench, is this code getting the data you want? Is a single string getting what you want? I am still a bit unclear as to what data you DO want to get out of these header tables. I am not sure it does.

fumei
04-19-2012, 08:30 PM
For example, you have:

Table 3 Listing of people who drooled all over themselves for more than
one hour after trying “Everlasting” bubble gum

These "lines" are in two separate cells and in Frosty's code end up separated by a paragraph mark - even though they seem to me to be a single sentence.

Oh and Jason, your second code returns a 438 error for me. Object does not supports this property or method.

fumei
04-19-2012, 08:51 PM
Oh, and IF you want:

Table 1 Listing of test subjects enrolled to test the “Everlasting” bubble gum flavoring

Table 2 Listing of adverse events of people turning different color after testing “Everlasting” bubble gum flavor

Table 3 Listing of people who drooled all over themselves for more than one hour after trying “Everlasting” bubble gum

and these are ALWAYS in the LAST TWO cells...
Sub GetText()
Dim j As Long
Dim sRet As String

j = ActiveDocument.Sections(1).Headers(1).Range.Tables(1). _
Range.Cells.Count
sRet = ActiveDocument.Sections(1).Headers(1).Range.Tables(1). _
Range.Cells(j - 1).Range.Text & _
ActiveDocument.Sections(1).Headers(1).Range.Tables(1). _
Range.Cells(j).Range.Text
' replace end-of-cell marker (the weird double-character character) with a space
sRet = Replace(sRet, Chr(13) & Chr(7), " ")
MsgBox sRet
End Subwill return THAT text for each of the demo files you posted.

E.g. Table 1 Listing of test subjects enrolled to test the “Everlasting” bubble gum flavoring

fumei
04-19-2012, 09:12 PM
So what - exactly - do you want to do with something like:

Table 1 Listing of test subjects enrolled to test the “Everlasting” bubble gum flavoring

fumei
04-19-2012, 09:32 PM
As an aside, the following function returns the text of a table, with each cell separated by a paragraph mark. It essentially replicates TableToText.
Function MyTableToText() As String
' gets table contents as string
' each cell separated by paragraph mark
Dim aRow As Row
Dim aCell As Cell
Dim sRet As String
For Each aRow In ActiveDocument.Tables(1).Rows
For Each aCell In aRow.Cells
sRet = sRet & aCell.Range.Text
sRet = Replace(sRet, Chr(13) & Chr(7), vbCrLf)
Next
Next
MyTableToText = sRet
End Function
As an example of use, you could grab the text from a table (in the function it is Table(1), but it could be any table, or even a table object in a different document), put that text somewhere else, and then delete the table. Thus replacing a TableToText, with the advantage it can put the TableToText ANYWHERE - including a different document. A normal TableToText only works in the current document, and only at the location of the table.

And of course it can be altered to use Tabs (or anything else...other text for example), not just paragraph marks.In other words you could break out table cell text separated by ANYTHING, ANYWHERE.

Frosty
04-19-2012, 09:39 PM
Fumei:

My second code (for testing purposes) was simply using a table in the document, rather than specifically in a header.range. That's why you got the error if you were running it on one of the OP's original documents (which only contained tables in the headers/footers).

Just a quick note on your MyTableToText function-- that will break on a table with vertically merged cells, since you're using the .Rows collection (maybe that's why you said you can only work with horizontally merged rows? Because this is one of your standard functions?).

But yes, my code was basically just a proof-of-concept and not necessarily plug-and-play ready for the original code (I was basing it on the Demo code above, not the actual code the OP said works and just needs improvement).

fumei
04-19-2012, 10:52 PM
Yes, I have used this often, and no, I never use vertically merged cells. Good point to mention though. Thanks.

BoatwrenchV8
04-20-2012, 05:56 PM
Frosty and Fumei,
Thank you both fort the time you spent on this. I have a lot of experimenting to do when I get finished with the project I am involved with, which is why I haven't posted in a while. It is very likely there will be another data run made soon and as long as the data is set up exactly like the previous run, all will be fine. We all know that won't happen. Lol. Thanks again and when I get all the fires out, I will be post my findings.

fumei
04-20-2012, 10:05 PM
A really good place to start is to describe EXACTLY what you want to happen. I have no idea actually.