PDA

View Full Version : Request for review



gmaxey
02-13-2011, 03:51 PM
I am positg alot of code here so hopefully the issues will be relatively clear to anyone interested in helping.

The code is run from a Word template with the purpose of doing the following:

1. Let user identifie a folder contain and batch of Word documents to process.
2. Open each document in the batch folder in turn and extract any qualified content control contents to a EXCEL database.
Qualified content controls are any content control (except group controls) that are assigne a title property.
3. Close the document and move it to a separate "Processed" folder.

The Word part, while far from perfect I am sure, I am pretty confident about. The Excel piece is a different story. I really don't know anything about programing Excel and all that is here is bits and pieces that I have stumbled on in Google search or just plain stumbled on with trial and error.

I start out with a empty Exel file "Extracted CC Data.xlsx"

As each document is opened a function collects relevent CC Data. This data is then used to to create a heading row, but I don't really think it is a heading row so more accurately, .... then used to define the first row column text. A column is defined for each unique title in the document. If two or more CCs share the same title then only one column with that title is created.

Next the value of each CC is placed in the correct colomn of the sheet.

Then the document is closed and the next docoment is opened. If there are different titled CCs in this document then additional columns are defined as appropriate.

It seems to be working.

Now later on users may add addtional files to the batch folder. So when the code is run they are givd an option to clear the data in the spreadsheet. I didn't know any way to do this other than define a range consisting of the entire sheet and then detele it:

If MsgBox("Do you want to clear stored data presently in the database?", vbQuestion + vbYesNo, "CLEAR DATABASE") = vbYes Then
oSheet.Range("A1", "XFD1048576").Clear
End If


Is there a better way?

You can see that I have resorted to using loops and matching conditions to determing if a column already exists. Is there a better way?

I have also used loops and matching conditions to determing which column the CC data should be placed in. Is there a better way?

One last question that deals with arrays. To get the CC data I am using a Function that returns an array of strings. If there are no qualified CCs in the document this array would be empty. I haven't been able to figure out a way to handle this situation in the code ohter than assigning a unique string to one of the array elements and using it as a psuedo "Empty" conditon. Any suggestions to improve that process or any of the process is welcomed!!

Thanks.
Option Explicit
Dim oDoc As Word.Document
Dim oCC As ContentControl
Sub ExtractDataFromDocumentCCsExportToExcel()
'Used for late binding
Dim oXL As Object
Dim oWB As Object
Dim oSheet As Object
'Used for early binding with reference to Excel Object Model
'Dim oXL As Excel.Application
'Dim oWB As Excel.Workbook
'Dim oSheet As Excel.Worksheet
Dim oExcelDataFile As String
Dim oWBName As String
Dim bAppRunning As Boolean
Dim bWBFileOpen As Boolean
Dim LastRow As Long
Dim LastCol As Long
Dim i As Long, j As Long, k As Long
Dim pPath As String
Dim pFileName As String
Dim arrFiles() As String
Dim arrCCData() As String
Dim bProcessCCs As Boolean
Dim FiletoKill As String
pPath = GetPathToUse
If pPath = "" Then
MsgBox "A folder was not selected"
Exit Sub
End If
CreateProcessedDirectory pPath
'Identify Word Document files in folder to process.
pFileName = Dir$(pPath & "*.doc")
ReDim arrFiles(1 To 1000) 'A number larger than the expected number of Word files in folder to process
'Add file name to the array
Do While pFileName <> ""
i = i + 1
arrFiles(i) = pFileName
'Get the next file name
pFileName = Dir$
Loop
If i = 0 Then
MsgBox "The selected folder did not contain any forms to process."
Exit Sub
End If
'Resize and preserve the array
ReDim Preserve arrFiles(1 To i)
Application.ScreenUpdating = False
'Define the Workbook file and path
oExcelDataFile = "D:\Batch\Tally Data Forms\Extracted CC Data.xlsx"
'Extract the file name name
oWBName = oExcelDataFile
While InStr(oWBName, "\") <> 0
oWBName = Right(oWBName, Len(oWBName) - InStr(oWBName, "\"))
Wend
'Check if Excel is installed and already running. If not then start Excel
On Error Resume Next
Set oXL = GetObject(, "Excel.Application")
On Error GoTo 0
If Not oXL Is Nothing Then
bAppRunning = True
Else
On Error Resume Next
Set oXL = CreateObject("Excel.Application")
On Error GoTo 0
If oXL Is Nothing Then
MsgBox "Excel not installed"
Exit Sub
End If
End If
On Error GoTo Err_Handler1
'Check if target workbook is already opened. If not, open the workbook
If IsFileOpen(oExcelDataFile) Then
bWBFileOpen = True
Set oWB = oXL.Workbooks(oWBName)
Else
Set oWB = oXL.Workbooks.Open(filename:=oExcelDataFile)
End If
Set oSheet = oWB.Sheets("Sheet1")
If MsgBox("Do you want to clear stored data presently in the database?", vbQuestion + vbYesNo, "CLEAR DATABASE") = vbYes Then
oSheet.Range("A1", "XFD1048576").Clear
End If
For i = 1 To UBound(arrFiles)
Set oDoc = Documents.Open(filename:=pPath & arrFiles(i), Visible:=False)
'Identify the file to remove from batch folder after processing
FiletoKill = pPath & oDoc
'Collect data from content controls in document to process.
bProcessCCs = True
arrCCData = DocCCData
If arrCCData(0, 2) = "***DOCUMENT DOES NOT CONTAIN ANY VALID CONTENT CONTROLS***" Then bProcessCCs = False
If bProcessCCs Then
'Add/ensure a column exists for each content control title. Attempts to add duplicate column is handled by error handler.
On Error Resume Next
oSheet.Cells(1, 1) = "Record Number"
For j = 0 To UBound(arrCCData)
Dim bColExists As Boolean
bColExists = False
LastCol = oSheet.Cells(1, oSheet.Columns.Count).End(-4159).Column 'xlToLeft
For k = 1 To LastCol
If oSheet.Cells(1, k).Value = arrCCData(j, 0) Then
bColExists = True
Exit For
End If
Next k
If Not bColExists Then oSheet.Cells(1, LastCol + 1) = arrCCData(j, 0)
'oSheet.Cells(1, j + 2) = arrCCData(j, 0)
Next j
On Error GoTo 0
End If
With oDoc
LastRow = oSheet.Cells(oSheet.Rows.Count, "A").End(-4162).Row 'xlUp
LastCol = oSheet.Cells(1, oSheet.Columns.Count).End(-4159).Column 'xlToLeft
oSheet.Cells(LastRow + 1, 1).Value = LastRow '+ 1 'Counter
If bProcessCCs Then
For j = 0 To UBound(arrCCData)
For k = 1 To LastCol
If oSheet.Cells(1, k).Value = arrCCData(j, 0) Then Exit For
Next k
oSheet.Cells(LastRow + 1, k).Value = arrCCData(j, 1)
Next j
End If
'Save processed file in Processed folder
.SaveAs pPath & "Processed\" & .Name
.Close
Kill FiletoKill 'Delete file from the batch folder
End With
Next i
If bWBFileOpen Then
If MsgBox("Do you want to save changes to the spreadsheet?", vbYesNo, "Save") = vbYes Then
oWB.Save
End If
Else
oWB.Close SaveChanges:=True
End If
'Clean up.
Set oSheet = Nothing
Set oWB = Nothing
If Not bAppRunning Then
oXL.Quit
End If
Set oXL = Nothing
Set oDoc = Nothing
Set oCC = Nothing
Application.ScreenUpdating = True
Exit Sub
Err_Handler1:
MsgBox oExcelDataFile & " caused a problem. " & Err.Description, vbCritical, "Error: " _
& Err.Number
If Not bAppRunning Then
oXL.Quit
End If
End Sub
Function IsFileOpen(pFileFullName As String)
Dim lngIndex As Long
lngIndex = FreeFile()
On Error GoTo Err_Handler
'Attempt to open the file and lock it
Open pFileFullName For Input Lock Read As lngIndex
Close lngIndex
IsFileOpen = False
Exit Function
Err_Handler:
Select Case Err.Number
'Error number for "Permission Denied." File is already opened.
Case 70
IsFileOpen = True
Case Else
'Raise error. File doesn't exist.
Error Err.Number
End Select
End Function



Private Function GetPathToUse() As Variant
Dim fDialog As FileDialog
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
.Title = "Select Folder containing the completed form documents to and click OK"
.AllowMultiSelect = False
.InitialView = msoFileDialogViewList
If .Show <> -1 Then
GetPathToUse = ""
Set fDialog = Nothing
Exit Function
End If
GetPathToUse = fDialog.SelectedItems.Item(1)
If Right(GetPathToUse, 1) <> "\" Then GetPathToUse = GetPathToUse + "\"
End With
End Function
Sub CreateProcessedDirectory(pPath As String)
'Requires reference to Microsoft Scripting Runtime
Dim Path As String
Dim fso As FileSystemObject
Path = pPath
Dim NewDir As String
Set fso = CreateObject("Scripting.FileSystemObject")
NewDir = Path & "Processed"
If Not fso.FolderExists(NewDir) Then
fso.CreateFolder NewDir
End If
End Sub
Function DocCCData() As String()
Dim arrX() As String
Dim lngValidator As Long
Dim rngStory As Word.Range
Dim oCount As Long
Dim oShp As Word.Shape
Dim x As Long
lngValidator = ActiveDocument.Sections(1).Headers(1).Range.StoryType
'Get count of titled CCs of valid type in document
For Each rngStory In ActiveDocument.StoryRanges
'Iterate through all linked stories
Select Case rngStory.StoryType
Case 1 To 11
Do
On Error Resume Next
For Each oCC In rngStory.ContentControls
Select Case oCC.Type
Case 7 'Do not include Group yype controls
Case Else
If StrPtr(oCC.Title) <> 0 Then
oCount = oCount + 1
End If
End Select
Next oCC
Select Case rngStory.StoryType
Case 6, 7, 8, 9, 10, 11
If rngStory.ShapeRange.Count > 0 Then
For Each oShp In rngStory.ShapeRange
If oShp.TextFrame.HasText Then
For Each oCC In oShp.TextFrame.TextRange.ContentControls
Select Case oCC.Type
Case 7 'Do not include Group yype controls
Case Else
If StrPtr(oCC.Title) <> 0 Then
oCount = oCount + 1
End If
End Select
Next oCC
End If
Next oShp
End If
Case Else
'Do Nothing
End Select
On Error GoTo 0
'Get next linked story (if any)
Set rngStory = rngStory.NextStoryRange
Loop Until rngStory Is Nothing
Case Else
End Select
Next
ReDim arrX(0, 2)
If oCount > 0 Then
x = 0
ReDim arrX(oCount - 1, 2)
For Each rngStory In ActiveDocument.StoryRanges
'Iterate through all linked stories
Select Case rngStory.StoryType
Case 1 To 11
Do
For Each oCC In rngStory.ContentControls
Select Case oCC.Type
Case 7 'Do not include Group type controls
Case Else
If StrPtr(oCC.Title) <> 0 Then
arrX(x, 0) = oCC.Title
arrX(x, 2) = oCC.ShowingPlaceholderText
Select Case oCC.Type
Case wdContentControlPicture
On Error Resume Next
arrX(x, 1) = oCC.Range.InlineShapes(1).LinkFormat.SourceFullName
If Err.Number <> 0 Then arrX(x, 1) = "Empty\Unlinked Image"
On Error GoTo 0
Case Else
If Not oCC.ShowingPlaceholderText Then
arrX(x, 1) = oCC.Range.Text
Else
arrX(x, 1) = ""
End If
End Select
x = x + 1
End If
End Select
Next oCC
Select Case rngStory.StoryType
Case 6, 7, 8, 9, 10, 11
If rngStory.ShapeRange.Count > 0 Then
For Each oShp In rngStory.ShapeRange
If oShp.TextFrame.HasText Then
For Each oCC In oShp.TextFrame.TextRange.ContentControls
Select Case oCC.Type
Case 7 'Do not include Group type controls
Case Else
If StrPtr(oCC.Title) <> 0 Then
arrX(x, 0) = oCC.Title
arrX(x, 2) = oCC.ShowingPlaceholderText
Select Case oCC.Type
Case wdContentControlPicture
On Error Resume Next
arrX(x, 1) = oCC.Range.InlineShapes(1).LinkFormat.SourceFullName
If Err.Number <> 0 Then arrX(x, 1) = "Empty\Unlinked Image"
On Error GoTo 0
Case Else
If Not oCC.ShowingPlaceholderText Then
arrX(x, 1) = oCC.Range.Text
Else
arrX(x, 1) = ""
End If
End Select
x = x + 1
End If
End Select
Next oCC
End If
Next oShp
End If
Case Else
'Do Nothing
End Select
On Error GoTo 0
'Get next linked story (if any)
Set rngStory = rngStory.NextStoryRange
Loop Until rngStory Is Nothing
Case Else
End Select
Next rngStory
Else
arrX(0, 2) = "***DOCUMENT DOES NOT CONTAIN ANY VALID CONTENT CONTROLS***"
End If
DocCCData = arrX
Set oCC = Nothing
End Function

Bob Phillips
02-13-2011, 04:16 PM
Now later on users may add addtional files to the batch folder. So when the code is run they are givd an option to clear the data in the spreadsheet. I didn't know any way to do this other than define a range consisting of the entire sheet and then detele it:

If MsgBox("Do you want to clear stored data presently in the database?", vbQuestion + vbYesNo, "CLEAR DATABASE") = vbYes Then
oSheet.Range("A1", "XFD1048576").Clear
End If


Is there a better way?

Yes, Excel has a UsedRange property which defines all cells used, so you can use this.

f MsgBox("Do you want to clear stored data presently in the database?", vbQuestion + vbYesNo, "CLEAR DATABASE") = vbYes Then
oSheet.UsedRange.Clear
End If


You can see that I have resorted to using loops and matching conditions to determing if a column already exists. Is there a better way?

I have also used loops and matching conditions to determining which column the CC data should be placed in. Is there a better way?

You can use Excel functions within your code to check this, for instance



If IsError(Application.Match("column name", Rows(1),0)) Then

MsgBox "Value not in headings"
Else

MsgBox "Value in headings"
End If



One last question that deals with arrays. To get the CC data I am using a Function that returns an array of strings. If there are no qualified CCs in the document this array would be empty. I haven't been able to figure out a way to handle this situation in the code other than assigning a unique string to one of the array elements and using it as a psuedo "Empty" condition. Any suggestions to improve that process or any of the process is welcomed!!


Surely, you control when the array is added to, so you could easily set a Boolean at this point.

If not, this function can test an arry being populated or not



Function IsArrayAllocated(Arr As Variant) As Boolean
On Error Resume Next
IsArrayAllocated = Not (IsError(LBound(Arr))) And _
IsArray(Arr) And _
(LBound(Arr) <= UBound(Arr))
End Function

gmaxey
02-13-2011, 04:26 PM
xld,

Thanks for your reply. It might take me a while to get back to the code and try these suggestions, but they look like they are on mark. Thank you again.

gmaxey
02-13-2011, 06:25 PM
xld,

The "UsedRange" works great.

As I mentioned I am trying am running this code from Word and I am trying to use late binding without adding a reference to the Excel Object Library.

When I try using:

For j = 0 To UBound(arrCCData)
LastCol = oSheet.Cells(1, oSheet.Columns.Count).End(-4159).Column 'xlToLeft
If IsError(oXL.Match(arrCCData(j, 0), Rows(1), 0)) Then
oSheet.Cells(1, LastCol + 1) = arrCCData(j, 0)
End If
' Dim bColExists As Boolean
' bColExists = False
' For k = 1 To LastCol
' If oSheet.Cells(1, k).Value = arrCCData(j, 0) Then
' bColExists = True
' Exit For
' End If
' Next k
' If Not bColExists Then oSheet.Cells(1, LastCol + 1) = arrCCData(j, 0)


The compiler throws an error on "Rows"

Sub or Function not defined. I have seen someplace before that adding "#" prior to the "IF" and "End If" is supposed to bypass those statements during complile. I have never used them but attemps now results in new errors"

arrCCData "varaible not defined"

If I replace that for testing with just a string then

Rows "variable not defined"

Is there a way to use this technique with late binding? Thanks.




Yes, Excel has a UsedRange property which defines all cells used, so you can use this.

f MsgBox("Do you want to clear stored data presently in the database?", vbQuestion + vbYesNo, "CLEAR DATABASE") = vbYes Then
oSheet.UsedRange.Clear
End If



You can use Excel functions within your code to check this, for instance



If IsError(Application.Match("column name", Rows(1),0)) Then

MsgBox "Value not in headings"
Else

MsgBox "Value in headings"
End If




Surely, you control when the array is added to, so you could easily set a Boolean at this point.

If not, this function can test an arry being populated or not



Function IsArrayAllocated(Arr As Variant) As Boolean
On Error Resume Next
IsArrayAllocated = Not (IsError(LBound(Arr))) And _
IsArray(Arr) And _
(LBound(Arr) <= UBound(Arr))
End Function

Tinbendr
02-13-2011, 06:31 PM
(Don't mean to step on Bob's toes.)

You'll have to qualify it.

Before and after.
If IsError(oXL.Match(arrCCData(j, 0), Rows(1), 0)) Then
If IsError(oXL.Match(arrCCData(j, 0), oSheets.Rows(1), 0)) Then
David

gmaxey
02-13-2011, 09:40 PM
Doh!!

That worked great. Thanks.

One more question. That process told me if the CC Title matched "any" of the Row 1 column heads. I am still using a loop to see which Row 1 column head matches the CC Title:

For j = 0 To UBound(arrCCData)
For k = 1 To LastCol
'Find the heading row column text that matches the CC title data
If oSheet.Cells(1, k).Value = arrCCData(j, 0) Then Exit For
Next k
'Put the CC text data in the that column
oSheet.Cells(LastRow + 1, k).Value = arrCCData(j, 1)
Next j


Is there an adaptation of "Match" that would return an index of which column was matched? Thanks.

gmaxey
02-13-2011, 10:15 PM
Answering my own question here. I must be learning something.

For j = 0 To UBound(arrCCData)
'Put the CC text data in the column with a heading that matches the CC title data
oSheet.Cells(LastRow + 1, oXL.Match(arrCCData(j, 0), oSheet.Rows(1), 0)).Value = arrCCData(j, 1)
Next j


Thanks again for all your help.

Shred Dude
02-13-2011, 10:24 PM
oXL.Match(arrCCData(j, 0), oSheets.Rows(1), 0)

by itself should tell you the column number within row 1 that you had a match. So you can combine that with the INDEX function to return the matching text in that column...

oXL.index(oxl.rows(1),,oXL.Match(arrCCData(j, 0), oSheets.Rows(1), 0))

Note the two commas before the Match function as you are macthing on a column within the indexed array (rows(1))

HTH

gmaxey
02-14-2011, 06:01 AM
Kieth,

Right. The first code snipet by itself does return the colun index in row 1. What I posted as the proposed answer to my own question was code to add the value of a content control titled say "Client Name" in the last row under the indexed column.

I don't really need to know what that matching text is.

Maybe I have missed your point? Thanks.

Bob Phillips
02-14-2011, 07:22 AM
Who is Keith/Kieth?

Is this what you mean, where colnum is your determined column number?



TargetRow = oSheet.Columns(colnum).End(xlDown).Row + 1

Shred Dude
02-14-2011, 09:14 AM
Greg:

I guess I was trying to answer the part about not having to loop to find the column number that the matching column heading was in. It was late.



One more question. That process told me if the CC Title matched "any" of the Row 1 column heads. I am still using a loop to see which Row 1 column head matches the CC Title:

VBA:
For j = 0 To UBound(arrCCData)
For k = 1 To LastCol
'Find the heading row column text that matches the CC title data
If oSheet.Cells(1, k).Value = arrCCData(j, 0) Then Exit For
Next k
'Put the CC text data in the that column
oSheet.Cells(LastRow + 1, k).Value = arrCCData(j, 1)
Next j
VBA tags courtesy of www.thecodenet.com


Just using the MATCH function gives you that column number. Using Index to return the value was redundant given you obviously already know what the value is.

gmaxey
02-14-2011, 02:08 PM
Well Kieth is a mispelling of my own middle name and goes to show that I don't take the time that I should to proof my posts. It is also the name I call Shred Dude since I believe it is the name his mother or father or someone in authoritiy during his early days on this earth gave to him.

What I have is working. I the value is put in the last row under the column with a matching name. I think your "TargetRow" will do the same thing. Thanks.



Who is Keith/Kieth?

Is this what you mean, where colnum is your determined column number?



TargetRow = oSheet.Columns(colnum).End(xlDown).Row + 1