View Full Version : loop through tables and extract from document
jon_w
06-29-2011, 05:17 AM
Hi,
 I have lots of documents that I need to look through, and I need to find out if there are any 'HOLDS' (withheld information) in them.
In each document, there are several tables, one of which is the list of holds to do with that document. However, the format of each document is not the same and therefore the holds table can be in different places depending on the document.
   
  I would like to automate the search process so the VBA code searches the tables in the document, and then once it finds ‘hold number’ in the column heading, it extracts that table to an access table for referencing. I have extremely limited VBA knowledge, as I have only worked with access SQL script writing, so comments through the script saying what is going on would be greatly appreciated.
  Is this possible?
   
  Thanks in advance
  Jon
macropod
07-01-2011, 08:12 PM
Hi jon,
 
You need to give more detail. Remember: none of us has any idea about the structure of your document or database other than what you tell us.
 
How is the code to identify whether a table has 'HOLDS' (withheld information) in them? Is there a particular row/column/cell that can be tested and, if so, what would be found there?
jon_w
07-04-2011, 02:56 AM
yeah, sorry about that. The document has many different tables throughout it. The table I am looking for has the word "Hold" in the first row and first column (top left of the table). All the others have various other bits of info and data that I don't need for this task.
I would like the function to loop through the tables until it finds the word "HOLD" in the top left cell. I've got the copy and modifaction part of the function now.
I guess all I need is an If function. eg 'If "HOLD" in cell, then' followed by my copy and modification script that I've written.
Is this a bit clearer?
macropod
07-04-2011, 06:44 PM
Hi jon,
 
Try the following. With this code, you merely select the folder to process. A new folder will be created below that, containing all the documents of interest, reduced to just the tables of interest. I'll leave it to you to get the data from there into Access.
Sub ParseDocs()
Application.ScreenUpdating = False
Dim strInFold As String, strOutFold As String, strFile As String, strOutFile As String
Dim DocSrc As Document, TOC As TableOfContents, DocTbl As Document, Tbl As Table, Rng As Range
'Call the GetFolder Function to determine the folder to process
strInFold = GetFolder
If strInFold = "" Then Exit Sub
strFile = Dir(strInFold & "\*.doc", vbNormal)
'Check for documents in the folder - exit if none found
If strFile <> "" Then strOutFold = strInFold & "\Output\"
'Test for an existing outpfolder & create one if it doesn't already exist
If Dir(strOutFold, vbDirectory) = "" Then MkDir strOutFold
strFile = Dir(strInFold & "\*.doc", vbNormal)
'Process all documents in the chosen folder
While strFile <> ""
  Set DocSrc = Documents.Open(FileName:=strInFold & "\" & strFile, AddTorecentFiles:=False, Visible:=False)
  With DocSrc
    'Delete any Tables Of Contents in the source document
    For Each TOC In .TablesOfContents
      TOC.Delete
    Next
    'Convert all fields in the source document to plain text
    On Error Resume Next
    .Fields.Unlink
    'Delete manual page breaks
    With .Content.Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Text = "^m"
      .Replacement.Text = ""
      .Execute Replace:=wdReplaceAll
    End With
    For Each Tbl In .Tables
      Set Rng = Tbl.Cell(1, 1).Range
      Rng.End = Rng.End - 1
      If UCase(Trim(Rng.Words.First)) <> "HOLD" Then Tbl.Delete
    Next
    'Check for any remaining tables in the source document
    If .Tables.Count > 0 Then
      'If there are any tables in the source document, make a copy of the document
      .Range.Copy
      ' Create a new document for the tables
      Set DocTbl = Documents.Add(Visible:=False)
      'Process the new document
      Call MakeTableDoc(DocTbl)
    End If
    'String variable for the output filenames
    strOutFile = strOutFold & Split(.Name, ".")(0)
    'Save and close the tables document
    If Not DocTbl Is Nothing Then
      DocTbl.SaveAs FileName:=strOutFile & "-Tables", AddTorecentFiles:=False
      DocTbl.Close
      Set DocTbl = Nothing
    End If
    'Close the source document without saving any changes to it
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
Set Rng = Nothing: Set DocSrc = Nothing
Application.ScreenUpdating = True
End Sub
 
Function GetFolder(Optional Title As String, Optional RootFolder As Variant) As String
On Error Resume Next
GetFolder = CreateObject("Shell.Application").BrowseForFolder(0, Title, 0, RootFolder).Items.Item.Path
End Function
 
Sub MakeTableDoc(DocTbl As Document)
Dim Sctn As Section, Para As Paragraph, Rng As Range
With DocTbl
  .Range.Paste
  'Delete any Sections with no tables in the tables document
  For Each Sctn In .Sections
    With Sctn
      If .Range.Tables.Count = 0 Then
        .Range.Delete
      Else
        On Error Resume Next
        'Delete any Section breaks with no effect on page orientation
        If .PageSetup.Orientation = Sctn.Range.Previous.PageSetup.Orientation Then
          .Range.Previous.Characters.Last.Delete
        End If
      End If
    End With
  Next
  'Check all paragraphs not in tables in the tables document
  For Each Para In .Paragraphs
    With Para
      Set Rng = .Range
      On Error Resume Next
      With Rng
        If .Information(wdWithInTable) = False Then
          If .Next.Paragraphs.First.Range.Information(wdWithInTable) = False Then
            'Delete any paragraphs not followed by a table in the tables document
            .Delete
          Else
            .End = .End - 1
            .Text = vbNullString
          End If
        End If
      End With
    End With
  Next
End With
Set Rng = Nothing
End Sub
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.