PDA

View Full Version : [SOLVED] Import free text from multiple word documents to excel columns



ammarhm
07-06-2017, 02:12 AM
Hi everyone
I am trying to adopt a code i found in this form to import the contents of multiple word files into an excel sheet, here is the code:


Sub ExtractWordInfo() Dim fd As FileDialog
Set fd = Application.FileDialog(msoFileDialogFolderPicker)
Dim strFolder As String
Dim strFile As String
Dim wdApp As New Word.Application
Dim wdDoc As Word.Document
Dim CCtrl As Word.ContentControl
Dim WkSht As Worksheet, i As Long, j As Long
Set WkSht = ActiveSheet
i = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
With fd
fd.Title = "Select the folder that contains the files that you want to extract date from."
If fd.Show = -1 Then
strFolder = .SelectedItems(1) & "\"
Else
MsgBox "You did not select a folder."
Exit Sub
End If
End With
strFile = Dir$(strFolder & "*.doc*")
Do While Len(strFile) > 0
MsgBox strFile
i = i + 1
Set wdDoc = Documents.Open(strFolder & strFile, AddToRecentFiles:=False, Visible:=False)
With wdDoc
j = 0
For Each CCtrl In .ContentControls
j = j + 1
WkSht.Cells(i, j) = CCtrl.Range.Text
Next
End With
strFile = Dir
Loop
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
wdApp.Quit
End Sub





This is inserted into an excel file to import contents from multiple word documents. As you notice the macro specifically imports contentcontrols.
My question is the following:
Can i change this code to allow me to import free text from word documents and split the contents into 3 columns based on 3 search words: "introduction", "Methods", "Results".
To complicate things further, each documents has the text stored in a table with rows, with one row containing the word "introduction" followed by a row with the free text for introduction, then a row with the word "methods" followed by a row with the free text of methods and so forth
I really appreciate any help with this. I am happy to upload a samples word document if necessary
Kind regards

mana
07-06-2017, 02:31 AM
Please post a sample word file.

mana
07-06-2017, 04:26 AM
Set cel = wdDoc.Tables(1).Cell(2, 1)
Do Until cel Is Nothing
Select Case LCase(cel.Previous.Range.Words(1).Text)
Case "introduction"
WkSht.Cells(i, 1) = Replace(cel.Range.Text, vbCr & Chr(7), "")
Set cel = cel.Next
Case "methods"
WkSht.Cells(i, 2) = Replace(cel.Range.Text, vbCr & Chr(7), "")
Set cel = cel.Next
Case "results"
WkSht.Cells(i, 3) = Replace(cel.Range.Text, vbCr & Chr(7), "")
Exit Do
End Select
Set cel = cel.Next
Loop
wdDoc.Close 0

ammarhm
07-06-2017, 01:37 PM
Thank you for the kind answer
I have now used the following code successfully to import the text from the word file table


Option Explicit

Sub ImportWordTable()


Dim wdDoc As Object
Dim wdFileName As Variant
Dim tableNo As Integer 'table number in Word
Dim iRow As Long 'row index in Excel
Dim iCol As Integer 'column index in Excel
Dim resultRow As Long
Dim tableStart As Integer
Dim tableTot As Integer


On Error Resume Next


ActiveSheet.Range("A:AZ").ClearContents


wdFileName = Application.GetOpenFilename("Word files (*.doc),*.doc", , _
"Browse for file containing table to be imported")


If wdFileName = False Then Exit Sub '(user cancelled import file browser)


Set wdDoc = GetObject(wdFileName) 'open Word file


With wdDoc
tableNo = wdDoc.tables.Count
tableTot = wdDoc.tables.Count
If tableNo = 0 Then
MsgBox "This document contains no tables", _
vbExclamation, "Import Word Table"
ElseIf tableNo > 1 Then
tableNo = InputBox("This Word document contains " & tableNo & " tables." & vbCrLf & _
"Enter the table to start from", "Import Word Table", "1")
End If


resultRow = 4


For tableStart = 1 To tableTot
With .tables(tableStart)
'copy cell contents from Word table cells to Excel cells
For iRow = 1 To .Rows.Count
For iCol = 1 To .Columns.Count
Cells(iCol, resultRow) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
Next iCol
resultRow = resultRow + 1
Next iRow
End With
resultRow = resultRow + 1
Next tableStart
End With


End Sub





The only problem is that I have the word files in several subfolders, and I will need the code to cycle through the word documents in the subfolders to extract the data, could anyone help on how to do that?
Thank you

mdmackillop
07-06-2017, 02:35 PM
Sub Test()
'http://www.snb-vba.eu/VBA_Fill_combobox_listbox_en.html#L_2.1.4

pth = """C:\VBAX\*.doc*"""

arr = Filter(Split(CreateObject("wscript.shell").exec("cmd /c Dir " & pth & " /b /a-d /s").stdout.readall, vbCrLf), ".")
For i = LBound(arr) To UBound(arr)
Call ImportWordTable(arr(i))
Next i
End Sub

ammarhm
07-07-2017, 12:00 AM
Sub Test()


pth = """C:\VBAX\*.doc*"""

arr = Filter(Split(CreateObject("wscript.shell").exec("cmd /c Dir " & pth & " /b /a-d /s").stdout.readall, vbCrLf), ".")
For i = LBound(arr) To UBound(arr)
Call ImportWordTable(arr(i))
Next i
End Sub


Thank you mdmackillop
I am almost there, the code is doing what I would want it to do, but it is rather slow. Is there a way of optimizing this further? sometimes hiding the popup or the application would make things runfaster. I would appreciate any suggestions

Option Explicit

Sub ImportWordTable(wdFileName As Variant, j As Integer)


Dim wdDoc As Object
'Dim wdFileName As Variant
Dim tableNo As Integer 'table number in Word
Dim iRow As Long 'row index in Excel
Dim iCol As Integer 'column index in Excel
Dim x As Integer
Dim resultRow As Long
Dim tableStart As Integer
Dim tableTot As Integer


On Error Resume Next


' wdFileName = Application.GetOpenFilename("Word files (*.doc),*.doc", , _
"Browse for file containing table to be imported")


If wdFileName = False Then Exit Sub '(user cancelled import file browser)


Set wdDoc = GetObject(wdFileName) 'open Word file


With wdDoc
tableNo = wdDoc.tables.Count
tableTot = wdDoc.tables.Count
If tableNo = 0 Then
MsgBox "This document contains no tables", _
vbExclamation, "Import Word Table"
ElseIf tableNo > 1 Then
tableNo = InputBox("This Word document contains " & tableNo & " tables." & vbCrLf & _
"Enter the table to start from", "Import Word Table", "1")
End If


resultRow = 1


For tableStart = 1 To tableTot

With .tables(tableStart)
'copy cell contents from Word table cells to Excel cells
For iRow = 1 To .Rows.Count
For iCol = 1 To .Columns.Count
x = iCol + j
Cells(x, resultRow) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
Next iCol
resultRow = resultRow + 1
Next iRow
End With
resultRow = resultRow + 1
Next tableStart
End With


End Sub



Sub Test()

Dim pth As Variant
Dim arr As Variant
Dim i As Integer
ActiveSheet.Range("A:AZ").ClearContents
pth = """C:\Users\Home\Desktop\data\*.doc*"""

arr = Filter(Split(CreateObject("wscript.shell").exec("cmd /c Dir " & pth & " /b /a-d /s").stdout.readall, vbCrLf), ".")
For i = LBound(arr) To UBound(arr)
Call ImportWordTable(arr(i), i)
Next i
End Sub




Kind regards

mdmackillop
07-07-2017, 12:56 AM
Please post a sample Word file (as requested previously)

ammarhm
07-07-2017, 01:42 AM
Apologies, attached is a sample word document
It would of course be a great advantage if i could import all the fileds from the documents in seperate columns

Thanks again

ammarhm
07-07-2017, 01:43 AM
link

ammarhm
07-07-2017, 01:43 AM
I couldn't attach the word document so i am sending a shared link, I hope you can download it
https://1drv.ms/w/s!AuL_dYmx22kl9RD61BfWGIgUur8P

mdmackillop
07-07-2017, 05:05 AM
I've simplified the code (maybe too much!)
I'm sometimes getting an error on the documents open line. The Pause seems to "fix" it, but you can try omitting it.

Option Explicit


Dim j As Long
Dim Wrd As Object


Sub Test()
Dim pth As Variant
Dim Arr As Variant
Dim i As Integer
ActiveSheet.Range("A:AZ").ClearContents
pth = """C:\VBAX\*17.doc*"""

Cells.ClearContents

Arr = Filter(Split(CreateObject("wscript.shell").exec("cmd /c Dir " & pth & " /b /a-d /s").stdout.readall, vbCrLf), ".")

On Error Resume Next
Set Wrd = GetObject(, "Word.Application")
If Wrd Is Nothing Then Set Wrd = CreateObject("Word.Application")

Do
DoEvents
Loop Until Not Wrd Is Nothing

On Error GoTo 0
For i = LBound(Arr) To UBound(Arr)
Call ImportWordTable(Arr(i))
Next i
Wrd.Quit
End Sub


Sub ImportWordTable(wdFileName As Variant)
Dim wdDoc As Object
Dim tableNo As Integer 'table number in Word
Dim iRow As Long 'row index in Excel
Dim iCol As Integer 'column index in Excel
Dim x As Integer
Dim resultRow As Long
Dim tableStart As Integer
Dim tableTot As Integer
Dim Arr, i

MsgBox "Pause"
Set wdDoc = Wrd.Documents.Open(wdFileName) 'Sometimes error here
'Wrd.Visible = True
With wdDoc
With .tables(1)
i = .Rows.Count * .Columns.Count
x = 1
ReDim Arr(1 To i)
For iRow = 1 To .Rows.Count
For iCol = 1 To .Columns.Count
On Error Resume Next
Arr(x) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
On Error GoTo 0
x = x + 1
Next iCol
Next iRow
End With
End With
wdDoc.Close
Set wdDoc = Nothing
j = j + 1
Cells(j, 1).Resize(, i).Value = Arr
End Sub

ammarhm
07-09-2017, 09:26 AM
Thank you very much mdmackillop
Unfortunately the code is generating error messages and not extracting data,
Runtime error 5941
The requested member of the collection dose not exist

Thi si occurring on the With .tables(1) step
I am not sure which if any reference i am missing

mdmackillop
07-09-2017, 10:12 AM
Seems you need to Set Wrd = Nothing.
This works for me now.

Option Explicit

Dim j As Long
Dim Wrd As Object

Sub Test()
Dim pth As Variant
Dim Arr As Variant
Dim i As Integer
ActiveSheet.Range("A:AZ").ClearContents
pth = """C:\VBAX\*17.doc*"""

Cells.ClearContents
j = 0
Arr = Filter(Split(CreateObject("wscript.shell").exec("cmd /c Dir " & pth & " /b /a-d /s").stdout.readall, vbCrLf), ".")

On Error Resume Next
Set Wrd = GetObject(, "Word.Application")
If Wrd Is Nothing Then Set Wrd = CreateObject("Word.Application")

Do
DoEvents
Loop Until Not Wrd Is Nothing

On Error GoTo 0
For i = LBound(Arr) To UBound(Arr)
Call ImportWordTable(Arr(i))
Next i
Wrd.Quit
Set Wrd = Nothing
End Sub


Sub ImportWordTable(wdFileName As Variant)
Dim wdDoc As Object
Dim tableNo As Integer 'table number in Word
Dim iRow As Long 'row index in Excel
Dim iCol As Integer 'column index in Excel
Dim x As Integer
Dim resultRow As Long
Dim tableStart As Integer
Dim tableTot As Integer
Dim Arr, i

Set wdDoc = Wrd.Documents.Open(CStr(wdFileName)) 'Sometimes error here
With wdDoc
With .tables(1)
i = .Rows.Count * .Columns.Count
x = 1
ReDim Arr(1 To i)
For iRow = 1 To .Rows.Count
For iCol = 1 To .Columns.Count
On Error Resume Next
Arr(x) = WorksheetFunction.Clean(.cell(iRow, iCol).Range.Text)
On Error GoTo 0
x = x + 1
Next iCol
Next iRow
End With
End With
wdDoc.Close
Set wdDoc = Nothing
j = j + 1
Cells(j, 1).Resize(, i).Value = Arr
End Sub

ammarhm
07-09-2017, 10:54 AM
Thank you for the quick answer
unfortunately the same error is persisting:

Runtime error 5941
The requested member of the collection dose not exist


Any clues?
Thank you very much for your time

ammarhm
07-09-2017, 10:57 AM
Hi again
Form what iI gather this error could sometimes be caused by a missing reference
I have the following references enabled in the module:
Visual Basic for Application
Microsoft Excel 16 Object library
OLE automation
Microsoft office 16 Object library
Microsoft Word 16 object library

These were enough to run my initial code. Any other reference that you think should be added?
Kind regards

ammarhm
07-09-2017, 11:31 AM
Interestingly, upon further analysis, the wdDoc.tables.Count is returning 0 despite the fact that the document contains a table
The file path looks alright
Not sure why this is happening

ammarhm
07-09-2017, 12:13 PM
Ok, this is solved now
The issue has been some temporary/junk word files that were created int he directory and caused the code to crash
I added a check function to make sure the document contained tables and also an "on error resume next" and the code is working well now
Thanks for your help

mdmackillop
07-09-2017, 12:36 PM
Glad you got it sorted.