PDA

View Full Version : Export particular text from a Word doc to an Excel spreadsheet



PapaSquat
12-11-2014, 06:53 AM
I have a large document with headings, paragraphs, tables, figures, etc... and I need to extract all "Rules" and put them in a spreadsheet.

For example, the "Rules" are formatted like this:

Rule 5.3.2.3-7: Text for this particular rule goes here.


Rule 6.1-2: More text for a different rule.

Rule 6.1.3-1: Chicken chicken chicken chicken.



And remember there will likely be tables, figures, and other text between these rules. I'm trying to figure out the best way to get these rules into excel in the following format:
Column 1:
Keyword
Rule
Rule
Rule

Column 2:
Number
1.2-7
1.2.5-2
3.5.8.2-1

Column 3:
Text
Text for this particular rule goes here.
More text for a different rule.
Chicken chicken chicken chicken.



I've worked with VBA some, so I know how to search and find the text segments above (e.g., "Rule", "1.2-7", and "<rule text>") but I can't find or figure out how to put those in a file that I can open in excel. I didn't see anything in the forum that addressed this particular issue but if the solution for this is already covered in another thread I'll be more than happy to read through it. All help is much appreciated! Thanks!

macropod
12-11-2014, 03:17 PM
You could use a macro like the following:

Sub ExportRules()
Application.ScreenUpdating = True
Dim i As Long, StrWkBkNm As String, StrWkSht As String, bFound As Boolean
Dim xlApp As Object, xlWkBk As Object, xlWkSht As Object, bStrt As Boolean
StrWkBkNm = "C:\Users\" & Environ("Username") & "\Documents\Rules.xls"
StrWkSht = "Sheet1": bStrt = False: bFound = False
If Dir(StrWkBkNm) = "" Then
MsgBox "Cannot find the designated workbook: " & StrWkBkNm, vbExclamation
Exit Sub
End If
' Test whether Excel is already running.
On Error Resume Next
Set xlApp = GetObject(, "Excel.Application")
'Start Excel if it isn't running
If xlApp Is Nothing Then
Set xlApp = CreateObject("Excel.Application")
If xlApp Is Nothing Then
MsgBox "Can't start Excel.", vbExclamation
GoTo ErrExit
End If
bStrt = True
End If
On Error GoTo 0
'Check if the workbook is open.
With xlApp
For Each xlWkBk In .Workbooks
If xlWkBk.FullName = StrWkBkNm Then ' It's open
Set xlWkBk = xlWkBk
bFound = True
Exit For
End If
Next
' If not open by the current user.
If bFound = False Then
' Check if another user has it open.
If IsFileLocked(StrWkBkNm) = True Then
' Report and exit if true
MsgBox "The Excel workbook is in use." & vbCr & "Please try again later.", vbExclamation, "File in use"
If bStrt = True Then .Quit
GoTo ErrExit
End If
' The file is available, so open it.
Set xlWkBk = .Workbooks.Open(FileName:=StrWkBkNm)
If xlWkBk Is Nothing Then
MsgBox "Cannot open:" & vbCr & StrWkBkNm, vbExclamation
If bStrt = True Then .Quit
GoTo ErrExit
End If
End If
Set xlWkSht = xlWkBk.Worksheets(StrWkSht)
' Find the last-used row in column A.
' Add 1 to get the next row for data-entry.
i = xlWkSht.Cells(.Rows.Count, 1).End(-4162).Row ' -4162 = xlUp
' Output the rules.
With ActiveDocument
.ConvertNumbersToText
With .Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "Rule[ ]@[0-9.-]@:*^13"
.Execute
End With
'Ask the user whether to change the found text
Do While .Find.Found
i = i + 1
xlWkSht.Cells(i, 1).Value = "Rule"
xlWkSht.Cells(i, 2).Value = Trim(Split(Split(.Text, ":")(0), "Rule")(1))
xlWkSht.Cells(i, 3).Value = Trim(Right(.Text, Len(.Text) - InStr(.Text, ":")))
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
End With
End With
ErrExit:
' Release Excel object memory
Set xlWkSht = Nothing: Set xlWkBk = Nothing: Set xlApp = Nothing
Application.ScreenUpdating = True
End Sub

Function IsFileLocked(strFileName As String) As Boolean
On Error Resume Next
Open strFileName For Binary Access Read Write Lock Read Write As #1
Close #1
IsFileLocked = Err.Number
Err.Clear
End Function
Note that the macro assumes an Excel workbook named 'Rules.xls' exists in your Documents folder and that the output goes to Sheet1. You can change the details in the code.

PapaSquat
12-12-2014, 01:56 PM
Thanks macropod! Your macro worked beautifully! That was far more help than I was expecting and I sure do appreciate it! I spent a good portion of the day going through this line-by-line, making sure I understood each step and looking up a good number of definitions, classes, methods, etc... along the way. I learned a great deal and again thank you for the help!

gmaxey
12-13-2014, 09:08 AM
Hi Paul,

I had some problems with your code as written. I had added a filepicker dialog to select "xls, xlsx or xlsm" type files. When executed the code ran without error, but then when I went to look at the result in the Excel file, Excel reported the file in use by Me. I when to task manager and discovered that an instance of Excel was still running. When I ended the process, I could then open the file, but it was empty. I've added some code toward the end that saves the workbook, closes it if not previously opened, and quits the app if not previously running. This solved the issue I had.

Really don't understand how it worked for you or PapaSquat but it apparently did.


Sub ExportRules()
Dim lngIndex As Long, strExcelFile As String, strSheet As String, bFound As Boolean
Dim xlApp As Object, xlBook As Object, xlSheet As Object, bStart As Boolean
'Mod by GKM
strExcelFile = FileDialogFile
Select Case fcnGetFileExtension(strExcelFile)
Case "xls", "xlsx", "xlsm"
Case Else
MsgBox "You either did not select a file or the file you selected is not a Excel data file", vbExclamation + vbOKOnly, "Invalid File"
End Select
'End mod
Application.ScreenUpdating = True
strSheet = "Sheet1": bStart = False: bFound = False

'Test whether Excel is already running.
On Error Resume Next
Set xlApp = GetObject(, "Excel.Application")
'Start Excel if it isn't running
If xlApp Is Nothing Then
Set xlApp = CreateObject("Excel.Application")
If xlApp Is Nothing Then
MsgBox "Can't start Excel.", vbExclamation
GoTo ErrExit
End If
bStart = True
End If
On Error GoTo 0
'Check if the workbook is open.
With xlApp
For Each xlBook In .Workbooks
If xlBook.FullName = strExcelFile Then ' It's open
Set xlBook = xlBook
bFound = True
Exit For
End If
Next
' If not open by the current user.
If bFound = False Then
' Check if another user has it open.
If IsFileLocked(strExcelFile) = True Then
' Report and exit if true
MsgBox "The Excel workbook is in use." & vbCr & "Please try again later.", vbExclamation, "File in use"
If bStart = True Then .Quit
GoTo ErrExit
End If
' The file is available, so open it.
Set xlBook = .Workbooks.Open(FileName:=strExcelFile)
If xlBook Is Nothing Then
MsgBox "Cannot open:" & vbCr & strExcelFile, vbExclamation
If bStart = True Then .Quit
GoTo ErrExit
End If
End If
Set xlSheet = xlBook.Worksheets(strSheet)
On Error GoTo 0
' Find the last-used row in column A.
' Add 1 to get the next row for data-entry.
lngIndex = xlSheet.Cells(.Rows.Count, 1).End(-4162).Row ' -4162 = xlUp
' Output the rules.
With ActiveDocument
.ConvertNumbersToText
With .Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = False
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.Text = "Rule[ ]@[0-9.-]@:*^13"
.Execute
End With
'Ask the user whether to change the found text
Do While .Find.Found
lngIndex = lngIndex + 1
xlSheet.Cells(lngIndex, 1).Value = "Rule"
xlSheet.Cells(lngIndex, 2).Value = Trim(Split(Split(.Text, ":")(0), "Rule")(1))
xlSheet.Cells(lngIndex, 3).Value = Trim(Right(.Text, Len(.Text) - InStr(.Text, ":")))
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
End With
'Mod by GKM
xlBook.Save
If bStart Then
xlBook.Close
xlApp.Quit
End If
'End mods
End With

ErrExit:
' Release Excel object memory
Set xlSheet = Nothing: Set xlBook = Nothing: Set xlApp = Nothing
Application.ScreenUpdating = True
End Sub
Function IsFileLocked(strFileName As String) As Boolean
On Error Resume Next
Open strFileName For Binary Access Read Write Lock Read Write As #1
Close #1
IsFileLocked = Err.Number
Err.Clear
End Function
'Note that the macro assumes an Excel workbook named 'Rules.xls' exists in your Documents folder and that the output goes to Sheet1. You can change the details in the code.
Function FileDialogFile(Optional strCaption As String = "") As String
With Application.FileDialog(msoFileDialogFilePicker)
.Title = strCaption
If .Show Then
FileDialogFile = .SelectedItems(1)
Else
FileDialogFile = ""
End If
End With
End Function
Function fcnGetFileExtension(ByRef strFileName As String) As String
'Gets the extention assigned to a file including the delimiter "."
On Error GoTo Err_NoExtension
fcnGetFileExtension = VBA.Right(strFileName, Len(strFileName) - (InStrRev(strFileName, ".", -1, vbTextCompare)))
If fcnGetFileExtension = strFileName Then
'There is no extension
fcnGetFileExtension = ""
End If
lbl_Exit:
Exit Function
Err_NoExtension:
Resume lbl_Exit
End Function

PapaSquat
12-15-2014, 06:40 AM
Hi gmaxey,

I see what you mean. The first time I ran the macro I had already created a blank excel workbook with the right name and in the right location (and by "right" I mean "as defined in the macro") and I had the spreadsheet open next to my word document. This way I could see the cells being populated by the macro as it ran. That said, I just ran it again WITHOUT opening the spreadsheet first and ran into the exact problem you described. Therefore I'm about to incorporate your additions into my code as well. Thanks so much for the reply!

gmaxey
12-15-2014, 06:55 AM
PapaSquat, You're welcome.

macropod
12-15-2014, 06:19 PM
With the code I posted, I believe the issues Greg had can be fixed by adding:
xlWkBk.Save
xlApp.Visible = True
before:
ErrExit:

PapaSquat
12-16-2014, 06:16 AM
macropod and gmaxey, I have another question for you.

How can I move the text to Excel while also keeping the same text formatting from Word.

The word document I'm pulling the text from is actually a draft version. In my case this means all text that's bold and with strikethroughs represent words to be deleted, while bold with underlines represents words to be added. Text that's only bolded (no strikethrough or underline) needs to remain bolded.

For example, a good portion of the rules look like this.

Rule 3.4.2-7: The product shall be green blue.


Using the above macro, it imports into excel like this.



Rule
3.4.2-7
The product shall be green blue.





I'm looking for a final result like this.



Rule
3.4.2-7
The product shall be green blue.





Instead of using 'Selection.Text', I tried using 'Selection.FormattedText', 'Selection.WordOpenXML', and 'Selection.XML'. 'Selection.FormattedText' provided the same result as 'Selection.Text' while the other two commands returned messy XML code (which makes sense but obviously not helpful for me).

I searched the forum before posing this question, so if this is already covered in another thread and I just missed it, please provide a link and I'll be happy to check it out. Thanks!!

gmaxey
12-16-2014, 07:02 AM
PapaSquat,

Paul is much more familiar with Excel than me and will likely provide a better process. However, I think you will need to copy and paste:


Do While .Find.Found
lngIndex = lngIndex + 1
xlSheet.Cells(lngIndex, 1).Value = "Rule"
xlSheet.Cells(lngIndex, 2).Value = Trim(Split(Split(.Text, ":")(0), "Rule")(1))
'Start Mods GKM
.MoveEndUntil ":", 1
.Copy
xlSheet.Cells(lngIndex, 3).Select
xlSheet.PasteSpecial Format:="HTML", Link:=False, DisplayAsIcon:=False
'xlSheet.Cells(lngIndex, 3).Value = Trim(Right(.Text, Len(.Text) - InStr(.Text, ":")))
'End mods
.Collapse wdCollapseEnd
.Find.Execute
Loop