Log in

View Full Version : [SLEEPER:] Extracting text from Word to Excel between "start" and "end" words



vt_pat
09-28-2010, 06:55 AM
Hi,

I'm a novice VBA user in need of some help. I'm trying to extract sections of text (paragraphs) bounded by a "start" word and "end" word from a 150+ page word (Office 2003) document to an excel document. The document has a cover sheet, table of contents the with the paragraphs in question in the middle of this document. The paragraphs are always bounded by the same "start" word string that's enumerated and the same "end" word.

START_1001
Blah
Blah
Blah
END_OF_PARAGRAPH

START_1002
Blah
Blah
Blah
END_OF_PARAGRAPH


I've attempted a section of code that somewhat works. The extract to excel part is troublesome as I can't seem to be able to index the row count properly not to have overlapping pasting of the paragraphs.

Also I'm having trouble keeping the loop going from paragraph to paragraph to the end of the word document before closing the excel doc. How do you know you're at the end of the document?

When the paragraph contains pictures in the word doc, when copied and pasted in excel the pictures are pasted on top of one another and on top of the text. Can this be corrected?

Thanks in advance for the help.



Option Explicit

Sub CopyRequirementsBetweenWords()
Dim appExcel As Object
Dim objSheet As Object
Dim aRange As Range
Dim Endofline As Object
Dim intRowCount As Integer
Dim LineCount As Integer
intRowCount = 1
Do Until ActiveDocument.Bookmarks("\Sel").Range.End = ActiveDocument.Bookmarks("\EndOfDoc").Range.End ' This is the "End of document" tag
With Selection.Find
.Text = "START_" 'This is the "Start" word for the search
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute
Selection.MoveLeft Unit:=wdCharacter, Count:=1
Do
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
' This is the "End" word for the search
If InStr(1, Selection.Text, "END_") Then Exit Do
Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
LineCount = LineCount + 1
Loop
Selection.Copy
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
' Change the file path to match the location of your test.xls
Set objSheet = appExcel.workbooks.Open("C:\temp\test1.xls").Sheets("Sheet1")
intRowCount = 1
End If
objSheet.Cells(intRowCount, 1).Select
objSheet.Paste
intRowCount = intRowCount + LineCount
Selection.MoveDown Unit:=wdLine, Count:=LineCount
Loop ' Loop to next requirement for copy and pasting into Excel document
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
Set aRange = Nothing
End Sub

Tinbendr
09-28-2010, 10:57 AM
Welcome to VBA Express!

Try this.

Add
Dim LastRow As Long

Then


LastRow = objSheet.Range("A65536").End(xlUp).Row
objSheet.Cells(LastRow + 1, 1).Select
objSheet.Paste
'intRowCount = intRowCount + LineCount
'Selection.MoveDown Unit:=wdLine, Count:=LineCount

David

vt_pat
09-28-2010, 01:51 PM
Hi David,

I've actually worked out some fixes to my code issues. The loop works to continue to go through to the next paragraph, and I was able to delete the superimposed pictures.

The issue now is that if there is a table in the paragraph, the code enters the paragraph but gets stuck in an infinite loop right before entering the table to start copying - not sure how to fix this... Any thoughts?

Thanks!
Pat



Option Explicit

Sub CopyRequirementsBetweenWords()
' This macro copies requirements from a source document using a "Start" and "End" key word that defines a range
' and exports this selection to an Excel file.
Dim appExcel As Object
Dim objSheet As Object
Dim aRange As Range
Dim Endofline As Object
Dim intRowCount As Integer
Dim LineCount As Integer
Dim StartText As String
Dim EndText As String
Dim OutputFileName As String
intRowCount = 1
' Define search parameters
StartText = InputBox("Please enter your Start word") 'This is the "Start" word for the search
EndText = InputBox("Please enter your End Word") 'This is the "End" word for the search
' Define output parameters
OutputFileName = InputBox("Please enter your Output File path, file name and extension (Ex: C:\Temp\Test.xls)")
Do Until ActiveDocument.Bookmarks("\Sel").Range.End = ActiveDocument.Bookmarks("\EndOfDoc").Range.End ' This is the "End of document" tag
With Selection.Find
.Text = StartText 'This is the "Start" word for the search from the Inputbox
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
LineCount = 1
End With
Selection.Find.Execute
Selection.MoveLeft Unit:=wdCharacter, Count:=1
Do
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
' This is the "End" word for the search from the Inputbox
If InStr(1, Selection.Text, EndText) Then Exit Do
' This section does not work with Word tables...cursor does not enter the table to start copying
Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
LineCount = LineCount + 1
Loop
Selection.Copy
LineCount = LineCount + 3
If objSheet Is Nothing Then
Set appExcel = CreateObject("Excel.Application")
' Change the file path to match the location of your test.xls
Set objSheet = appExcel.workbooks.Open(OutputFileName).Sheets("Sheet1")
intRowCount = 1
End If
objSheet.Cells(intRowCount, 1).Select
objSheet.Paste
intRowCount = intRowCount + LineCount
Selection.MoveDown Unit:=wdLine, Count:=1
' Delete the newly pasted pictures in Excel
On Error Resume Next
objSheet.DrawingObjects.Visible = True ' "objSheet" enables stuff to happen in Excel
objSheet.DrawingObjects.Delete
On Error GoTo 0
objSheet.Cells.ClearComments
Loop ' Loop to next requirement for copy and pasting into Excel document
If Not objSheet Is Nothing Then
appExcel.workbooks(1).Close True
appExcel.Quit
Set objSheet = Nothing
Set appExcel = Nothing
End If
Set aRange = Nothing
End Sub

fumei
09-28-2010, 02:11 PM
Let me see if I have this correctly.

You search for the starting word (StartText), and then, if found, move one character to the left.


Selection.MoveLeft Unit:=wdCharacter, Count:=1

And then


Do
Selection.EndKey Unit:=wdLine, Extend:=wdExtend
' This is the "End" word for the search from the Inputbox
If InStr(1, Selection.Text, EndText) Then Exit Do
' This section does not work with Word tables...
' cursor does not enter the table to start copying
Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
LineCount = LineCount + 1
Loop


1. extend the selection to the line
2. check to see if current Selection contains the ending word (EndText) and if not move down a line and repeat #2.

YIKES!

Is this correct? Is this what you are doing?

fumei
09-28-2010, 02:14 PM
Question. Say Endtext is a word in a table. What do you want to happen? The whole table is used? Part of it up to EndText?

Using Selection like this is very very inefficient.

vt_pat
09-29-2010, 06:39 AM
Hi Fumei,

I'm sure that the bulk of my code is far from efficient as I've only been using VBA for a few days. The logic behind the "start" word find was to look for it, identify it then go the beginning of the word then highlight the word and subsequent text up to and including the "end" word.

Once I find the word and it's highlighted I move left one character, it actually moves the cursor to the beginning of the word.


Selection.MoveLeft Unit:=wdCharacter, Count:=1

I tried extending the selection to the line, but my implementation must be wrong as it highlights the entire document...

Normally the "end text" would not be found in the table as it's a single line, but if it was I'd like to copy the entire table for pasting into Excel.

-Pat

Tinbendr
09-30-2010, 05:15 AM
Can you upload two sample files to help resolve all the issues? One with the Word doc and one for Excel. In the Excel file, show the expected output.

Zip and upload using Manage Attachments in Additional Options.

Just a comment on your code so far.

Learn to use ranges instead of Selection. The macro recorder uses Selection exclusively, but it's very inefficient and messy.

angadi
11-01-2011, 06:40 PM
I am new to this forum, and have found the answer to the above mentioned issue. Please let me know if i can post it

John Doe
01-11-2018, 12:08 PM
Although this thread is quite old, your approach would be very much appreciated!

Dave
01-11-2018, 03:00 PM
Hi John Doe and Welcome to this forum! If U check the Knowledge Base of this form for "Extract and Transfer Paragraphs" might be helpful. I was the contributor. Good luck. Dave