Excel Hints

Results 1 to 8 of 8

Thread: Extracting text from Word to Excel between "start" and "end" words

  1. #1

    Extracting text from Word to Excel between "start" and "end" words

    Hi,

    I'm a novice VBA user in need of some help. I'm trying to extract sections of text (paragraphs) bounded by a "start" word and "end" word from a 150+ page word (Office 2003) document to an excel document. The document has a cover sheet, table of contents the with the paragraphs in question in the middle of this document. The paragraphs are always bounded by the same "start" word string that's enumerated and the same "end" word.

    START_1001
    Blah
    Blah
    Blah
    END_OF_PARAGRAPH

    START_1002
    Blah
    Blah
    Blah
    END_OF_PARAGRAPH


    I've attempted a section of code that somewhat works. The extract to excel part is troublesome as I can't seem to be able to index the row count properly not to have overlapping pasting of the paragraphs.

    Also I'm having trouble keeping the loop going from paragraph to paragraph to the end of the word document before closing the excel doc. How do you know you're at the end of the document?

    When the paragraph contains pictures in the word doc, when copied and pasted in excel the pictures are pasted on top of one another and on top of the text. Can this be corrected?

    Thanks in advance for the help.

    [VBA]
    Option Explicit

    Sub CopyRequirementsBetweenWords()

    Dim appExcel As Object
    Dim objSheet As Object
    Dim aRange As Range
    Dim Endofline As Object
    Dim intRowCount As Integer
    Dim LineCount As Integer

    intRowCount = 1

    Do Until ActiveDocument.Bookmarks("\Sel").Range.End = _
    ActiveDocument.Bookmarks("\EndOfDoc").Range.End 'This is the "End of document" tag

    With Selection.Find
    .Text = "START_" 'This is the "Start" word for the search
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
    End With

    Selection.Find.Execute
    Selection.MoveLeft Unit:=wdCharacter, Count:=1

    Do
    Selection.EndKey Unit:=wdLine, Extend:=wdExtend

    'This is the "End" word for the search
    If InStr(1, Selection.Text, "END_") Then Exit Do

    Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
    LineCount = LineCount + 1
    Loop

    Selection.Copy

    If objSheet Is Nothing Then
    Set appExcel = CreateObject("Excel.Application")

    'Change the file path to match the location of your test.xls
    Set objSheet = appExcel.workbooks.Open("C:\temp\test1.xls").Sheets("Sheet1")

    intRowCount = 1
    End If


    objSheet.Cells(intRowCount, 1).Select
    objSheet.Paste
    intRowCount = intRowCount + LineCount

    Selection.MoveDown Unit:=wdLine, Count:=LineCount

    Loop 'Loop to next requirement for copy and pasting into Excel document

    If Not objSheet Is Nothing Then
    appExcel.workbooks(1).Close True
    appExcel.Quit
    Set objSheet = Nothing
    Set appExcel = Nothing
    End If
    Set aRange = Nothing

    End Sub


    [/VBA]

  2. #2
    VBAX Expert Tinbendr's Avatar
    Joined
    Jun 2005
    Location
    North Central Mississippi (The Pines)
    Posts
    991
    Location
    Welcome to VBA Express!

    Try this.

    Add[vba]Dim LastRow As Long[/vba]

    Then
    [vba]
    LastRow = objSheet.Range("A65536").End(xlUp).Row
    objSheet.Cells(LastRow + 1, 1).Select
    objSheet.Paste
    'intRowCount = intRowCount + LineCount
    'Selection.MoveDown Unit:=wdLine, Count:=LineCount
    [/vba]

    David

  3. #3
    Hi David,

    I've actually worked out some fixes to my code issues. The loop works to continue to go through to the next paragraph, and I was able to delete the superimposed pictures.

    The issue now is that if there is a table in the paragraph, the code enters the paragraph but gets stuck in an infinite loop right before entering the table to start copying - not sure how to fix this... Any thoughts?

    Thanks!
    Pat

    [VBA]
    Option Explicit

    Sub CopyRequirementsBetweenWords()

    ' This macro copies requirements from a source document using a "Start" and "End" key word that defines a range
    ' and exports this selection to an Excel file.


    Dim appExcel As Object
    Dim objSheet As Object
    Dim aRange As Range
    Dim Endofline As Object
    Dim intRowCount As Integer
    Dim LineCount As Integer
    Dim StartText As String
    Dim EndText As String
    Dim OutputFileName As String

    intRowCount = 1

    'Define search parameters
    StartText = InputBox("Please enter your Start word") 'This is the "Start" word for the search
    EndText = InputBox("Please enter your End Word") 'This is the "End" word for the search

    'Define output parameters
    OutputFileName = InputBox("Please enter your Output File path, file name and extension (Ex: C:\Temp\Test.xls)")

    Do Until ActiveDocument.Bookmarks("\Sel").Range.End = _
    ActiveDocument.Bookmarks("\EndOfDoc").Range.End 'This is the "End of document" tag

    With Selection.Find
    .Text = StartText 'This is the "Start" word for the search from the Inputbox
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchAllWordForms = False
    .MatchSoundsLike = False
    .MatchWildcards = True
    LineCount = 1
    End With

    Selection.Find.Execute
    Selection.MoveLeft Unit:=wdCharacter, Count:=1

    Do
    Selection.EndKey Unit:=wdLine, Extend:=wdExtend

    'This is the "End" word for the search from the Inputbox
    If InStr(1, Selection.Text, EndText) Then Exit Do
    'This section does not work with Word tables...cursor does not enter the table to start copying
    Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
    LineCount = LineCount + 1
    Loop

    Selection.Copy

    LineCount = LineCount + 3

    If objSheet Is Nothing Then
    Set appExcel = CreateObject("Excel.Application")

    'Change the file path to match the location of your test.xls
    Set objSheet = appExcel.workbooks.Open(OutputFileName).Sheets("Sheet1")

    intRowCount = 1
    End If

    objSheet.Cells(intRowCount, 1).Select
    objSheet.Paste
    intRowCount = intRowCount + LineCount

    Selection.MoveDown Unit:=wdLine, Count:=1

    'Delete the newly pasted pictures in Excel

    On Error Resume Next
    objSheet.DrawingObjects.Visible = True '"objSheet" enables stuff to happen in Excel
    objSheet.DrawingObjects.Delete
    On Error GoTo 0

    objSheet.Cells.ClearComments

    Loop 'Loop to next requirement for copy and pasting into Excel document

    If Not objSheet Is Nothing Then
    appExcel.workbooks(1).Close True
    appExcel.Quit
    Set objSheet = Nothing
    Set appExcel = Nothing
    End If
    Set aRange = Nothing

    End Sub
    [/VBA]

  4. #4
    Let me see if I have this correctly.

    You search for the starting word (StartText), and then, if found, move one character to the left.[vba]
    Selection.MoveLeft Unit:=wdCharacter, Count:=1
    [/vba]And then
    [vba]
    Do
    Selection.EndKey Unit:=wdLine, Extend:=wdExtend

    'This is the "End" word for the search from the Inputbox
    If InStr(1, Selection.Text, EndText) Then Exit Do
    'This section does not work with Word tables...
    'cursor does not enter the table to start copying
    Selection.MoveDown Unit:=wdLine, Extend:=wdExtend
    LineCount = LineCount + 1
    Loop
    [/vba]

    1. extend the selection to the line
    2. check to see if current Selection contains the ending word (EndText) and if not move down a line and repeat #2.

    YIKES!

    Is this correct? Is this what you are doing?

  5. #5
    Question. Say Endtext is a word in a table. What do you want to happen? The whole table is used? Part of it up to EndText?

    Using Selection like this is very very inefficient.

  6. #6
    Hi Fumei,

    I'm sure that the bulk of my code is far from efficient as I've only been using VBA for a few days. The logic behind the "start" word find was to look for it, identify it then go the beginning of the word then highlight the word and subsequent text up to and including the "end" word.

    Once I find the word and it's highlighted I move left one character, it actually moves the cursor to the beginning of the word.

    [VBA]
    Selection.MoveLeft Unit:=wdCharacter, Count:=1
    [/VBA]



    I tried extending the selection to the line, but my implementation must be wrong as it highlights the entire document...

    Normally the "end text" would not be found in the table as it's a single line, but if it was I'd like to copy the entire table for pasting into Excel.

    -Pat

  7. #7
    VBAX Expert Tinbendr's Avatar
    Joined
    Jun 2005
    Location
    North Central Mississippi (The Pines)
    Posts
    991
    Location
    Can you upload two sample files to help resolve all the issues? One with the Word doc and one for Excel. In the Excel file, show the expected output.

    Zip and upload using Manage Attachments in Additional Options.

    Just a comment on your code so far.

    Learn to use ranges instead of Selection. The macro recorder uses Selection exclusively, but it's very inefficient and messy.

    David


  8. #8

    Hi All

    I am new to this forum, and have found the answer to the above mentioned issue. Please let me know if i can post it

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •