Microsoft Excel Webinar

Results 1 to 8 of 8

Thread: Extracting text from Word to Excel between "start" and "end" words

  1. #1

    Extracting text from Word to Excel between "start" and "end" words

    Hi,

    I'm a novice VBA user in need of some help. I'm trying to extract sections of text (paragraphs) bounded by a "start" word and "end" word from a 150+ page word (Office 2003) document to an excel document. The document has a cover sheet, table of contents the with the paragraphs in question in the middle of this document. The paragraphs are always bounded by the same "start" word string that's enumerated and the same "end" word.

    START_1001
    Blah
    Blah
    Blah
    END_OF_PARAGRAPH

    START_1002
    Blah
    Blah
    Blah
    END_OF_PARAGRAPH


    I've attempted a section of code that somewhat works. The extract to excel part is troublesome as I can't seem to be able to index the row count properly not to have overlapping pasting of the paragraphs.

    Also I'm having trouble keeping the loop going from paragraph to paragraph to the end of the word document before closing the excel doc. How do you know you're at the end of the document?

    When the paragraph contains pictures in the word doc, when copied and pasted in excel the pictures are pasted on top of one another and on top of the text. Can this be corrected?

    Thanks in advance for the help.

    VB:
    Option Explicit 
     
    Sub CopyRequirementsBetweenWords() 
         
        Dim appExcel As Object 
        Dim objSheet As Object 
        Dim aRange As Range 
        Dim Endofline As Object 
        Dim intRowCount As Integer 
        Dim LineCount As Integer 
         
        intRowCount = 1 
         
        Do Until ActiveDocument.Bookmarks("\Sel").Range.End = _ 
            ActiveDocument.Bookmarks("\EndOfDoc").Range.End 'This is the "End of document" tag
             
            With Selection.Find 
                .Text = "START_" 'This is the "Start" word for the search
                .Forward = True 
                .Wrap = wdFindContinue 
                .Format = False 
                .MatchCase = False 
                .MatchWholeWord = False 
                .MatchAllWordForms = False 
                .MatchSoundsLike = False 
                .MatchWildcards = True 
            End With 
             
            Selection.Find.Execute 
            Selection.MoveLeft Unit:=wdCharacter, Count:=1 
             
            Do 
                Selection.EndKey Unit:=wdLine, Extend:=wdExtend 
                 
                 'This is the "End" word for the search
                If InStr(1, Selection.Text, "END_") Then Exit Do 
                 
                Selection.MoveDown Unit:=wdLine, Extend:=wdExtend 
                LineCount = LineCount + 1 
            Loop 
             
            Selection.Copy 
             
            If objSheet Is Nothing Then 
                Set appExcel = CreateObject("Excel.Application") 
                 
                 'Change the file path to match the location of your test.xls
                Set objSheet = appExcel.workbooks.Open("C:\temp\test1.xls").Sheets("Sheet1") 
                 
                intRowCount = 1 
            End If 
             
             
            objSheet.Cells(intRowCount, 1).Select 
            objSheet.Paste 
            intRowCount = intRowCount + LineCount 
             
            Selection.MoveDown Unit:=wdLine, Count:=LineCount 
             
        Loop 'Loop to next requirement for copy and pasting into Excel document
         
        If Not objSheet Is Nothing Then 
            appExcel.workbooks(1).Close True 
            appExcel.Quit 
            Set objSheet = Nothing 
            Set appExcel = Nothing 
        End If 
        Set aRange = Nothing 
         
    End Sub 
    
    
    Formatting tags added by mark007

  2. #2
    VBAX Expert Tinbendr's Avatar
    Joined
    Jun 2005
    Location
    North Central Mississippi (The Pines)
    Posts
    983
    Location
    Welcome to VBA Express!

    Try this.

    Add
    VB:
    Dim LastRow As Long 
    
    
    Formatting tags added by mark007
    Then
    VB:
    LastRow = objSheet.Range("A65536").End(xlUp).Row 
    objSheet.Cells(LastRow + 1, 1).Select 
    objSheet.Paste 
     'intRowCount = intRowCount + LineCount
     'Selection.MoveDown Unit:=wdLine, Count:=LineCount
    
    
    Formatting tags added by mark007
    David

  3. #3
    Hi David,

    I've actually worked out some fixes to my code issues. The loop works to continue to go through to the next paragraph, and I was able to delete the superimposed pictures.

    The issue now is that if there is a table in the paragraph, the code enters the paragraph but gets stuck in an infinite loop right before entering the table to start copying - not sure how to fix this... Any thoughts?

    Thanks!
    Pat

    VB:
    Option Explicit 
     
    Sub CopyRequirementsBetweenWords() 
         
         ' This macro copies requirements from a source document using a "Start" and "End" key word that defines a range
         ' and exports this selection to an Excel file.
         
         
        Dim appExcel As Object 
        Dim objSheet As Object 
        Dim aRange As Range 
        Dim Endofline As Object 
        Dim intRowCount As Integer 
        Dim LineCount As Integer 
        Dim StartText As String 
        Dim EndText As String 
        Dim OutputFileName As String 
         
        intRowCount = 1 
         
         'Define search parameters
        StartText = InputBox("Please enter your Start word") 'This is the "Start" word for the search
        EndText = InputBox("Please enter your End Word") 'This is the "End" word for the search
         
         'Define output parameters
        OutputFileName = InputBox("Please enter your Output File path, file name and extension (Ex: C:\Temp\Test.xls)") 
         
        Do Until ActiveDocument.Bookmarks("\Sel").Range.End = _ 
            ActiveDocument.Bookmarks("\EndOfDoc").Range.End 'This is the "End of document" tag
             
            With Selection.Find 
                .Text = StartText 'This is the "Start" word for the search from the Inputbox
                .Forward = True 
                .Wrap = wdFindContinue 
                .Format = False 
                .MatchCase = False 
                .MatchWholeWord = False 
                .MatchAllWordForms = False 
                .MatchSoundsLike = False 
                .MatchWildcards = True 
                LineCount = 1 
            End With 
             
            Selection.Find.Execute 
            Selection.MoveLeft Unit:=wdCharacter, Count:=1 
             
            Do 
                Selection.EndKey Unit:=wdLine, Extend:=wdExtend 
                 
                 'This is the "End" word for the search from the Inputbox
                If InStr(1, Selection.Text, EndText) Then Exit Do 
                 'This section does not work with Word tables...cursor does not enter the table to start copying
                Selection.MoveDown Unit:=wdLine, Extend:=wdExtend 
                LineCount = LineCount + 1 
            Loop 
             
            Selection.Copy 
             
            LineCount = LineCount + 3 
             
            If objSheet Is Nothing Then 
                Set appExcel = CreateObject("Excel.Application") 
                 
                 'Change the file path to match the location of your test.xls
                Set objSheet = appExcel.workbooks.Open(OutputFileName).Sheets("Sheet1") 
                 
                intRowCount = 1 
            End If 
             
            objSheet.Cells(intRowCount, 1).Select 
            objSheet.Paste 
            intRowCount = intRowCount + LineCount 
             
            Selection.MoveDown Unit:=wdLine, Count:=1 
             
             'Delete the newly pasted pictures in Excel
             
            On Error Resume Next 
            objSheet.DrawingObjects.Visible = True '"objSheet" enables stuff to happen in Excel
            objSheet.DrawingObjects.Delete 
            On Error Goto 0 
             
            objSheet.Cells.ClearComments 
             
        Loop 'Loop to next requirement for copy and pasting into Excel document
         
        If Not objSheet Is Nothing Then 
            appExcel.workbooks(1).Close True 
            appExcel.Quit 
            Set objSheet = Nothing 
            Set appExcel = Nothing 
        End If 
        Set aRange = Nothing 
         
    End Sub 
    
    
    Formatting tags added by mark007

  4. #4
    Let me see if I have this correctly.

    You search for the starting word (StartText), and then, if found, move one character to the left.
    VB:
    Selection.MoveLeft Unit:=wdCharacter, Count:=1 
    
    
    Formatting tags added by mark007
    And then
    VB:
    Do 
        Selection.EndKey Unit:=wdLine, Extend:=wdExtend 
         
         'This is the "End" word for the search from the Inputbox
        If InStr(1, Selection.Text, EndText) Then Exit Do 
         'This section does not work with Word tables...
         'cursor does not enter the table to start copying
        Selection.MoveDown Unit:=wdLine, Extend:=wdExtend 
        LineCount = LineCount + 1 
    Loop 
    
    
    Formatting tags added by mark007
    1. extend the selection to the line
    2. check to see if current Selection contains the ending word (EndText) and if not move down a line and repeat #2.

    YIKES!

    Is this correct? Is this what you are doing?

  5. #5
    Question. Say Endtext is a word in a table. What do you want to happen? The whole table is used? Part of it up to EndText?

    Using Selection like this is very very inefficient.

  6. #6
    Hi Fumei,

    I'm sure that the bulk of my code is far from efficient as I've only been using VBA for a few days. The logic behind the "start" word find was to look for it, identify it then go the beginning of the word then highlight the word and subsequent text up to and including the "end" word.

    Once I find the word and it's highlighted I move left one character, it actually moves the cursor to the beginning of the word.

    VB:
    Selection.MoveLeft Unit:=wdCharacter, Count:=1 
    
    
    Formatting tags added by mark007


    I tried extending the selection to the line, but my implementation must be wrong as it highlights the entire document...

    Normally the "end text" would not be found in the table as it's a single line, but if it was I'd like to copy the entire table for pasting into Excel.

    -Pat

  7. #7
    VBAX Expert Tinbendr's Avatar
    Joined
    Jun 2005
    Location
    North Central Mississippi (The Pines)
    Posts
    983
    Location
    Can you upload two sample files to help resolve all the issues? One with the Word doc and one for Excel. In the Excel file, show the expected output.

    Zip and upload using Manage Attachments in Additional Options.

    Just a comment on your code so far.

    Learn to use ranges instead of Selection. The macro recorder uses Selection exclusively, but it's very inefficient and messy.
    +------+
    | David |
    +------+

  8. #8

    Hi All

    I am new to this forum, and have found the answer to the above mentioned issue. Please let me know if i can post it

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •