Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 40

Thread: Macro to find string and copy sentence containing string.

  1. #1
    VBAX Regular
    Joined
    Nov 2004
    Posts
    27
    Location

    Macro to find string and copy sentence containing string.

    Hello,

    I need help with a Word macro.
    I would like to search a document for a string. If the string is found I want to copy the sentence containing the string to an Excel document.


    I have a little experience with Excel and Access VBA but none with the objects and methods used with Word. I would be grateful if someone could at least get me started.
    I am using Word 2000.

    Thanks.

  2. #2
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Hi Ron!

    I read your question 2 or 3 days ago, and I thought "wow, that sounds simple, I'll go back and answer that when I have time, but I bet someone else will surely answer any minute now before I get a chance."

    well...

    maybe it wasn't so simple, and maybe that's why no one has jumped on it yet. I began fooling around with this this morning, and between going out to lunch and running a few errands, this problem has taken me just about all day!! I can't believe it.

    It sounds like you actually want to learn about Word macros, Ron, rather than just copy and paste them. So I would suggest that if you wish to understand the macro I have written, please read the entire thread regarding Johnske's question, Cleaning up a Word document.

    The key here is the use of "wildcards." Before I wrote this macro for you, I studied the site that mdmackillop suggested in the thread about Johnske's question. (the suggested site is: Finding and Replacing Characters Using Wildcards, on the Word MVPs site)

    So...

    what this macro does is loop through all sentences in a document. It will identify a "sentence" as anything that begins either at the beginning of a paragraph or at the end of a prior sentence (meaning that it follows a . or a ? or a !) and that subsequently ends at the very next . or ? or ! without any intervening hard returns.

    As the macro identifies each sentence, it then does a "mini search" within the sentence to find your desired word. If the desired word is found, then the sentence is saved in memory.

    Once all sentences have been searched, then we can tell the macro (in the future) to do whatever we want with the "good" sentences.

    Right NOW, in this current macro, a new document is created and a TABLE is inserted with each sentence in a single cell of the TABLE. So, you may then manually copy the table and paste it into an Excel spreadsheet. That part can also later be automated within the macro.

    Also, I found out that my macro fails (in a very aggravating way!!!) if the document contains fields or hyperlinks. It may also fail if the document has other "strange stuff" in it, but I have definitely confirmed that it does not like fields or hyperlinks.

    BUT!! DON'T WORRY IF YOUR DOCUMENT HAS THAT STUFF. I incorporated a "workaround" into my macro. Before searching, the macro actually copies the entire document and does a "text only" paste operation into a HIDDEN document (that can't be seen on screen but that is running in the backgroun). So the macro will safely be fed ONLY TEXT from the hidden doc.

    Here it is:

    [vba]Sub CopyCertainSentences()

    Dim myOriginalDoc As Document
    Dim myHiddenDoc As Document
    Dim OldValue As Long
    Dim ArrayOfSentences() As String
    Dim TargetWord As String

    TargetWord = Trim(InputBox("Enter the word that will be used to select sentences:"))
    If TargetWord = "" Then MsgBox "No word entered." & vbCr & vbCr & "EXITING MACRO": End

    ReDim ArrayOfSentences(0)
    Set myOriginalDoc = ActiveDocument
    Set myHiddenDoc = Documents.Add(, , wdNewBlankDocument, False)

    On Error GoTo Ending 'if an error interferes, I want to at least close HiddenDoc

    '*********************************
    '*******Transfer the text*********
    myOriginalDoc.Select
    Selection.Copy

    myHiddenDoc.Select
    Selection.PasteSpecial Link:=False, DataType:=wdPasteText, Placement:=wdInLine, DisplayAsIcon:=False
    '*********************************
    '*********************************

    myHiddenDoc.Bookmarks("\StartOfDoc").Select

    '****Set search parameters***************
    Selection.Find.ClearFormatting
    With Selection.Find
    .Text = "": .Replacement.Text = "": .Forward = True: .Wrap = wdFindStop
    .Format = False: .MatchCase = False: .MatchWholeWord = False: .MatchWildcards = True
    .MatchSoundsLike = False: .MatchAllWordForms = False
    End With
    '****************************************

    Do

    Selection.Find.Text = "[^13.\?\!]{1}[!.\?\!^13]@[.\!\?]{1}"
    Selection.Find.Execute

    OldValue = Selection.Start

    Selection.Find.Text = "<*>*[.\!\?]{1}"
    Selection.Find.Execute

    If InStr(1, Selection.Text, TargetWord, vbTextCompare) > 0 Then
    ArrayOfSentences(UBound(ArrayOfSentences)) = Selection.Text
    ReDim Preserve ArrayOfSentences(UBound(ArrayOfSentences) + 1)
    End If

    Selection.Collapse wdCollapseEnd
    Selection.MoveLeft wdCharacter, 1, False

    Loop While Selection.Start > OldValue

    Ending:
    myHiddenDoc.Close False
    DoEvents
    If Err.Number > 0 Then On Error GoTo 0: Resume 'go back to any error after closing HiddenDoc

    ExcelBusiness ArrayOfSentences

    End Sub


    Function ExcelBusiness(ArrayOfSentences() As String)

    Dim sen As Long 'a counter to loop through the sentences
    Dim myNewDoc As Document

    Set myNewDoc = Documents.Add

    For sen = 0 To (UBound(ArrayOfSentences) - 1)

    myNewDoc.Bookmarks("\EndOfDoc").Select
    Selection.TypeText ArrayOfSentences(sen) & vbCr

    Next

    myNewDoc.Select
    Selection.ConvertToTable wdSeparateByParagraphs, , 1, , wdTableFormatNone

    End Function
    [/vba]


    I'm also attaching the test document that I have been using. So far so good with my test document. the macro seems to be doing exactly what I want it to.

  3. #3
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    I was continuing to test just now, and I realized the following:

    If you search for "we" for example, then sentences that DON'T have "we" but that do have things like "answer" (see the we in ansWEr?) will be chosen.

    I will try and fix this tomorrow.

    so for now, beware... if you search for "small" words, you may get incorrect results that contain your small word WITHIN a larger word

  4. #4
    Administrator
    Chat VP
    VBAX Guru johnske's Avatar
    Joined
    Jul 2004
    Location
    Townsville, Australia
    Posts
    2,872
    Location
    Quote Originally Posted by Kelly
    I was continuing to test just now, and I realized the following:

    If you search for "we" for example, then sentences that DON'T have "we" but that do have things like "answer" (see the we in ansWEr?) will be chosen.

    I will try and fix this tomorrow.

    so for now, beware... if you search for "small" words, you may get incorrect results that contain your small word WITHIN a larger word
    If you're looking for such smaller words like this, try - (space)we(space) for your search
    You know you're really in trouble when the light at the end of the tunnel turns out to be the headlight of a train hurtling towards you

    The major part of getting the right answer lies in asking the right question...


    Made your code more readable, use VBA tags (this automatically inserts [vba] at the start of your code, and [/vba ] at the end of your code) | Help those helping you by marking your thread solved when it is.

  5. #5
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    unfortunately, Johnske, (space)we(space) won't work because of the following line:

    TargetWord = Trim(InputBox("Enter the word that will be used to select sentences:"))

    the "trim" function gets rid of preceding or trailing space characters. I don't know exactly what made me use the trim function. Habit, I guess.

    However, taking out the trim part would not entirely solve the problem. If I took out trim, and then searched for " we " - then any case where "we" is the first word in the sentence would not be chosen.

    so.... I've got more work to do

    Also, it ocurred to me that I have not tested my macro under the circumstance where NO instances of the search text are found. In that case, I should modify the macro so that it doesn't open a blank document in which to create a table if there is no data to include in the table.

  6. #6
    Knowledge Base Approver VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    You could use a RegExp to find the string. The code below would count " we " but not "answer" as a match.

    Cheers

    Dave

    [vba]
    Sub FindString()
    Dim Regex As Object, MatchCol As Object
    Dim TargetWord As String
    TargetWord = Trim(InputBox("Enter the word that will be used to select sentences:"))
    If TargetWord = "" Then MsgBox "No word entered." & vbCr & vbCr & "EXITING MACRO": End
    Set Regex = CreateObject("vbscript.regexp")
    With Regex
    .Global = True
    .ignorecase = False
    'insert string to be found inside wound boundary markers, ie \bstring\b
    .Pattern = "\b" & TargetWord & "\b"
    End With
    ActiveDocument.Select
    Set MatchCol = Regex.Execute(Selection)
    Select Case MatchCol.Count
    Case 0
    MsgBox "string " & TargetWord & " not found"
    Case 1
    MsgBox TargetWord & " found " & MatchCol.Count & " time"
    Case Else
    MsgBox TargetWord & " found " & MatchCol.Count & " times"
    End Select
    Set MatchCol = Nothing
    Set Regex = Nothing
    End Sub
    [/vba]

  7. #7
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Thanks, Dave (brettdj)!

    I have a question:

    when you use "Set Regex = CreateObject("vbscript.regexp")" does that mean that the person who will use the macro does NOT have to go add a reference to the vbscript library using the VBE ?

    In the past, I have used the vbscript regexp by usind a Dim statement, a la "Dim myRexExpObject As New RegExp" which required the macro user (if you ever copied and pasted the macro) to add a reference to the vbscript library using the visual basic editor.

    I'm hoping you will tell me that the CreateObject function circumvents that little problem. If so, I'm a convert!!!

    so tell me a bit more about CreateObject, and thank you very much for educating me!

  8. #8
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hello All,

    Just returned from a weekend away or would have posted earlier. I think Kelly's first thought was correct and this is fairly simple.

    All you need to do is exactly as per the original request - search for a string and copy the sentence. It doesn't really matter what the string is - and Word's Find accepts a subset of regular expressions so there shouldn't normally be a need to use the RegExp object.

    [VBA] ' (set up your Find using the Selection.Find Object)

    Do While Selection.Find.Execute
    MsgBox Selection.Sentences(1)
    Loop
    [/VBA]

    Obviously you want to do something other than the Msgbox and there is the presumption that the string being searched for does not itself contain a sentence delimiter,
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  9. #9
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Oh you're kidding me!

    the selection object has a "sentences" property/method ????

    well.... ain't my face red

    I don't know why, but I was convinced there was no such thing as .sentences(1)

    I know there are .paragraphs(x) and .words(x), but for some reason I was thinking "sentences" was one of those seeminly logical things that somehow got left out of Word VBA. Like selection.pages(x).Select

    ... don't tell me there actually is a .pages(x) .... ? There isn't, right?

  10. #10
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Kelly,

    Quote Originally Posted by Kelly
    I was thinking "sentences" was one of those seeminly logical things that somehow got left out of Word VBA.
    Yes there are a few of those.

    And, no, there isn't a pages collection, but ..

    .. there is a built-in (I think Word calls them predefined) bookmark that does almost the same ..

    [VBA] Selection.Bookmarks("\Page").Select[/VBA]
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  11. #11
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Okay, now I'm not feeling quite so ridiculous about my original solution.

    I decided to test Tony's suggestion. I only wanted to test what would happen with .sentences(1) if the sentence contained a hyperlink. I remembered that my original macro had problems when there was a hyperlink or a field in the sentence, which is why I copied everything as text before running the macro. So I figured Tony's idea was obviously elegant and ideal, but I wanted to see if it would still be necessary to make sure we had a TEXT ONLY version of the document before running the macro.

    Well....
    hyperlinks are apparently not a problem, but...

    I discovered some unexpected issues. Firstly, we still have the problem of the "little words." For example, I searched my sample page for "to" and the first word selected was "Houston"

    But, even worse....

    it found Houston, and it just so happens that "Houston" is not in a sentence. It is in a sub-heading. BUT BUT BUT.... apparently Word VBA has decided that sub-headings are still part of sentences. So, the sample text was:

    Bush turns up the heat on N. Korea at APEC summit
    Houston Chronicle - 12 hours ago
    Putting security threats at the top of his agenda, President Bush on Saturday won commitments from Russian and Asian leaders to press the North New York Times Koreans to eliminate their nuclear weapons programs.

    (That "New York Times" thing is a hyperlink that I threw in for testing. I wanted to search for the sentence that begins with "Putting" and then see if the hyperlink would interfere with selecting the sentence.)

    Anyway, when I search for "to," the "sentence" that is selected is ALL of the following:

    Bush turns up the heat on N. Korea at APEC summit
    Houston Chronicle - 12 hours ago
    Putting security threats at the top of his agenda, President Bush on Saturday won commitments from Russian and Asian leaders to press the North New York Times Koreans to eliminate their nuclear weapons programs.


    I still think that Tony's solution is more elegant than mine. And, in reality, the occasional "non-sentences" that Word would generate would probably not be a big problem for whatever Ron is trying to do.

    However, if we ever had to be 100% sure that what we are getting is truly a sentence, then ".sentences(x)" is not foolproof

  12. #12
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Quote Originally Posted by TonyJollans
    Hi Kelly,

    Yes there are a few of those.
    Thanks a million for that! I knew you would be understanding! <big grin>

  13. #13
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Quote Originally Posted by Kelly

    However, if we ever had to be 100% sure that what we are getting is truly a sentence, then ".sentences(x)" is not foolproof
    Although, my macro cannot offer a 100% guarantee of "sentence-hood" either, because how could we EVER ensure the presence of a subject and a predicate!!!!!!!


  14. #14
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Kelly,

    Your sample looks ike something lifted off the web. The only reason I can see for the results you get is if it has manual line breaks instead of paragraph marks. Can you post your test doc?

    As for the 'little words' problem - that is standard Find behaviour - if you want words use wildcards and "<to>" as a search string.

    I must say, I do like the idea of Word refusing to accept grammatically incorrect sentences.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  15. #15
    VBAX Regular Kelly's Avatar
    Joined
    Jun 2004
    Location
    Los Angeles, California
    Posts
    84
    Quote Originally Posted by TonyJollans
    Hi Kelly,

    Your sample looks ike something lifted off the web. The only reason I can see for the results you get is if it has manual line breaks instead of paragraph marks. Can you post your test doc?
    Yes, I needed hyperlinks, so I figured I would copy text from the web. I copied my test paragraphs from Google news.

    drumroll please....

    you are right! (thank god, because I really wanted .sentences(1) to work!!!)

    when I manually type the same headings and other text, the macro "grabs" the correct pieces of text.

    however, when I use the stuff I pasted from the web, the macro grabs the whole mini-article!

    Here's a tester for you. I butchered my original macro so it looks like this:

    [vba]Sub TestingSentenceSelection()

    TargetWord = Trim(InputBox("Enter the word that will be used to select sentences:"))
    If TargetWord = "" Then MsgBox "No word entered." & vbCr & vbCr & "EXITING MACRO": End

    Set myOriginalDoc = ActiveDocument

    myOriginalDoc.Bookmarks("\StartOfDoc").Select

    '****Set search parameters***************
    Selection.Find.ClearFormatting
    With Selection.Find
    .MatchWildcards = False

    .Text = "": .Replacement.Text = "": .Forward = True: .Wrap = wdFindStop
    .Format = False: .MatchCase = False: .MatchWholeWord = False
    .MatchSoundsLike = False: .MatchAllWordForms = False
    End With
    '****************************************

    Selection.Find.Text = TargetWord
    myBoolean = Selection.Find.Execute

    If myBoolean = True Then Selection.Sentences(1).Select: MsgBox "I have selected the sentece"

    End Sub[/vba]

    run the macro and search for "Houston," and you will now see that the macro runs correctly. (because I manually typed the Houston article)

    then run the macro and search for "Xinhua" (see 2nd article) and watch how the whole thing gets selected.

  16. #16
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Yes, it's those manual line breaks - chr(11) - Word doesn't treat them as delimiters - bit poor really.

    Guess we need to wait for ron to come back and see what he needs.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  17. #17
    Knowledge Base Approver VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    As Tony stated above, Word does have RegExp like searching - I tend to stick to RegExp as I know it better

    Quote Originally Posted by Kelly
    Thanks, Dave (brettdj)!
    when you use "Set Regex = CreateObject("vbscript.regexp")" does that mean that the person who will use the macro does NOT have to go add a reference to the vbscript library using the VBE ?!
    Yes, late binding avoids adding the reference. If I was using early binding I'd do it as below

    Dim myRexExpObject As RegExp
    Set myRexExpObject = New RegExp

    Cheers

    Dave

  18. #18
    VBAX Regular
    Joined
    Nov 2004
    Posts
    27
    Location
    All,

    Thanks for the help.

    It will take me a while to go though and understand all of your suggestions. I have also been working on my problem with a different approach.

    It appears I solved the problem of finding a word and copying the sentence that contains the word. I am having trouble opening an Excel document and pasting the coped sentence in. My statements and function to open the Excel document does not work from Word although I did test them in Excel and they worked, any thoughts on this? I am also not sure about my code to copy each sentence found to the next row.

    In addition I will have to address in my code the circumstance of having the same word multiple times within the same sentence.

    I realize there are many way to approach the same problem and I am open all suggestions. Thanks again.

    My code is below:
    Note: I have barrowed some code from several sources.

    Also any tips on coping code into a message on this forum. I had a hard time formatting.

    Ron

    [VBA]Sub FindWordCopySentence()

    Dim aRange As Range
    Dim intRowCount As Integer

    intRowCount = 1
    Set aRange = ActiveDocument.Range

    With aRange.Find
    Do
    .Text = "shall" ? the word I am looking for
    .Execute
    If .Found Then
    aRange.Expand Unit:=wdSentence
    aRange.Copy

    'checking if file is open if not open file
    If Not WorkbookIsOpen("SOW_shall.xls") Then
    Workbooks.Open Filename:="C:\SOW_shall.xls"
    End If

    'my attempt to copy each sentence found to the next row in excel document
    Sheets("Sheet1").Select
    Cells(intRowCount, 1).Select
    aRange.Paste
    intRowCount = intRowCount + 1
    Loop While .Found
    End With
    End Sub

    ? function to check if file is open
    Private Function WorkbookIsOpen(wbname) As Boolean
    'Returns TRUE if the workbook is open
    Dim x As Workbook
    On Error Resume Next
    Set x = Workbooks(wbname)
    If Err = 0 Then WorkbookIsOpen = True _
    Else WorkbookIsOpen = False
    End Function[/VBA]

  19. #19
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Ron,

    Expanding the Range is fine but, if doing so, to deal with the multiple occurrences - indeed to make your loop work - you need to collapse the range after you've extracted the sentence and before looking for the next one. The reason for this is that the Find redefines the Range and later searches start after the redefined range, BUT if you have further redefined it, later searches assume you want to look only in the newly defined range - except when it's collapsed when that makes no sense - did that all make sense??

    After the line

    Arange.Copy

    add the line

    aRange.Collapse wdCollapseEnd

    Now for the Excel bit. You need to address an Excel object before you can address Workbooks, etc. Whilst I would normally applaud the use of a separate Function, in this case it rather complicates it and I would bring all the code inline. Some quick changes to your code and this should work ..

    [VBA] Sub FindWordCopySentence()
    Dim appExcel As Object
    Dim objSheet As Object
    Dim aRange As Range
    Dim intRowCount As Integer
    intRowCount = 1
    Set aRange = ActiveDocument.Range
    With aRange.Find
    Do
    .Text = "shall" ' the word I am looking for
    .Execute
    If .Found Then
    aRange.Expand Unit:=wdSentence
    aRange.Copy
    aRange.Collapse wdCollapseEnd
    If objSheet Is Nothing Then
    Set appExcel = CreateObject("Excel.Application")
    Set objSheet = appExcel.Workbooks.Open("C:\SOW_shall.xls").Sheets("Sheet1")
    intRowCount = 1
    End If
    objSheet.Cells(intRowCount, 1).Select
    objSheet.Paste
    'checking if file is open if not open file
    'If Not WorkbookIsOpen("SOW_shall.xls") Then
    'Workbooks.Open FileName:="C:\SOW_shall.xls"
    'End If
    'GetOpenWorkbook("C:\SOW_shall.xls").sheets("Sheet1").Cells(intRowCount, 1).Select
    'GetOpenWorkbook("C:\SOW_shall.xls").sheets("Sheet1").Paste
    'my attempt to copy each sentence found to the next row in excel document
    'Sheets("Sheet1").Select
    'Cells(intRowCount, 1).Select
    'aRange.Paste
    intRowCount = intRowCount + 1
    End If
    Loop While .Found
    End With
    If Not objSheet Is Nothing Then
    appExcel.workbooks(1).Close True
    appExcel.Quit
    Set objSheet = Nothing
    Set appExcel = Nothing
    End If
    End Sub
    [/VBA]
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  20. #20
    VBAX Regular
    Joined
    Nov 2004
    Posts
    27
    Location
    Tony,

    That worked great!!

    I see I need to become familiar with creating objects.

    I have some more work to do on this project.

    I would like to remove the formatting before pasting. I think that would be done with paste special. In addition I would like to copy the paragraph number and heading which contained the sentence, and also paste into the Excel document.
    I am going to try to hammer out those requirements. I hope all of you are available if I run into trouble.
    In the event I require help would I post in the same thread or start a new one?

    Best Regards,
    Ron

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •