

Thread: Scanning a Word document for specific text

  1. #1

    Scanning a Word document for specific text

    Hello All,
    I am looking to write a macro that searches through a Word document, finds all of the words that begin and end with a capital letter, and then populates a table with the words it finds. The idea is to scan a Word document to find possible acronyms contained within it.

    e.g.
    "This is some random SS text"
    where SS = System Specification.

    Obviously I need to keep track of the possible acronyms found, as each one is used many times within a document.

    Any help/sample code on this idea would be greatly appreciated.

    Thank you for your time.
    James Galea

  2. #2
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Hi aeroboy86. Welcome to VBAX!

    We are going to start at the beginning. What have YOU done so far to start this rolling?

    Have you tried searching the threads here? Have you tried setting up some search routines? What have you done?

  3. #3
    Hi Fumei

    I have been searching through this forum's threads for a couple of days now and don't seem to be finding the information I require. I am very new to VBA, but not to programming. I understand I am probably trying to climb a mountain before I can walk with my previous post, but I just want to know whether what I want to do is possible and, if so, get a bit of help getting started. If you can also recommend a textbook or a good website, that would be greatly appreciated.

    Thanks
    James

  4. #4
    VBAX Wizard
    1. What exactly is it that you require, that you say you have not found?

    2. Again, what have you tried so far? There are good web sites out there - including this one - but far more important is what YOU actually try.

    Post the code that you have tried so far, as we can probably suggest something.

    Using regular expressions would likely help.
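    To illustrate the regular-expression hint, here is a rough sketch that uses the VBScript RegExp object through late binding, so no extra reference is needed. It assumes VBScript.RegExp is available on the machine, and the function name FindCappedWords is hypothetical. The pattern \b[A-Z][A-Za-z]*[A-Z]\b matches a whole word of letters that starts and ends with a capital; words containing digits or accented letters are not matched.

    [VBA]
    ' Returns a line-feed-separated list of every word in sourceText that
    ' begins and ends with a capital A-Z letter (duplicates included).
    Function FindCappedWords(ByVal sourceText As String) As String
        Dim re As Object, matches As Object, m As Object
        Dim result As String

        Set re = CreateObject("VBScript.RegExp")   ' late binding
        re.Pattern = "\b[A-Z][A-Za-z]*[A-Z]\b"    ' capital, letters, capital
        re.Global = True                          ' return every match
        re.IgnoreCase = False                     ' case matters here

        Set matches = re.Execute(sourceText)
        For Each m In matches
            If Len(result) > 0 Then result = result & vbLf
            result = result & m.Value
        Next m
        FindCappedWords = result
    End Function
    [/VBA]

    Something like MsgBox FindCappedWords(ActiveDocument.Content.Text) would then list the candidates for the whole document in one pass.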

  5. #5
    I don't have any code, as I am not sure where to start. Look, I understand a lot of posts seem trivial to you, as you are obviously very competent with VBA and reply to most posts, but I have no experience in VBA and just want to learn. What I need to do is:

    1. Scan each word in a Word document one by one and see whether it begins and ends with a capital letter.

    e.g.

    Here is SomE sample TexT

    In the above example, the code I wish to write would scan the words one by one and find the text "SomE" and "TexT", as they begin and end with a capital letter.

    Regards
    James

  6. #6
    VBAX Wizard
    I understand perfectly what you want to do, and what you are asking.

    But I have no intention of just handing you code solution if you are not going to even try to do something for yourself.
    I don't have any code, as I am not sure where to start.
    You start by starting. You DO know where to start - I am not sure why you say you don't.

    You know that to start you need to look at every word...now don't you? Don't you? Well...why are you saying you don't know where to start. Start there.

    Have you tried writing some code to even look at each word? That may be a good place to start. YOU have to start somewhere. Neither I nor anyone else is going to just hand it to you. It is much better for you (or anyone else for that matter) to have a good handle on what is going on, and the only way to do that is to start trying things yourself.

    If you are not even going to try anything at all...well...good luck. Maybe someone will just give it to you. This is not a help desk, and we are not here to hand over solutions to people who ask for them.

    Again, try something. Post some code that you have tried. Tell us what is not working, and we will suggest things that will help make it work.

    I even gave you a hint to perhaps start. Possibly try using regular expressions. However, you can do it within Word functions.

    Again, I understand perfectly what you are asking for.
    but I have no experience in VBA and just want to learn.
    EVERYONE here, and I do mean everyone, has primarily learned by actually trying to code stuff. Sorry, but you say you have programming experience (but not VBA)...well then...start programming.

    Take your test sentence - "Here is SomE sample TexT" and work on it. Post what you have as a start.
    In the above example, the code I wish to write
    May I repeat YOUR words? ...."I wish to write"


    "I wish to write." I believe that means...you. Not me. So...write some. We will be glad to help you out when and if (and likely if) you have problems. I personally will be glad to help. And yes, I DO in fact post code solutions (and some very detailed sophisticated ones) for people. But I never do so for anyone who has not indicated that they are actually working on it.

    As I stated, what you are asking is not particularly easy. It is NOT trivial. Finding words is easy, but you have very specific requirements. I have just about finished a solution, but it is a bit messy. Show us that you are trying something.

  7. #7
    Hi Gerry

    OK, I started trying to figure it out today, and I have thought of a possibly easier way to solve my problem. The idea, as you already know, is to find possible acronyms contained within a document (sorry for repeating myself). What I thought I could do is scan each word in a Word document and search for it in a list in an Excel worksheet.

    Can you let me know how I can make a reference to the Excel library (I hope that is the right terminology)? I want to be able to write something like:

    [VBA]
    dim xlApp as Excel.Application
    dim xlBook as Excel.Workbook
    dim xlSheet as Excel.Worksheet

    xlApp = new Excel.Application
    xlBook = xlApp.workbook.open("the path to the xls sheet")
    xlSheet = xlBook.worksheet(1)
    [/VBA]


    I'm not sure whether the above syntax is correct. Can you please advise? I'm sorry if I ****ed you off before; I had no intention to do so, and I understand that you can't do the work for me.

    Also, I was trying to figure out how to work with each word in a document. I wrote the following:

    [VBA]
    dim w as object

    For Each w In ActiveDocument.Words
    msgbox w
    Next
    [/VBA]

    This went through the sample document, and a message box popped up for each word in turn. But when I tried to use an If statement to find a word:

    If w.Text = "test" Then
    msgbox w
    End If

    The above never finds the word "test", even though it is in the document, so obviously I'm not handling the text correctly. If you have any ideas, that would be very helpful.

    Thanks again
    James

  8. #8
    VBAX Wizard
    What exactly is your programming experience? You mention it, but it may help to know what your background is and what kind of programming you have done.

    You did not **** me off. I was simply stating what the situation is. It does not matter to me whether you make this work for you or not. The point is that it is you who needs to do the work.

    You make a reference to Excel by making a Reference to Excel. That is, if you are going to use early binding rather than late binding. Judging by the code you posted, you seem to be going for early binding. Generally speaking (and no doubt some here would disagree), if possible, I think early binding is better.

    In any case, you make a Reference by (while in the VBE) using Tools > References. Find and add the Excel reference. I suggest you do some looking up on references.
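    With the Excel reference in place, the snippet from post #7 still needs Set for every object assignment, New to create the application, and the plural Workbooks and Worksheets collections. A minimal corrected sketch, assuming early binding; the function name ReadAcronymList, the workbook path and the layout (acronyms in column A of the first sheet) are all assumptions for illustration. It has no error handling, so an error part way through would leave Excel running.

    [VBA]
    ' Requires Tools > References > Microsoft Excel xx.x Object Library.
    ' Assumed layout: one acronym per row in column A of the first sheet.
    Function ReadAcronymList(ByVal workbookPath As String) As Collection
        Dim xlApp As Excel.Application
        Dim xlBook As Excel.Workbook
        Dim xlSheet As Excel.Worksheet
        Dim lastRow As Long, r As Long
        Dim acronyms As New Collection

        Set xlApp = New Excel.Application              ' Set is required for objects
        Set xlBook = xlApp.Workbooks.Open(workbookPath, ReadOnly:=True)
        Set xlSheet = xlBook.Worksheets(1)

        ' last used cell in column A
        lastRow = xlSheet.Cells(xlSheet.Rows.Count, 1).End(xlUp).Row
        For r = 1 To lastRow
            If Len(Trim(xlSheet.Cells(r, 1).Value)) > 0 Then
                acronyms.Add Trim(xlSheet.Cells(r, 1).Value)
            End If
        Next r

        xlBook.Close SaveChanges:=False
        xlApp.Quit
        Set ReadAcronymList = acronyms
    End Function
    [/VBA]

    Reading the whole list once into a Collection means Excel is only opened once, rather than being consulted for every word.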

    Regarding your code for each word. Let's look at it. Here is your first one:[vba]Dim w As Object
    For Each w In ActiveDocument.Words
    msgbox w
    Next [/vba]As you state, each word is displayed. I have something else to say on this, but we'll hold off for a sec.

    Now here is your second (I'll add the declaration as well):[vba]Dim w As Object
    For Each w In ActiveDocument.Words
    If w.Text = "test" Then
    msgbox w
    End If
    Next[/vba]You say it does not work. And of course, you are correct...it doesn't.

    1. Try w.Text = "test ". Note the trailing space.

    OR

    2. Try Trim(w.text) = "test"

    Each w will include the trailing space. BTW: you do not need to use w.Text; just w will do. Each w is a Range object, and Text is its default property, so w and w.Text return the same string.
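    Putting the Trim suggestion into a complete routine might look like the sketch below; the function name CountWord is only illustrative. It counts how many times a given word appears, ignoring the trailing space that each item of ActiveDocument.Words usually carries. The comparison is case-sensitive under the default Option Compare Binary, so "Test" would not be counted as "test".

    [VBA]
    ' Counts occurrences of target in the active document.
    ' Each item of ActiveDocument.Words is a Range whose text usually
    ' ends with a space, so Trim is applied before comparing.
    Function CountWord(ByVal target As String) As Long
        Dim w As Range
        Dim n As Long
        For Each w In ActiveDocument.Words
            If Trim(w.Text) = target Then
                n = n + 1
            End If
        Next w
        CountWord = n
    End Function
    [/VBA]

    For example, MsgBox CountWord("test") reports how many times "test" occurs.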

    You probably noticed in your messages that you got more than you thought you would.

    "This is some text." FOUR words, right? Say you had your code that included a counter, like this:[vba]Sub EachWord()
    Dim aWord
    Dim i As Integer
    i = 1
    For Each aWord In ActiveDocument.Words
    MsgBox aWord & " Count= " & i
    i = i + 1
    Next
    End Sub[/vba]Running it would display SIX messages.

    This Count= 1
    is Count= 2
    some Count= 3
    text Count= 4
    . Count= 5
    Count= 6

    What is going on???? Well, the period is considered a word, distinct from "text" - the period is NOT considered a trailing space because it IS not a trailing space. (BTW: this is why it is better to use Trim(w.text), rather than w.text = "text ", because w may (or may not) have a trailing space.)

    The period ( . ) is considered a separate object.

    PLUS, the paragraph mark is also considered an object - which it is.

    Which is why I stated that what you are asking is not really all that trivial. You need to be very careful on how you deal with it.
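    One way to be careful here, sketched with a hypothetical helper called IsLetterWord: after trimming, accept only items whose first character is a letter. That drops the period and the paragraph mark from the example above, so "This is some text." counts as four words. VBA's Like operator does the character test; only ASCII letters are accepted in this sketch.

    [VBA]
    ' True if s (already trimmed) starts with an ASCII letter.
    ' The period and the paragraph mark (Chr(13)) both fail this test.
    Function IsLetterWord(ByVal s As String) As Boolean
        IsLetterWord = (s Like "[A-Za-z]*")
    End Function

    ' Counts only the "real" words of the active document.
    Sub CountRealWords()
        Dim w As Range
        Dim n As Long
        For Each w In ActiveDocument.Words
            If IsLetterWord(Trim(w.Text)) Then n = n + 1
        Next w
        MsgBox "Real words: " & n
    End Sub
    [/VBA]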

    If I understand correctly, you are thinking of:

    1. loading an Excel file with a listing of accepted acronyms
    2. running through each word in the document and taking that word over to Excel and running it through each word in that list.

    Can be done, but this is a huge use of resources. It is a LOT of checking with a LOT of switching back and forth. Think about it. I have no real idea, but would you say that acronyms make up 10% of the document? Less? I am guessing less. Let's say 2% of all the words in the document are acronyms. Even that may be high.

    In any case...that means 98% of all the work - picking up every word (separately) in Word, switching over to Excel, and checking that word against every single word in the list - is wasted work. 98% of the work is pointless.

    Does that make sense?

    What do you think you could do about that? Do you really need to send EVERY word over to be checked as an acronym? Maybe you have already figured that out. It is a logic issue.

    BTW: have you considered using a custom dictionary?

  9. #9
    Hi Gerry,

    I agree with you about the waste of resources in checking every word in a document, and I'm trying to figure out a better way. I am not familiar with custom dictionaries yet; I will have a look in the Visual Basic Help. I agree with you, it is a great resource.

    Just so you know, I am a third-year avionics student, and my main background in programming is C. I do a bit of Visual Basic Express programming, C++, Assembly and VHDL, but I haven't really done much work in VBA. I understand that Visual Basic 2005 Express is very similar; I have just never worked with MS Word using VBA.

    The main reason behind the idea of using Excel is that the acronym list will need to be continually updated, and I originally thought it would be a possible solution.

    Just a thought: do you think it would be any better to open the Excel sheet, grab an acronym, and then search through the document for occurrences of that acronym? You would still have to check for every acronym in the list, but I suppose it is better than sending every word over and then searching the Excel sheet. What do you think?

    Anyway, thanks for your help. I'll keep at it and hopefully figure something out.

    Cheers
    James
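    The alternative suggested here (take each acronym from the list and search the document for it) maps naturally onto Word's built-in Find. A minimal sketch, assuming the acronyms have already been read into a Collection; CountInDocument and ReportAcronyms are hypothetical names. MatchCase and MatchWholeWord keep "SS" from matching "ss" or the inside of "SSD".

    [VBA]
    ' Counts whole-word, case-sensitive matches of findText in doc.
    Function CountInDocument(ByVal doc As Document, ByVal findText As String) As Long
        Dim rng As Range
        Dim n As Long
        Set rng = doc.Content
        With rng.Find
            .ClearFormatting
            .Text = findText
            .MatchCase = True
            .MatchWholeWord = True
            .MatchWildcards = False
            .Forward = True
            .Wrap = wdFindStop            ' stop at the end of the document
            ' a successful Execute moves rng onto the match; collapsing it
            ' to its end makes the next Execute continue after that match
            Do While .Execute
                n = n + 1
                rng.Collapse wdCollapseEnd
            Loop
        End With
        CountInDocument = n
    End Function

    ' Reports each listed acronym that occurs at least once.
    Sub ReportAcronyms(ByVal acronyms As Collection)
        Dim a As Variant
        Dim n As Long
        Dim msg As String
        For Each a In acronyms
            n = CountInDocument(ActiveDocument, CStr(a))
            If n > 0 Then msg = msg & a & ": " & n & vbCrLf
        Next a
        If Len(msg) = 0 Then msg = "None of the listed acronyms were found."
        MsgBox msg
    End Sub
    [/VBA]

    With this design the work grows with the length of the acronym list rather than with the number of words in the document, and Word's own search engine does the scanning.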

  10. #10
    Hi Gerry,

    I have been trying to figure out how I can check the first and the last characters in a word.

    Question:
    1. Is there a better way to scan text than:
    [VBA]
    dim aWord as Object
    For Each aWord in ActiveDocument.Words
    'Do something
    Next aWord
    [/VBA]

    This may be a stupid question, but in the above VBA code, is every word considered an object in Word?

    Cheers
    James

  11. #11
    VBAX Wizard
    It is not a stupid question at all. What do you think?

    Strictly speaking, words are not objects in Word. In fact, there IS no "word" object type in Word's object model. However, what the code does is MAKE an object for each of the defined Ranges - which is what Word considers to be a "word". Each item in the Words collection is a Range object.

    So, if I understand your question correctly, yes, every single "word" is considered as an object.

    For EACH aWord in ActiveDocument.Words
    ' do something

    For every single Range that I (Word) think of as a "word", yes, I will execute the following instructions....

    And if I come across a paragraph mark, then yes, that is a "word" to me (Word) and I will execute those instructions. If you use a couple of Enter key strokes to put "space" between paragraphs...each one of those IS a paragraph mark, and each one will have those instructions executed.
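    A quick way to see this for yourself, using a procedure name (ShowWordRanges) chosen just for illustration: TypeName reports the class of each item in ActiveDocument.Words, and printing the text between brackets in the Immediate window (Ctrl+G) makes the trailing spaces and the paragraph marks visible.

    [VBA]
    Sub ShowWordRanges()
        Dim w As Object
        For Each w In ActiveDocument.Words
            ' TypeName(w) prints "Range": each "word" is a Range object.
            ' The brackets expose the trailing space; a paragraph mark
            ' shows up as a line break inside the brackets.
            Debug.Print TypeName(w), "[" & w.Text & "]"
        Next w
    End Sub
    [/VBA]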
    I have been trying to figure out how I can check the first and the last characters in a word.
    Ahhhhhh, finally. You have come upon why I stated that this is NOT trivial.

    Finding and checking a "word" is one thing, and is a very common task.

    Checking the structure of the word can be fairly easy, and certainly, you CAN do a check to see whether the first and last characters are capitalized.

    I know it may seem like I am giving you a hard time, but what I am really trying to do is nudge you into really thinking about it.

    So, OK, yes you are checking each and every word - what choice do you have if you want to...uh, check every word?

    And you want to check whether the first and last letters are capitalized.

    WHAT would be the best thing to do first? I mean, once you have the word you are going to check.

    From a logic perspective (and this effort IS a logic operation), there are a number of possible operations that could be performed, but ONE of them is the best starting operation.

    WHAT is that?

  12. #12
    VBAX Expert
    Joined
    Mar 2005
    Posts
    835
    Location
    James, if each word found is in string format, this should help but it is untested. Dave
    [VBA]
    Public Function CapLet(Strtest As String)
    If (Asc(Right(Strtest, 1)) >= 65) And _
    (Asc(Left(Strtest, 1)) <= 90) Then
    MsgBox "Found one: " & Strtest
    End If
    End Function
    [/VBA]

    Something like this to use...
    [VBA]
    Dim w As Object, Str As String
    For Each w In ActiveDocument.Words
    Str = w.Text
    CapLet (Str)
    Next w
    [/VBA]

    As far as placing these in a table that comes next...
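    For that next step, a minimal sketch of one way to do it: put the found words into a new document as a one-column table with a header row, using Tables.Add and Cell(row, column). The procedure name WriteWordsToTable is hypothetical, and the words are assumed to arrive as a zero-based, non-empty String array (the CappedWords array built in post #14 has that shape).

    [VBA]
    ' Writes each entry of words() into its own row of a new table.
    ' Assumes words is a zero-based, non-empty String array.
    Sub WriteWordsToTable(words() As String)
        Dim doc As Document
        Dim tbl As Table
        Dim i As Long

        Set doc = Documents.Add
        ' one header row plus one row per word
        Set tbl = doc.Tables.Add(Range:=doc.Range, _
                                 NumRows:=UBound(words) + 2, _
                                 NumColumns:=1)
        tbl.Cell(1, 1).Range.Text = "Possible acronym"
        For i = 0 To UBound(words)
            tbl.Cell(i + 2, 1).Range.Text = words(i)
        Next i
    End Sub
    [/VBA]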

  13. #13
    VBAX Wizard
    Some comments.

    1. (Asc(Right(Strtest, 1)) >= 65) will ALWAYS return FALSE because Right(Strtest,1) will always be the trailing space (character code 32), except for the final word in a sentence.

    "This is SomE text with a CouplE with Caps." will NOT find SomE, or CouplE.

    Again, it would work if you use Trim, as I suggested.
    The value of w As Object includes the trailing space.

    2. Even corrected using Trim, the ASCII logic is flawed. Example: "This is SomE text."

    The word "This" will come back as "Found one."

    (Asc(Right(Strtest, 1)) >= 65) is TRUE, as Asc("s") = 115, which is greater than 65.

    (Asc(Left(Strtest, 1)) <= 90) is TRUE, as Asc("T") = 84, which is less than 90.

    So "This" will be "Found one." Which of course is incorrect.

    3. There is no need whatsoever for the string variable Str, as in:[vba]Dim w As Object, Str As String
    For Each w In ActiveDocument.Words
    Str = w.Text
    CapLet (Str)
    Next w [/vba]This works just fine:[vba]Dim w As Object
    For Each w In ActiveDocument.Words
    CapLet (w)
    Next w[/vba]Again, w is a Range object whose default property is Text, so it does not need .Text either. Using .Text does no harm, though.

    MsgBox w
    MsgBox w.Text

    will give exactly the same values.
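    Putting points 1 and 2 together, the test needs Trim and must check BOTH ends against the full A-Z range. A sketch using VBA's Like operator instead of Asc (IsCappedWord and ListCapped are hypothetical names), assuming the default Option Compare Binary so that [A-Z] matches only uppercase ASCII letters. The pattern needs at least two characters, so a lone "I" or "A" is not reported.

    [VBA]
    ' True when s, after trimming, begins and ends with a capital A-Z.
    Function IsCappedWord(ByVal s As String) As Boolean
        s = Trim(s)
        IsCappedWord = (s Like "[A-Z]*[A-Z]")
    End Function

    ' Prints every capped word of the active document to the Immediate window.
    Sub ListCapped()
        Dim w As Range
        For Each w In ActiveDocument.Words
            If IsCappedWord(w.Text) Then Debug.Print Trim(w.Text)
        Next w
    End Sub
    [/VBA]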

  14. #14
    VBAX Wizard
    Here is an alternative. And I have to say, there are a number of other ways to do this.

    If I understand this correctly, you have an Excel file with your list of capped words. Yes?

    Put the following at the top of a standard module (in Word, along with the rest of the code below).[vba]Public CappedWords() As String
    Public CappedWordsCounter As Integer[/vba]This sets up a string array of all the capped words, and a counter of them.

    In Word, put in the following:[vba]Sub TestCaps()
    Dim w As Object
    ' explicitly set counter = 0 so this
    ' can be repeatedly done for testing
    CappedWordsCounter = 0
    For Each w In ActiveDocument.Words
    ' check the LAST letter first
    ' logically it is only going to be a possibility if
    ' at least the last letter is capped
    ' there are no words with only the last
    ' letter capped
    ' skip ranges that are only spaces, because
    ' Asc("") raises run-time error 5
    If Len(Trim(w)) > 0 Then
    If Asc(Right(Trim(w), 1)) >= 65 And _
    Asc(Right(Trim(w), 1)) <= 90 Then
    ' so IF the last letter is capped, THEN
    ' check the first letter. If the last letter is
    ' NOT capped, forget it, don't bother checking
    ' the first letter. Go to next word.
    Call CapWords(Trim(w))
    End If
    End If
    Next w
    ' this next calls a Sub to display the
    ' array of capped words, if there are any
    ' Note that even if only ONE word is capped
    ' the counter is incremented by 1, so even
    ' though the array may index at 0 (one word)
    ' the COUNTER would be 1
    If CappedWordsCounter > 0 Then
    Call ListCappedWords
    Else
    Msgbox "There are no first and last capped words."
    End If
    End Sub

    Sub CapWords(Strtest As String)
    ' this is only called if the LAST letter is capitalized
    ' so now check FIRST letter
    Dim bolFirst As Boolean
    ' if first letter is capped, boolean is TRUE
    ' note that the parameter Strtest is passed
    ' TRIM'd, so Left(Strtest,1) is the first letter
    Select Case Asc(Left(Strtest, 1))
    Case 65 To 90
    bolFirst = True
    End Select
    ' if it is TRUE, then you know BOTH first
    ' and last letters are capped....so
    ' redim the array and add word
    If bolFirst Then
    ReDim Preserve CappedWords(CappedWordsCounter)
    CappedWords(CappedWordsCounter) = Strtest
    CappedWordsCounter = CappedWordsCounter + 1
    End If
    End Sub

    ' now you have an array of all words that are capped
    ' with first and last letters
    ' and you can DO stuff with that array

    Sub ListCappedWords()
    ' this simply displays a message listing all
    ' the capped words...HOWEVER
    ' if the array was in Excel (not Word)
    ' you can change this to run through logic
    ' in the Excel file
    Dim msg As String
    Dim var
    For var = 0 To UBound(CappedWords)
    ' run through the array of capped words
    ' in this case, build message of all capped words
    ' OR you could do other logic processing on
    ' each item of array list
    msg = msg & vbCrLf & CappedWords(var)
    Next
    MsgBox msg
    End Sub[/vba]

    If you put all of the above code in Word, you will get a display of all capped words.

    The point is that you can build an array of all the capped words, then work from that.
    Last edited by fumei; 12-10-2006 at 11:30 AM.
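    Since the original question noted that each acronym is likely to appear many times, one possible refinement (not part of the code above) is to count unique words instead of storing every occurrence. A sketch with a Scripting.Dictionary through late binding, assuming the Microsoft Scripting Runtime is available; CountCappedWords and ShowCappedWordCounts are hypothetical names. Unlike the array version, single capitals such as "I" are skipped, because the Like pattern needs at least two characters.

    [VBA]
    ' Builds a dictionary of capped word -> number of occurrences.
    Function CountCappedWords() As Object
        Dim counts As Object
        Dim w As Range
        Dim s As String

        Set counts = CreateObject("Scripting.Dictionary")
        counts.CompareMode = vbBinaryCompare   ' "SS" and "Ss" are different keys
        For Each w In ActiveDocument.Words
            s = Trim(w.Text)
            If s Like "[A-Z]*[A-Z]" Then       ' first and last letter capped
                ' reading a missing key adds it with an Empty value,
                ' and Empty + 1 = 1
                counts(s) = counts(s) + 1
            End If
        Next w
        Set CountCappedWords = counts
    End Function

    Sub ShowCappedWordCounts()
        Dim counts As Object
        Dim k As Variant
        Dim msg As String
        Set counts = CountCappedWords()
        For Each k In counts.Keys
            msg = msg & k & " x " & counts(k) & vbCrLf
        Next k
        If counts.Count = 0 Then msg = "There are no first and last capped words."
        MsgBox msg
    End Sub
    [/VBA]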

  15. #15
    VBAX Wizard
    I know that looks like a lot of code, but it isn't really. It is heavily commented for you. Here is what it looks like without the comments.[vba]Public CappedWords() As String
    Public CappedWordsCounter As Integer


    Public Sub CapWords(Strtest As String)
    Dim bolFirst As Boolean
    Select Case Asc(Left(Strtest, 1))
    Case 65 To 90
    bolFirst = True
    End Select
    If bolFirst Then
    ReDim Preserve CappedWords(CappedWordsCounter)
    CappedWords(CappedWordsCounter) = Strtest
    CappedWordsCounter = CappedWordsCounter + 1
    End If
    End Sub

    Sub TestCaps()
    Dim w As Object
    CappedWordsCounter = 0
    For Each w In ActiveDocument.Words
    If Len(Trim(w)) > 0 Then
    If Asc(Right(Trim(w), 1)) >= 65 And _
    Asc(Right(Trim(w), 1)) <= 90 Then
    Call CapWords(Trim(w))
    End If
    End If
    Next w
    If CappedWordsCounter > 0 Then
    Call ListCappedWords
    End If
    End Sub

    Sub ListCappedWords()
    Dim var
    For var = 0 To UBound(CappedWords)
    ' do stuff with each item in array
    Next
    End Sub[/vba]
