comparing location of two phrases for defining acronyms



bigvince1981
10-20-2011, 05:14 PM
Hi all,

I'd like to write something that would make sure acronyms from my company's master acronym list are defined (properly) on first instance within each document. The master acronym list is long: at one line per acronym in 12 pt font it's around 100 pages, formatted as a two-column table. Each row contains the acronym in the left column and the definition in the right column. The format could easily be changed if necessary.

For example, let's say the document was about software as a service (SaaS). The first time "software as a service" appears, it should be immediately followed by the acronym "SaaS," which is enclosed in parentheses. There are three possibilities:

1. The term is already correctly defined the first time it is used.
2. The term appears first but is not immediately followed by the acronym.
3. The acronym appears by itself before any mention of the full term.

In case 1, I'd like to do nothing. In case 2, I'd like to insert "(SaaS)" after the term. In case 3, I'd like to insert "software as a service" before the acronym and place parentheses around the acronym.
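
To make the three cases concrete, here is a minimal sketch of the decision logic in Python, working on a plain string rather than a Word document. The function name `define_first_use` is made up for illustration, and it assumes the term is matched case-insensitively while the acronym is matched as a whole word in exact case. It does not handle an acronym that already sits in parentheses without its term.

```python
import re

def define_first_use(text, term, acronym):
    """Return text with the acronym defined at its first use (cases 1-3)."""
    term_match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    # Assumption: acronyms are matched case-sensitively and as whole words.
    acro_match = re.search(r"\b" + re.escape(acronym) + r"\b", text)
    defined = f"({acronym})"

    if term_match and (acro_match is None
                       or term_match.start() < acro_match.start()):
        rest = text[term_match.end():]
        if rest.lstrip().startswith(defined):
            return text  # Case 1: already defined correctly, do nothing.
        # Case 2: insert "(SaaS)" right after the term.
        return text[:term_match.end()] + " " + defined + rest

    if acro_match:
        # Case 3: the bare acronym comes first, so expand it in place.
        return (text[:acro_match.start()] + f"{term} {defined}"
                + text[acro_match.end():])

    return text  # Neither the term nor the acronym occurs.

print(define_first_use("SaaS is popular.", "software as a service", "SaaS"))
```

The same branching would carry over to VBA; only the searching and inserting would be done on document ranges instead of string offsets.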

Once the first instance of the acronym is properly defined, it would be nice to be able to flag subsequent instances of cases 1-3 and choose whether to use just the term, just the acronym, or to redefine the acronym.

I'm extremely new to VBA, so my first question is this: Is what I'd like to do even possible (with VBA or some other method)? Second, would there be a way to accommodate acronyms with multiple definitions? Finally, as I begin to learn VBA (with this being my goal), any suggestions as to what concepts I should start with?

Thanks,

Vince

PS I've been on a lot of other forums for C++ and Python, and people often think this type of post is a way to get easy answers for homework assignments. This isn't, but even if it were, I'm only asking whether this is possible and whether someone can point me in the right direction.

DeadKing
10-21-2011, 06:05 AM
Hi.

First - I'm quite new to VBA too, so don't take my answer as gospel.

Anyway, as I see it, what you are trying to accomplish is possible, but time-consuming. You have to take every single term and acronym from the master acronym list and compare them with every single word and its surroundings in the rest of the document. Also, there can be problems if some of those acronyms look the same as a normal word (e.g. AS - Access System).
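
One way to reduce the "AS" versus "as" problem is to match acronyms only as whole words and in exact case. Below is a small Python sketch of that idea; `find_acronym` is a hypothetical helper, not part of any library. In Word, the Find object's MatchCase and MatchWholeWord options serve a similar purpose.

```python
import re

def find_acronym(text, acronym):
    """Return start offsets where the acronym stands alone in exact case."""
    # \b keeps "AS" from matching inside "ASAP"; the default
    # case-sensitive search keeps it from matching "as" or "As".
    pattern = r"\b" + re.escape(acronym) + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]

sample = "As noted, the AS logs access as soon as ASAP mode ends."
print(find_acronym(sample, "AS"))  # [14]
```

This still cannot tell an acronym from an ordinary word typed in all caps, so a manual check remains useful for such entries.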

Using multiple definitions for an acronym would cause the same problem, but again, it is possible.

So the basic concept is to take the first term of the master acronym list and try to find it in the rest of the document. When it is found, check the word next to it and see if the acronym is there. If not, add it.
Now you don't need to check the rest of the occurrences of that term, so move on to a new round with the acronym of the term you just used. Check the words before it (how many depends on the length of the term). Is it the term itself? If not, add the term.
Now do the whole thing again with the next term. :)

Hope it helps a bit.

Frosty
10-21-2011, 09:38 AM
This is possible, and I wouldn't worry about optimization or how time-consuming it would be to begin with. You'll need to look up the use of at least some of the following:

1. The .Find object. It is much faster than using string comparisons.
2. I'd recommend loading your list of acronyms into some kind of public variable, which you reference within your code (and if the public variable is invalid within the function, you reload it).
3. Non-unique acronyms aren't really a problem... you'll just need to do some kind of user interaction. In fact, with this kind of process... I think you'd almost want to have a sanity check (analogous to Replace, as opposed to Replace All within the Find dialog) before proceeding.

I would think the process would go something along the following:

A. Load your list into a multi-dimensional array (?? suggestion, lots of ways to do this... I would use a class, personally), making sure that you flag, in some way, any acronyms which have multiple definitions. I'd recommend doing this on the fly rather than actually in the data table (to prevent the error of a human forgetting to flag a new acronym as a multiple)... but since you're going to need this info somehow, you need to flag it here.
B. Using the .Find object, go through the document finding each instance of the current list item, adding the range of that item to a collection of ranges.
C. If your collection has any ranges, then evaluate each one to see if you've got a definition "near" it (that will let you know if you have any and where they are).
D. At that point, if you know everything you need and don't have any further "questions" (if the acronym isn't flagged as having multiple definitions, etc)... maybe you can get away with doing stuff without human interaction. But depending on the document length, I'd almost prefer having a "confirm" button (insert the text, select it, user interaction to confirm this was correct).
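
Steps A through D could be sketched roughly as follows. This is Python on plain strings, not Word VBA, and every name here (`load_acronyms`, `find_all`, `has_definition_nearby`, `review`, the `confirm` callback, the `window` size) is an assumption made for illustration. "Near" is interpreted as "the definition appears just before the acronym, give or take a few characters".

```python
import re
from collections import defaultdict

def load_acronyms(rows):
    """Step A: map each acronym to its definitions (more than one = ambiguous)."""
    table = defaultdict(list)
    for acronym, definition in rows:
        if definition not in table[acronym]:
            table[acronym].append(definition)
    return dict(table)

def find_all(text, acronym):
    """Step B: collect (start, end) ranges of whole-word, exact-case hits."""
    pattern = r"\b" + re.escape(acronym) + r"\b"
    return [m.span() for m in re.finditer(pattern, text)]

def has_definition_nearby(text, rng, definition, window=5):
    """Step C: does the definition appear just before this range?"""
    start = max(0, rng[0] - len(definition) - window)
    return definition.lower() in text[start:rng[0]].lower()

def review(text, rows, confirm):
    """Step D: list undefined first uses; confirm() settles ambiguous ones."""
    findings = []
    for acronym, definitions in load_acronyms(rows).items():
        ranges = find_all(text, acronym)
        if not ranges:
            continue
        first = ranges[0]
        if any(has_definition_nearby(text, first, d) for d in definitions):
            continue
        # Multiple definitions are detected on the fly: ask, don't guess.
        if len(definitions) == 1:
            choice = definitions[0]
        else:
            choice = confirm(acronym, definitions)
        findings.append((acronym, first, choice))
    return findings

rows = [("SaaS", "software as a service"),
        ("AS", "Access System"), ("AS", "Autonomous System")]
text = "Our software as a service (SaaS) runs on the AS."
print(review(text, rows, confirm=lambda acronym, options: options[0]))
```

Here `confirm` stands in for the user interaction: in a real macro it would select the text and show a dialog, whereas the lambda above simply picks the first definition.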

Other thoughts:
I'd probably write each term to an .ini file or something to allow you to come back and continue the process (there are lots of reasons to get interrupted when you're going through 1000s of potential search terms).
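
As a rough illustration of the resume idea, the sketch below stores the index of the last processed list item in an .ini file using Python's standard `configparser` module. The section name `progress`, the key `last_index`, and both function names are invented for this example.

```python
import configparser

def save_progress(path, last_index):
    """Record the index of the last acronym that was fully processed."""
    cfg = configparser.ConfigParser()
    cfg["progress"] = {"last_index": str(last_index)}
    with open(path, "w") as fh:
        cfg.write(fh)

def load_progress(path):
    """Return the saved index, or -1 if nothing has been saved yet."""
    cfg = configparser.ConfigParser()
    cfg.read(path)  # a missing file is silently ignored
    return cfg.getint("progress", "last_index", fallback=-1)
```

On restart, the loop over the master list would begin at `load_progress(path) + 1` and call `save_progress` after each item is confirmed.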

As you work on it, post bits of code (make sure to use Option Explicit in all your modules) and you should be able to get help on the individual parts.