Token replacement by label/value strings in Word 2007



JettyGuy
08-14-2011, 12:30 PM
I would like to set up a mapping from a "token" to a pair so that occurrences of the token in a Word 2007 document can be replaced (either dynamically or by post-processing) by a label (which is a text string) and a value (which is another text string). I am using the word "token" with some caution, hopefully avoiding words that already have special meaning in the Word context.

An example triple: token=bk, label="book", value="12x". The tokens are unique. There may be thousands of such mappings, so they should be in some kind of table (or Excel spreadsheet) separate from the main document content. The triples are all specific to a particular document.

In the Word document, all I want to appear is some form of the token. For example, an XML-style token (which would be perfectly fine if I knew how to do the rest) might be <bk/>. I want to be able to modify the table, by adding and deleting tokens, and by modifying their labels or values. Preferably the substitution of the pair for the token should be done as a post-processing step, with the output in a second document, separate from the original one. That way, I could continue to modify the first document, and create second documents as needed for printing.

I know that this can be done with VBA, and maybe also somehow with XML, but I would appreciate some help in how to set this up. I am not at all a VBA or other VB programmer, and this is hopefully the only time I will ever need to do anything like this. It is for my personal use only.

Any help will be appreciated!

Frosty
08-15-2011, 08:06 AM
I would wait to get a couple of replies from other people as well, but it seems to me that, if you have it, you should look at using Access instead of Word. You're describing, at the core, the desire to a) have a database of "stuff", b) be able to modify that "stuff", and c) generate "reports" (documents) based on that "stuff".

You can do this in Word, but it might be a round-peg-in-a-square-hole solution. Check out the Northwind database and some tutorials on Access. You may not need to program much (if anything). Just the "tokens are unique" programming in VBA in Word would be more complicated than simply assigning a key field to your Access database (which would prevent you from using "bk" if you'd already used it).

JettyGuy
08-15-2011, 09:00 AM
Thanks very much for your comment. However, this is a rather extensive text document with lots of formatting and built-in MathType/TeX equations. In other words, the token "stuff" accounts for maybe 5 percent of the total stuff in the document.

The tokens are abbreviations within the document for text items that are used repeatedly, the labels are expansions for the abbreviations, and the values are references to parts of figures where the token item is illustrated.

At this point, I am thinking about enclosing the tokens in double brackets, e.g., "[[bk]]", in the Word doc, and saving the doc as an XML file. The table of triples would be in a simple text file. I would run the text file through a little Java or C program to create a file of sed commands. (The program would also add the XML to make the values appear in bold, something I hadn't mentioned before as a requirement.) The file of sed commands would be applied to the XML file. The updated XML file would then be opened with Word. Everything except saving and re-opening the Word doc could be automated in a shell script running under Cygwin. Ugly, but it would work. The approach has the advantage of being applicable without modification to multiple documents having similar requirements, because the replacement process is entirely external to Office.
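For what it's worth, here is a minimal sketch of the "generate a file of sed commands" step, written in Python rather than Java or C purely for brevity. It assumes the table is a text file with one `token|label|value` triple per line and that tokens appear as `[[token]]` in the saved file. The function names are made up for illustration, and the XML needed to make the value bold is deliberately left out because it depends on how Word writes the surrounding markup.

```python
import re


def sed_escape_pattern(text):
    # Escape characters that are special in a basic sed regex, plus the "/"
    # delimiter. A "]" outside a bracket expression is already literal.
    return re.sub(r"([\\/.*\[^$])", r"\\\1", text)


def sed_escape_replacement(text):
    # In the replacement part of s///, only "\", "/" and "&" are special.
    return re.sub(r"([\\/&])", r"\\\1", text)


def sed_commands(table_lines):
    """Turn 'token|label|value' lines into one sed s-command per token."""
    commands = []
    for line in table_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the table file
        token, label, value = line.split("|")
        pattern = sed_escape_pattern("[[" + token + "]]")
        replacement = sed_escape_replacement(label + " " + value)
        commands.append("s/" + pattern + "/" + replacement + "/g")
    return commands
```

Writing the returned lines to a file and running `sed -f` with that file against the saved XML would apply all substitutions in one go. One caveat with this approach: Word may split a token across several XML elements, in which case a plain text match will miss it.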

Frosty
08-15-2011, 09:15 AM
Quite honestly, I don't understand some of what you're saying. It still sounds like your Word document is going to be your database file, which is fine, since text files have obviously been used as pseudo-database files. Barring someone else's input on the concept, maybe you should give an example of your input and desired output.

There are many ways to get this done, both as a procedure (click a button, change some stuff) and dynamically (harder but doable), but it may be better to get concrete samples of input and desired output. Then some code samples can get you started on whether this will ultimately be the best process. (The overhead of the Word application may not be desirable; Word's find capability is pretty robust, but it's still not as good as SQL.)

Frosty
08-15-2011, 09:18 AM
So, to simplify my comment: what do you need Word to do for you? That is something I (or someone) can certainly help give code for.

JettyGuy
08-15-2011, 10:39 AM
The Word document is simply a normal text document, with lots of embedded tokens, and definitely not a database. All I want to do is a simple substitution as a post-processing step, as I will describe in more detail below.

Each token will be associated with a label and a value in a table. The label and value are each a text string. Suppose one triple is “bv|bound volume|23c”. Let’s call the document with the token the “source doc”. Then the source doc might have the sentence:
Fig. 7 illustrates a [[bv]] covered in leather, each page edged in gold leaf.
The token [[bv]] might appear in the document many times. The target file after post-processing should then have the sentence:
Fig. 7 illustrates a bound volume 23c covered in leather, each page edged in gold leaf.
(Note that the value, 23c, is bolded.)


Here is the basic process:
1. Define token triples in a table.
2. Open source doc, which contains lots of different tokens embedded in ordinary sentences as abbreviations as illustrated above, each token possibly appearing many times.
3. Give a single command to convert all the tokens in the source file into their respective label-value text equivalents. The command then does the following:
a. Read a token from the table.
b. Replace all occurrences of the token in source file.
c. If more tokens, repeat a and b.
4. Save the document as the target doc.
5. Possibly modify the source doc and/or the token table. Repeat steps 2-5.
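Steps 1 and 3 above can be sketched in a few lines of Python. This is only an illustration of the loop, operating on plain text rather than on a real Word file. It assumes a pipe-delimited table (`token|label|value`, one per line) and `[[token]]` markers; `load_triples` and `expand_tokens` are hypothetical names, and bolding of the value is not handled here.

```python
def load_triples(table_text):
    """Parse 'token|label|value' lines into a dict keyed by token."""
    triples = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        token, label, value = line.split("|")
        if token in triples:
            # Tokens must be unique, so a repeated token is an error.
            raise ValueError("duplicate token: " + token)
        triples[token] = (label, value)
    return triples


def expand_tokens(source_text, triples):
    """Steps 3a-3c: for each token in the table, replace all occurrences."""
    target_text = source_text
    for token, (label, value) in triples.items():
        target_text = target_text.replace("[[" + token + "]]", label + " " + value)
    return target_text
```

With the table line `bv|bound volume|23c`, calling `expand_tokens` on the sample sentence containing `[[bv]]` yields the "bound volume 23c" sentence. The source text is never modified, which matches the idea of keeping the source doc and the target doc separate.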

JettyGuy
08-15-2011, 10:55 AM
More realistic example:

Sentence in source doc:
Fig. 7 illustrates a [[bndvol]] covered in [[leather]], each [[pg]] of the [[bndvol]] edged in [[gl]].

Corresponding sentence in target doc after replacement:
Fig. 7 illustrates a bound volume 23c covered in leather 495, each page 7 of the bound volume 23c edged in gold leaf 12.
Here the triples are:
bndvol|bound volume|23c
leather|leather|495
pg|page|7
gl|gold leaf|12
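A variant worth considering for this kind of example is to scan the text once for anything of the form `[[...]]` and look each token up in the table, instead of looping over the table. The sketch below does that on plain text. The `emphasize` parameter is a hypothetical hook for wrapping the value in whatever bold markup the output format needs, and the function also returns a list of tokens that were not found in the table, which helps catch typos.

```python
import re

# Matches [[token]] where the token contains no square brackets.
TOKEN_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def expand_once(source_text, triples, emphasize=lambda value: value):
    """Replace every [[token]] in a single pass; collect unknown tokens."""
    unknown = []

    def substitute(match):
        token = match.group(1)
        if token not in triples:
            unknown.append(token)
            return match.group(0)  # leave unknown tokens untouched
        label, value = triples[token]
        return label + " " + emphasize(value)

    return TOKEN_RE.sub(substitute, source_text), unknown


triples = {
    "bndvol": ("bound volume", "23c"),
    "leather": ("leather", "495"),
    "pg": ("page", "7"),
    "gl": ("gold leaf", "12"),
}
source = ("Fig. 7 illustrates a [[bndvol]] covered in [[leather]], "
          "each [[pg]] of the [[bndvol]] edged in [[gl]].")
target, unknown = expand_once(source, triples)
```

Here `target` is the sentence with all five tokens expanded and `unknown` is empty. Because the text is scanned only once, replaced text is never rescanned, so a label that happens to look like a token is not expanded a second time.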

JettyGuy
08-15-2011, 11:16 AM
And in my 5-step process, here is what I don't know how to do:

1. Create the table of tokens. Where and how should that table be created, formatted, etc.?
2. Represent the tokens in the source file itself. I enclosed them in double brackets, but that was simply for illustration, and any other convenient format would do.
3. Implement a post-processing command that applies the table of tokens to the source file, thereby creating a target file.
4. Invoke the command to do the post-processing.

Frosty
08-15-2011, 01:29 PM
It sounds to me like you're pretty close to the concept of a mail merge. You're trying to take a "form document" (what you're calling your "source file") and "merge" it with your data (what you're calling your "table of tokens").

And the result of that merge is your "target document".

But you're trying to combine two data elements into a single "tag"... which might not be necessary (at least, if it isn't necessary, you won't need to do any coding whatsoever). Can't you just as easily have:

bndvolType|bound volume
bndvolNum|23c

...as your data, and have your form document be...

Fig. 7 illustrates a <<bndvolType>> <<bndvolNum>> covered...

Sorry to be difficult, but it sort of seems like you're trying to invent a wheel which has already been invented.

In answer to your question:
Your data can be anywhere, honestly. How you create it is really up to you (where is it now?). If you want to start typing it into a Word table, just create a three-column Word table and start typing. If you can separate your 3 bits of data into 4 bits of data, I think mail merge is going to work.

I'm not trying to talk down or give you too much pushback, but if you're married to using Word for some part of this process, why not just use Word functionality which already exists, rather than write some custom code to identify tokens and two associated bits of data?

Other than that... if the format of the sourcedata (the mapped fields) is locked because it is created by the outside processing, then I think it might be better to have something even more unlikely to be a typo (i.e., **[bndvold]**), and then you'll need to decide if the 2nd piece of data is always bold or not, because otherwise you'll need an additional field in your sourcedata (as opposed to your sourceform) to determine whether piece 1 or piece 2 or both is supposed to be bold.

In short, take a look at the mail merge process in Word and see if that gets you pretty close to the desired output.

JettyGuy
08-15-2011, 03:32 PM
Frosty, thank you for your thoughtful answer. But your approach means that for each of potentially thousands of instances of potentially hundreds of tokens, I have to insert two things into the document instead of one. That would be way too time-consuming, and it would also clutter the document. With a single token representing the label/value pair, the document is still readable.

I admit I am not a Word/Office guru. But I would be happy to use built-in Word capabilities if any exist that can do the job adequately!