I have a large dump of data in RTF format (1100 pages) consisting of English terms and their French translations, and I need to rearrange it into a different format for use in translation support. The English and French terms are presented side by side, and if either one needs more width than the layout allows, it continues on the following line(s). Each line ends with a return, and tabs position the content of the lines (Word tables are not used). Here's a mockup (please ignore the periods; I couldn't figure out how to render this so multiple spaces wouldn't collapse into one, or how to set the tabs to show the layout):
.Terme.*.Guide.to.Effective.Risk.....*.Guide.de.la.gestion.du.risque
.........Management.and.Contingency....et.de.la.planification.des
.........Planning.in.Support.of........mesures.d'urgence.relativement
.........the.Year.2000.Challenge.......au.problème.de.l'an.2000
.Date.de.creation..1998/10/19
.Terme.*.Workplace.Safety.and........*.Commission.de.la.sécurité
.........Insurance.Board...............professionnelle.et.de
.......................................l'assurance.contre.les
.......................................accidents.du.travail
.Date.de.creation..2000/06/28
To build the full English and French terms, I thought I would just read each line, concatenating the content into an English and French variable until I reached a delimiter for the term (the Date de creation line), then dump the terms and go on to the next.
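To make that plan concrete, here is a rough sketch of the loop. It is written in Python only to pin down the logic, not as the VBA I would actually use. The function name `build_terms` is made up for illustration, and the sketch assumes the fields on each line are separated by tab characters and that every content line carries both an English and a French part:

```python
def build_terms(lines):
    """Join continuation lines into (english, french) pairs.

    Assumption: each content line holds an English part and a French part
    separated by tabs, and a "Date de creation" line ends the current term.
    """
    terms = []
    english, french = [], []
    for line in lines:
        fields = [f.strip() for f in line.split("\t")]
        # Drop empty fields and the "Terme" / "*" layout markers.
        fields = [f for f in fields if f and f not in ("Terme", "*")]
        if fields and fields[0].startswith("Date de creation"):
            # Delimiter reached: emit the finished term and start over.
            terms.append((" ".join(english), " ".join(french)))
            english, french = [], []
        elif len(fields) == 2:
            english.append(fields[0])
            french.append(fields[1])
        elif fields:
            # Only one part on the line: the text alone does not say
            # which language it belongs to.
            raise ValueError(f"cannot tell the language of: {fields[0]!r}")
    return terms
```

Fed the first three lines of the 1st term plus its date line, this would give one pair with the English and French pieces joined by spaces. A line holding only one part makes it stop with an error rather than guess.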
It looked fairly straightforward until I examined the file. For some reason, the tab stops are set separately for each line. The 1st term above could be parsed because each language has the same number of lines, and each carryover line has a single tab preceding each language's portion. However, when only one language carries over, only a single tab precedes the text: in the last 2 carryover lines of the 2nd term, only the apparent position, determined by a single tab stop set at 4.71", shows that the text is French. If an English term spills over more lines than its French equivalent, the final carryover is preceded by a single tab with its stop set at 1.21". (The "Terme" line always has tab stops at 0.17, 1.08, 1.21, 4.58 and 4.71 inches. Carryover lines with both languages have tab stops at 1.21 and 4.71 inches, but an English-only carryover has a single tab stop at 1.21 inches, whereas a French-only carryover has a single tab stop at 4.71 inches.)
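Put another way, if I could get hold of each line's tab stop positions, deciding the language would be simple. Here is a rough sketch of the rule, again in Python just to show the idea. The name `classify_line` and the tolerance are invented for illustration, and it assumes the tab stops have already been read out as a list of inches, which is exactly the part I don't know how to do:

```python
ENGLISH_STOP = 1.21  # inches: tab stop that positions English text
FRENCH_STOP = 4.71   # inches: tab stop that positions French text


def classify_line(tab_stops, tolerance=0.02):
    """Say which languages a line holds, given its tab stops in inches.

    Assumption: tab_stops was obtained elsewhere; this function only
    applies the layout rule described above.
    """
    def has(stop):
        return any(abs(t - stop) <= tolerance for t in tab_stops)

    has_english, has_french = has(ENGLISH_STOP), has(FRENCH_STOP)
    if has_english and has_french:
        return "both"
    if has_english:
        return "english"
    if has_french:
        return "french"
    return "unknown"
```

With this rule, a line whose only tab stop is at 4.71 inches would be treated as French, one with only 1.21 inches as English, and a "Terme" line with all five stops as holding both.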
For a number of arcane and bureaucratic reasons, it isn't possible to change the way the data is exported (my first thought!). Other than the position determined by the line's tab settings, nothing distinguishes the English from the French (there is no language attribute, for example).
Is there a way I can get VBA to detect the apparent position of content? The counters in the status bar show something like "Ln 24 Col 30", but Col is actually a count of characters, and a tab counts as one character, so it doesn't reveal the position where the tab stop is set.
I'm stumped and would appreciate any tips to get me oriented properly! (I could post a real example with the tabs instead of the mockup above if real data would help.)