PDA

View Full Version : Solved: How can I detect the position of tabbed content?



EricFletcher
11-24-2005, 12:14 PM
I have a large dump of data in RTF format (1100 pages) consisting of English terms and their French translations and need to rearrange it to a different format for use in translation support. Each term is presented side-by-side, and if one or both require more width than is available for the presentation format, it is continued on the next line(s). Each line is ended by a return, and tabs are used to position the content of the lines (Word tables are not used). Here's a mockup (please ignore the periods; I couldn't figure out how to render this so multiple spaces wouldn't convert to one, or how to set the tabs to show the layout):

.Terme.*.Guide.to.Effective.Risk.....*.Guide.de.la.gestion.du.risque
.........Management.and.Contingency....et.de.la.planification.des
.........Planning.in.Support.of........mesures.d'urgence.relativement
.........the.Year.2000.Challenge.......au.probl?me.de.l'an.2000
.Date.de.creation..1998/10/19

.Terme .*.Workplace.Safety.and........*.Commission.de.la.s?curit?
.........Insurance.Board...............professionnelle.et.de
.......................................l'assurance.contre.les
.......................................accidents.du.travail
.Date.de.creation..2000/06/28

To build the full English and French terms, I thought I would just read each line, concatenating the content into an English and French variable until I reached a delimiter for the term (the Date de creation line), then dump the terms and go on to the next.

It looked fairly straightforward until I examined the file. For some reason, tabs get set for each line. The 1st term above could be parsed because each language has the same number of lines, and the carryover lines each have a single tab preceding each language's portion. However, when only one language carries over, only a single tab precedes the term: in the last 2 carryover parts of the 2nd term, only the apparent position determined by a single tab set at 4.71" shows that it is French. If an English term spilled over more lines than its French equivalent, the final carryover is preceded by a single tab set at 1.21". (The "Terme" line always has tabs at 0.17, 1.08, 1.21, 4.58 & 4.71 inches. Carryover lines with both languages have tabs at 1.21 and 4.71 inches -- but a carryover of English only has one tab at 1.21 whereas a carryover of French only has a single tab at 4.71 inches.)

For a number of arcane and bureaucratic reasons, it isn't possible to change the way the data is exported (my first thought!). Other than the position determined by the line's tab settings, there is no distinguishing characteristic for the English and French (a language attribute for example).

Is there a way I can get VBA to detect the apparent position of content? The counters in the status bar show something like "Ln 24 Col 30" but the Col is actually the count of characters, and a tab counts as one so this doesn't reveal the position where the tab is set.

I'm stumped and would appreciate any tips to get me oriented properly! (I could post a real example with the tabs instead of the mockup above if real data would help.)

mdmackillop
11-24-2005, 04:19 PM
Hi Eric,
I think some real data would probably be of assistance. Easier to play with certainly.
Regards
Malcolm

TonyJollans
11-24-2005, 04:38 PM
Hi Eric,

I did something along these lines about six months ago for somebody else. I'll dig it out and see if I can tweak it. It'll probably be some tomorrow before I can get back, but it can certainly be done - using Selection.Information(wdHorizontalPositionRelativeToTextBoundary) to decide what lines up with what.

What I have is quite complex but if your document is as precisely laid out as you say, it shouldn't be too difficult. Can you confirm that the tabs are *always* as you state - it doesn't matter if they're not but if they are it should make it a lot easier.

TonyJollans
11-25-2005, 04:07 AM
I gave this a little thought overnight and if the tab stops are always as stated then it should be straightforward with pseudo code along these lines ..

For Each Paragraph
If Not Empty

Case First Tabstop is 0.17

If Tab followed by "Terme" Then

(New Entry) Save Previous Entry
Clear Variables
EnglishText = text between tab at 1.21 and next tab
French text = text after tab at 4.71 (if any?)


Else (must be Date de creation)

Date = whatever followsEndIfCase First tab is 1.21

If there is a tab at 4.71

English = English & text between tab at 1.21 and next tab
French = French & text after tab at 4.71Else

English = English & text between tab at 1.21 and next tabEndIfCase (only other) first tab at 4.71

French = French & text after tab at 4.71EndIfNext paragraph

fumei
11-25-2005, 12:43 PM
This is an interesting one.

Hmmmmmmmmmm.

EricFletcher
11-25-2005, 01:15 PM
Thanks for the tip about Selection.Information Tony. I was able to use it in a bit of code together to identify the accurate location for a given position.
Sub ListTheTabStop()
Dim wc
Dim tsInch As String

wc = Selection.Information(wdHorizontalPositionRelativeToTextBoundary)
tsInch = str(wc / 72)
If Len(tsInch) > 6 Then
tsInch = Left(tsInch, 6)
End If

Selection.TypeText Text:=LTrim(str(wc)) + "pt;" + tsInch + "in"

End Sub


However, I can see that the tab settings are often rounded off: in my sample data, a tab set at 0.17" in the Tabs dialog is actually 0.1625" or 11.7pt. This may not be an issue if I use the Selection.Information though, since I assume it would report the definitive value and the dialog would just be rounding up to fit the current preference settings.

Also Tony, thanks for the pseudo code outline. It is pretty much how I was planning to approach it, although I hadn't thought of using Case.

Per Malcolm, I have attached a small sample of records exported, with some notes at the end identifying the exact tab locations. As you'll see, there is actually more in each record, but for the purposes of my forum query, I didn't bother including stuff I would just ignore.

On a bit of a side track, how would I go about compensating people for assistance in coming up with a VBA solution? I do not have this as an assigned task right now, but if it looks like I can find a workable solution, I would certainly charge them for it. For other work I do, I pay people who assist me (subcontract) but I've never encountered a situation where subcontractors would be out in cyberspace somewhere! Have any of you collaborated in this way? And, if so, how do you determine a fee, and how would payment be made?

fumei
11-25-2005, 01:31 PM
Yikes Eric! Hmmmm. We are here for free, although there are places here that people artvertise as contractors. I would say that this would only be an issue if you were getting assistance - agreed upon assistance - outside the scope of the forum.

Thoughts anyone?

TonyJollans
11-27-2005, 06:24 AM
Hi Eric,

If you can trust the tabs, this will do it (I think - it's not fully tested). It should at least give you a working base to build on.

A couple of points:



It has a special check for end of data to make it work with the posted sample
It extracts almost everything it recognises from the file - but only writes out terms, abbreviations and the date.
You'll have to change the output file name, of course.
Payment is not wanted.
Option Explicit

Private Type Bilingual
English As String
French As String
End Type

Private Type TermElement
Keyword As String
Output As Boolean
Value As Bilingual
End Type

Private Type DateElement
Keyword As String
Output As Boolean
Value As String
End Type

Private Type ClientElement
Keyword As String
Output As Boolean
Value As String
End Type

Private TermElements() As TermElement
Private DateElements() As DateElement
Private ClientElements() As ClientElement

Private RecordCount As Long

Private ndxTerm As Long

Private Sub ExtractForEric()

Const RecordHeader As Currency = 1.88
Const TermLine As Currency = 0.17
Const TermContinued As Currency = 1.21
Const TermContinuedFrench As Currency = 4.71
Const ClientLine As Currency = 0.5

Dim Para As Word.Paragraph

Call InitialiseElements
RecordCount = -1

For Each Para In ActiveDocument.Paragraphs

' FOR TESTING ONLY =========================================
If Left(Para.Range.Text, 13) = "Notes for VBA" Then Exit For
' ==========================================================

If Trim(Replace(Replace(Para.Range.Text, vbTab, ""), vbCr, "")) = "" Then
' Empty paragraph - ignore
Else

' ASSUMPTION: Every (non-empty) paragraph has _
' (and starts with) a custom tab stop
Select Case Round(PointsToInches(Para.TabStops(1).Position), 2)

Case RecordHeader

Call WriteRecord

Case TermLine

Call ExtractTermLine(Para)

Case TermContinued

Call ExtractContinuedTerm(Para)

Case TermContinuedFrench

Call ExtractContinuedTermFrench(Para)

Case ClientLine

Call ExtractClientLine(Para)

Case Else

MsgBox "Tab stop at " _
& PointsToInches(Para.TabStops(1).Position) _
& """"

End Select

End If

Next

Call WriteRecord(FinalCall:=True)

End Sub

Private Sub ExtractTermLine(Para As Word.Paragraph)

' FORMATS ASSUMED:
' Tab-separated term-keyword, star, English, star, French
' Tab-separated date-keyword, date, more data

Dim Breakdown
Dim ndx As Long

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 6)

For ndx = 1 To UBound(TermElements)
If TermElements(ndx).Keyword = UCase(Breakdown(1)) Then
ndxTerm = ndx
If UBound(Breakdown) > 2 Then _
TermElements(ndx).Value.English = Breakdown(3)
If UBound(Breakdown) > 4 Then _
TermElements(ndx).Value.French = Breakdown(5)
Exit Sub
End If
Next

For ndx = 1 To UBound(DateElements)
If DateElements(ndx).Keyword = UCase(Breakdown(1)) Then
If UBound(Breakdown) > 1 Then _
DateElements(ndx).Value = Breakdown(2)
Exit Sub
End If
Next

MsgBox "Data Type not recognised:" _
& vbCr & Para.Range.Text _
& vbCr & "Ignoring line"

End Sub

Private Sub ExtractContinuedTerm(Para As Word.Paragraph)

' FORMAT ASSUMED: Tab-separated English, <French>

Dim Breakdown

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

TermElements(ndxTerm).Value.English _
= TermElements(ndxTerm).Value.English & " " & Breakdown(1)

If UBound(Breakdown) > 1 Then
TermElements(ndxTerm).Value.French _
= TermElements(ndxTerm).Value.French & " " & Breakdown(2)
End If


End Sub

Private Sub ExtractContinuedTermFrench(Para As Word.Paragraph)

' FORMAT ASSUMED: Tab, French

Dim Breakdown

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 2)

TermElements(ndxTerm).Value.French _
= TermElements(ndxTerm).Value.French & " " & Breakdown(1)


End Sub

Private Sub ExtractClientLine(Para As Word.Paragraph)

' FORMAT ASSUMED: Tab-separated keyword, value

Dim Breakdown
Dim ndx As Long

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

For ndx = 1 To UBound(ClientElements)
If ClientElements(ndx).Keyword = UCase(Breakdown(1)) Then
If UBound(Breakdown) > 1 Then _
ClientElements(ndx).Value = Breakdown(2)
Exit Sub
End If
Next

MsgBox "Data Type not recognised:" _
& vbCr & Para.Range.Text _
& vbCr & "Ignoring line"

End Sub

Private Sub WriteRecord(Optional FinalCall As Boolean)

Dim OutputString As String
Dim ndx As Long

If FinalCall Then
Close #1
Exit Sub
End If

If RecordCount = -1 Then
Open "C:\Documents and Settings\Tony\Desktop\Eric.txt" For Output As #1
RecordCount = RecordCount + 1
Exit Sub
End If

OutputString = ""

For ndx = 1 To UBound(TermElements)
With TermElements(ndx)
' OutputString = OutputString & .Keyword & vbCr
If .Output Then
If Trim(.Value.English) <> "" Then _
OutputString = OutputString & .Value.English & vbTab
If Trim(.Value.French) <> "" Then _
OutputString = OutputString & .Value.French & vbTab
End If
.Value.English = ""
.Value.French = ""
End With
Next

For ndx = 1 To UBound(DateElements)
With DateElements(ndx)
If .Output Then
OutputString = OutputString & .Value
End If
.Value = ""
End With
Next

' For ndx = 1 To UBound(ClientElements)
' With ClientElements(ndx)
' OutputString = OutputString & .Keyword & vbCr
' OutputString = OutputString & .Value & vbCr & vbCr
' .Value = ""
' End With
' Next

' MsgBox OutputString
Print #1, OutputString

RecordCount = RecordCount + 1

End Sub

Private Sub InitialiseElements()

Dim ndx As Long

ReDim TermElements(0)
ndx = UBound(TermElements)

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "TERME"
TermElements(ndx).Output = True

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "ABR?VIATIONS"
TermElements(ndx).Output = True

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "SYNONYMES"

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "CONTEXTE"

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "SOURCE"

ReDim DateElements(0)
ndx = UBound(DateElements)

ndx = ndx + 1
ReDim Preserve DateElements(ndx)
DateElements(ndx).Keyword = "DATE DE CR?ATION"
DateElements(ndx).Output = True

ReDim ClientElements(0)
ndx = UBound(ClientElements)

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "CLIENT"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "DOMAINE"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "PROJET"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "AUTEUR"

End Sub


A final question for Gerry - is the line length OK in this post? I tried to keep it short for you. What resolution screen are you running with?

TonyJollans
11-27-2005, 12:05 PM
Hi Eric,

Going back and rereading your post, I think I should add a little more.

My posted code checks the tab stops that are set, which are far easier to work with than the position on the page. The position on the page is subject to what I can only call a bug in that it is sometimes affected by the zoom percentage making it difficult to check a precise position of a single element, such as a tab stop - what it can do effectively, and what I have used it for is to compare two different elements to see if they line up,

EricFletcher
11-27-2005, 05:58 PM
Wow Tony... that was a real "tour de force" on your part!

I spent quite a while studying a printout of it (warmer by the woodstove than at the computer!) to get an idea of what you were doing. Your code nicely introduces me to several VBA features that I know I will find very useful (Split, Replace) -- and tantalizes me with some I can vaguely understand but will need to study some more (the Private Type definitions, how you initialized elements)!

I tested it and managed to make a few minor changes to address a couple of glitches (it missed the last term because the "FinalCall" exit was processed before it wrote the data, and empty abbreviations dropped an output tab). I also added a message to let me know when it finished after stupidly staring at the screen and not realizing that it had finished! (Being used to working with recorded code I expected it to take longer. I note Gerry's comments elsewhere about Object model vs Selection and realize this is obviously the way to go for an exercise like this one...)

I then gave it the acid test and ran it against my 12MB RTF file with 6,676 terms: it took a bit longer, but it worked perfectly! No need for you to try testing for position on the page; the tab stop appoach worked fine.

As I made my minor changes, I added more comments (for my own edification mostly) and removed the break in several lines for readability. I was going to post the new version, but I noted your final comment about readability of the code, and wasn't sure if what I have now will be too wide for display (the longest line is now ~112 characters).

Your code appeared fully within the width of my 1024-wide window, but I have a 2-monitor setup so I can go much wider if needed. (However, since it was within a nested scrolling window in Firefox, I just copied it out into VBA to examine it in any case.) Would you like me to post the revised code? (I'll be away Monday so it might not show up until late.)

TonyJollans
11-28-2005, 03:50 AM
Hi Eric,

The FinalCall error was sloppy - sorry. I did know the logic for adding the tabs to the output file wasn't quite right but as I write this I realise that I made it overcomplicated (as you can probably tell from the commented code I accidentally left in I had tested with a Msgbox display and just added the file write at the end)

I was a bit inconsistent with module-level variables. I should either have had none or all (probably in this case it makes most sense to make Para global along with everything else). As I had several I just made everything Private - it just limits the scope to the module rather than the otherwise default of project. Split and Replace, for anybody else reading this, are very useful but were new in 2000 (so won' work in 97). My array initializations are, perhaps, a slight abuse in that I create, but don't use, a zeroth element - for it to work as is it requires an implicit (default) or explicit "Option Base 0".

I rarely have problems with the display myself (1268 wide) but Gerry has complained before (and I'm sure others suffer in silence) so I deliberately split some lines to keep them short; I just wondered how successful I'd been.

Yes, it would be good to post corrected/amended code for those who come along later. In your own time, of course.

fumei
11-28-2005, 07:23 AM
I am running 1024 and no, the code does not fit for me. Almost, but not quite. The line Trim (Replace (Replace..... does not make it.

Is there any way to decrease the tab stops within the VBA window? Because they sure do take up a chunk of real estate.

Thanks Tony, I do appreciate the effort. Yeah, I know I am a complainer.....

Eric, I find it interesting that you removed breaks. I find breaks make code easier to read. Not really here, as you can't get the tab stops right, but in the VBE I find it much better. But then, I tend to not use maximized code windows.

Back to Tony - wow, you put some work into that puppy. I am going to pull it out and really look at it.

TonyJollans
11-28-2005, 07:33 AM
Thanks, Gerry, that gives me a guide.

I did look at that line and decided breaking it would be more confusing than not.

No, I don't believe we have any control over the formatting of VBA tagged code - except maybe font size, I haven't tried that and am not sure I would want to make it smaller anyway.

EricFletcher
11-28-2005, 03:50 PM
Freezing rain here this morning so I'm GLAD to be back early to put this up! Here is my slightly modified version of the VBA Tony posted. As you'll see, some lines extend beyond the VBA window because I removed the breaks. Sorry Gerry... I find code with breaks harder to follow -- possibly because I'm less used to the syntax and don't recognize the broken items as single statements! As well, I tend to use Ctrl-scrollwheel to zoom the font size out so I can see the full width and navigate within it, then zoom back in to focus on details. Alternatively, I'll extend the window to another monitor and get twice the width to work with.

I've also added a lot more comments; mostly for my own reference as I worked through understanding Tony's elegant code, but also to alert myself to things I need to follow up on. I thought I may as well leave them in here... (Comments starting with '-- are mine: an old habit from PL/1 programming I think!)

If anyone has further ideas about where I might look for learning more about items I've flagged, I'd love to hear more.

And now, the code:


Option Explicit

'-- Investigate this: why is this preferred over other methods of declaring variables?
Private Type Bilingual
English As String
French As String
End Type

Private Type TermElement
Keyword As String
Output As Boolean
Value As Bilingual
End Type

Private Type DateElement
Keyword As String
Output As Boolean
Value As String
End Type

Private Type ClientElement
Keyword As String
Output As Boolean
Value As String
End Type

Private TermElements() As TermElement
Private DateElements() As DateElement
Private ClientElements() As ClientElement

Private RecordCount As Long

Private ndxTerm As Long

Private Sub ExtractForEric()

'-- These are the tab stops set for the various lines in the term database
Const RecordHeader As Currency = 1.88 '-- Anglais (start of a new term defn)
Const TermLine As Currency = 0.17 '-- subcomponent names
Const TermContinued As Currency = 1.21 '-- continuation of an English subcomponent
Const TermContinuedFrench As Currency = 4.71 '-- French continuation
Const ClientLine As Currency = 0.5 '-- start posn for Client etc (not needed for this)

Dim Para As Word.Paragraph

Call InitialiseElements
RecordCount = -1

For Each Para In ActiveDocument.Paragraphs

' FOR TESTING ONLY (Uses the sample file I posted to VBAX)==
If Left(Para.Range.Text, 13) = "Notes for VBA" Then Exit For
' ==========================================================

If Trim(Replace(Replace(Para.Range.Text, vbTab, ""), vbCr, "")) = "" Then
' Empty paragraph - ignore
Else

' ASSUMPTION: Every (non-empty) paragraph has (and starts with) a custom tab stop
Select Case Round(PointsToInches(Para.TabStops(1).Position), 2)
'-- Brilliant! The value for case is determined by the 1st tab stop defined.

Case RecordHeader '-- new term definition (starts with Anglais at tab 1.88")
Call WriteRecord '-- writes out collected data (from previous term)

Case TermLine '-- parses English and/or French from line using the keyword intab 0.17"
Call ExtractTermLine(Para)

Case TermContinued '-- tab starting at 1.21" is English continuation & may have French
Call ExtractContinuedTerm(Para)

Case TermContinuedFrench '-- tab at 4.71" is always just a French continuation
Call ExtractContinuedTermFrench(Para)

Case ClientLine '-- lines starting at 0.5" have data not needed for now
Call ExtractClientLine(Para) '-- but this would enable the data to be extracted

Case Else '-- Report an unrecognized tab stop (and ignore it)
MsgBox "Tab stop at " & PointsToInches(Para.TabStops(1).Position) & """"

End Select
End If
Next

Call WriteRecord(FinalCall:=True) '-- the final record won't have a new term to start it so write the data

End Sub

Private Sub ExtractTermLine(Para As Word.Paragraph)
' FORMATS ASSUMED:
' Tab-separated term-keyword, star, English, star, French
' Tab-separated date-keyword, date, more data
'-- This handles the term's main named sub-components (but not the Client etc. parts at end)

Dim Breakdown
Dim ndx As Long

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 6)
'-- Nice! Breakdown will now be an array of the substrings that were separated by a tab within
'-- the line. The sub-component name will be #1; #3 will be English; #4, French if non-blank.

For ndx = 1 To UBound(TermElements) '-- this parses out the named sub-components (except Date)
If TermElements(ndx).Keyword = UCase(Breakdown(1)) Then
ndxTerm = ndx
If UBound(Breakdown) > 2 Then TermElements(ndx).Value.English = Breakdown(3)
If UBound(Breakdown) > 4 Then TermElements(ndx).Value.French = Breakdown(5)
Exit Sub
End If
Next

For ndx = 1 To UBound(DateElements) '-- this parses out the date
If DateElements(ndx).Keyword = UCase(Breakdown(1)) Then
If UBound(Breakdown) > 1 Then DateElements(ndx).Value = Breakdown(2)
Exit Sub
End If
Next

'-- If there isn't a match on the sub-component name, it reports the line & ignores it
MsgBox "Data Type not recognised:" & vbCr & Para.Range.Text & vbCr & "Ignoring line"

End Sub

Private Sub ExtractContinuedTerm(Para As Word.Paragraph)
' FORMAT ASSUMED: Tab-separated English, <French>

Dim Breakdown

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

'-- If there is a tab at 1.21", there will be at least English
TermElements(ndxTerm).Value.English = TermElements(ndxTerm).Value.English & " " & Breakdown(1)

If UBound(Breakdown) > 1 Then '-- & if there was a 2nd tab, there was French too
TermElements(ndxTerm).Value.French = TermElements(ndxTerm).Value.French & " " & Breakdown(2)
End If

End Sub

Private Sub ExtractContinuedTermFrench(Para As Word.Paragraph)
' FORMAT ASSUMED: Tab, French
'-- If the line only had French, it would always be at tab 4.71"

Dim Breakdown
Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 2)
TermElements(ndxTerm).Value.French = TermElements(ndxTerm).Value.French & " " & Breakdown(1)

End Sub

Private Sub ExtractClientLine(Para As Word.Paragraph)
' FORMAT ASSUMED: Tab-separated keyword, value
'-- These are currently ignored but this code would parse the Client etc. sub-components
'-- (but ".Output=True" would need to be set in the InitialiseElements function for each)

Dim Breakdown
Dim ndx As Long

Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

For ndx = 1 To UBound(ClientElements)
If ClientElements(ndx).Keyword = UCase(Breakdown(1)) Then
If UBound(Breakdown) > 1 Then ClientElements(ndx).Value = Breakdown(2)
Exit Sub
End If
Next
'-- Put up message if the keyword is not recognized & ignore line
MsgBox "Data Type not recognised:" & vbCr & Para.Range.Text & vbCr & "Ignoring line"

End Sub

Private Sub WriteRecord(Optional FinalCall As Boolean)
'-- Writes out the data collected for a term

Dim OutputString As String
Dim ndx As Long

If RecordCount = -1 Then
'-- On 1st entry, open the output file (***WORK OUT HOW TO SPECIFY THIS WITH A DIALOG***)
Open "C:\Documents and Settings\Tony\Desktop\Eric.txt" For Output As #1
RecordCount = RecordCount + 1
Exit Sub
End If

OutputString = ""

For ndx = 1 To UBound(TermElements) '-- add each of the E&F sub-components to the output string
With TermElements(ndx)
' OutputString = OutputString & .Keyword & vbCr
If .Output Then
'-- This next code adds the E&F components but when there is no abbreviation,
'-- it didn't add the tab so I've commented it out to see if I can tweak it
' If Trim(.Value.English) <> "" Then _
' OutputString = OutputString & .Value.English & vbTab
' If Trim(.Value.French) <> "" Then _
' OutputString = OutputString & .Value.French & vbTab
'-- This will now put out just a tab if a value had not been detected for a term
'-- (*** There may be a more elegant way to do this***)
If Trim(.Value.English) <> "" Then
OutputString = OutputString & .Value.English & vbTab
Else
OutputString = OutputString & vbTab
End If
If Trim(.Value.French) <> "" Then
OutputString = OutputString & .Value.French & vbTab
Else
OutputString = OutputString & vbTab
End If

End If
.Value.English = ""
.Value.French = ""
End With
Next

For ndx = 1 To UBound(DateElements) '-- add the date to the output string
With DateElements(ndx)
If .Output Then
OutputString = OutputString & .Value
End If
.Value = ""
End With
Next


Print #1, OutputString '-- write the output string to the file

RecordCount = RecordCount + 1

'-- This closes the output file after the final record data was written out. FinalCall gets
'-- set at the end of the main routine when no additional term was found
If FinalCall Then
Close #1
MsgBox "Completed with" & str(RecordCount) & " records processed"
Exit Sub
End If

End Sub

Private Sub InitialiseElements()

Dim ndx As Long

'-- This defines an array for each of the term subcomponents using the name that precedes
'-- each one. (The name is in LC but TJ has probably set them in UC here so he can use UCase
'-- to make them consistent if some were entered with different case.)
'-- The ".Output" item is set True if that element is needed in the output string: interesting
'-- idea and *** learn more about this!

ReDim TermElements(0)
ndx = UBound(TermElements)

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "TERME"
TermElements(ndx).Output = True

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "ABR?VIATIONS"
TermElements(ndx).Output = True

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "SYNONYMES"

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "CONTEXTE"

ndx = ndx + 1
ReDim Preserve TermElements(ndx)
TermElements(ndx).Keyword = "SOURCE"

ReDim DateElements(0)
ndx = UBound(DateElements)

ndx = ndx + 1
ReDim Preserve DateElements(ndx)
DateElements(ndx).Keyword = "DATE DE CR?ATION"
DateElements(ndx).Output = True

ReDim ClientElements(0)
ndx = UBound(ClientElements)

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "CLIENT"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "DOMAINE"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "PROJET"

ndx = ndx + 1
ReDim Preserve ClientElements(ndx)
ClientElements(ndx).Keyword = "AUTEUR"

End Sub


By the way Tony, I tested this with some extra lines with invalid names and non-standard tabs and it caught them (and ignored them) as expected. Running this on my 12MB RTF file with 6,767 definitions took only a couple of minutes and processed every record with only one alert on the first page title (the only non-standard line). Very nice... I suspect my original approach of modifying recorded code (Selection method?) would have ground away for hours!

Thanks for the help; I've learned a lot from this!

TonyJollans
11-28-2005, 05:16 PM
Hi Eric,

Glad it all (almost) worked.

One quick note. The mistake I made with the missing output tabs ..

If .Output Then
'-- This next code adds the E&F components but when there is no abbreviation,
'-- it didn't add the tab so I've commented it out to see if I can tweak it
' If Trim(.Value.English) <> "" Then _
' OutputString = OutputString & .Value.English & vbTab
' If Trim(.Value.French) <> "" Then _
' OutputString = OutputString & .Value.French & vbTab
'-- This will now put out just a tab if a value had not been detected for a term
'-- (*** There may be a more elegant way to do this***)
If Trim(.Value.English) <> "" Then
OutputString = OutputString & .Value.English & vbTab
Else
OutputString = OutputString & vbTab
End If
If Trim(.Value.French) <> "" Then
OutputString = OutputString & .Value.French & vbTab
Else
OutputString = OutputString & vbTab
End If
End If

Can be more simply done just with ..If .Output Then
'-- This next code adds the E&F components but when there is no abbreviation,
'-- it didn't add the tab so I've commented it out to see if I can tweak it
' If Trim(.Value.English) <> "" Then _
' OutputString = OutputString & .Value.English & vbTab
' If Trim(.Value.French) <> "" Then _
' OutputString = OutputString & .Value.French & vbTab
'-- This will now put out just a tab if a value had not been detected for a term
'-- (*** There may be a more elegant way to do this***)
OutputString = OutputString & .Value.English & vbTab
OutputString = OutputString & .Value.French & vbTab
End If

When .Value.language is empty, it simply writes a zero length string (i.e. nothing) as output followed by the tab.

I actually did this to start with and then changed my mind - I'm not sure why, now, when I think about it!

EricFletcher
11-29-2005, 07:16 AM
That was what I tried first Tony, but then it leaves a single space in the tab position when an abbreviation is not there. The test for non-blank ovecromes that.

The other thing I noticed while doing this exercise was how the database uses the foot symbol (') instead of an apostrophe (i.e. can't, n'est instead of can?t, n?est). I run into this sort of thing a lot, and usually deal with it using Find & Replace with "smart quotes" set to on. However, is this the sort of thing that the Replace function could handle? And would it work for an entire object (like the current document) or is there a limit to the string's length?

TonyJollans
11-29-2005, 07:52 AM
Hi Eric,

Try ...OutputString = OutputString & Trim(.Value.English) & vbTab
OutputString = OutputString & Trim(.Value.French) & vbTab

I don't think there's any special limit on the length of string Replace can handle so it should work on Document.Content.Text - and yes, it could replace quotes with smart quotes - but smart quotes are language-dependent so you might need to do some extra checks if changing, say, double quotes to double chevrons in French text.

EricFletcher
11-29-2005, 11:05 AM
Thanks Tony. After I sent that post I tried it (similar to what you just posted here) and it works fine for the apostrophes. You're right of course about other kinds of quotes needing >1 character (i.e. "this" becomes ? this ? in French). Your use of the Replace function was a revelation to me and is prompting me to consider how I can use it elsewhere!