Consulting

Results 1 to 18 of 18

Thread: Solved: How can I detect the position of tabbed content?

  1. #1
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location

    Solved: How can I detect the position of tabbed content?

    I have a large dump of data in RTF format (1100 pages) consisting of English terms and their French translations and need to rearrange it to a different format for use in translation support. Each term is presented side-by-side, and if one or both require more width than is available for the presentation format, it is continued on the next line(s). Each line is ended by a return, and tabs are used to position the content of the lines (Word tables are not used). Here's a mockup (please ignore the periods; I couldn't figure out how to render this so multiple spaces wouldn't convert to one, or how to set the tabs to show the layout):

    .Terme.*.Guide.to.Effective.Risk.....*.Guide.de.la.gestion.du.risque
    .........Management.and.Contingency....et.de.la.planification.des
    .........Planning.in.Support.of........mesures.d'urgence.relativement
    .........the.Year.2000.Challenge.......au.probl?me.de.l'an.2000
    .Date.de.creation..1998/10/19

    .Terme
    .*.Workplace.Safety.and........*.Commission.de.la.s?curit?
    .........Insurance.Board...............professionnelle.et.de
    .......................................l'assurance.contre.les
    .......................................accidents.du.travail
    .Date.de.creation..2000/06/28

    To build the full English and French terms, I thought I would just read each line, concatenating the content into an English and French variable until I reached a delimiter for the term (the Date de creation line), then dump the terms and go on to the next.

    It looked fairly straightforward until I examined the file. For some reason, tabs get set for each line. The 1st term above could be parsed because each language has the same number of lines, and the carryover lines each have a single tab preceding each language's portion. However, when only one language carries over, only a single tab precedes the term: in the last 2 carryover parts of the 2nd term, only the apparent position determined by a single tab set at 4.71" shows that it is French. If an English term spilled over more lines than its French equivalent, the final carryover is preceded by a single tab set at 1.21". (The "Terme" line always has tabs at 0.17, 1.08, 1.21, 4.58 & 4.71 inches. Carryover lines with both languages have tabs at 1.21 and 4.71 inches -- but a carryover of English only has one tab at 1.21 whereas a carryover of French only has a single tab at 4.71 inches.)

    For a number of arcane and bureaucratic reasons, it isn't possible to change the way the data is exported (my first thought!). Other than the position determined by the line's tab settings, there is no distinguishing characteristic for the English and French (a language attribute for example).

    Is there a way I can get VBA to detect the apparent position of content? The counters in the status bar show something like "Ln 24 Col 30" but the Col is actually the count of characters, and a tab counts as one so this doesn't reveal the position where the tab is set.

    I'm stumped and would appreciate any tips to get me oriented properly! (I could post a real example with the tabs instead of the mockup above if real data would help.)
    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

  2. #2
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Hi Eric,
    I think some real data would probably be of assistance. Easier to play with certainly.
    Regards
    Malcolm
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  3. #3
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    I did something along these lines about six months ago for somebody else. I'll dig it out and see if I can tweak it. It'll probably be some tomorrow before I can get back, but it can certainly be done - using Selection.Information(wdHorizontalPositionRelativeToTextBoundary) to decide what lines up with what.

    What I have is quite complex but if your document is as precisely laid out as you say, it shouldn't be too difficult. Can you confirm that the tabs are *always* as you state - it doesn't matter if they're not but if they are it should make it a lot easier.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  4. #4
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    I gave this a little thought overnight and if the tab stops are always as stated then it should be straightforward with pseudo code along these lines ..

    For Each Paragraph
    If Not Empty

    Case First Tabstop is 0.17

    If Tab followed by "Terme" Then

    (New Entry) Save Previous Entry
    Clear Variables
    EnglishText = text between tab at 1.21 and next tab
    French text = text after tab at 4.71 (if any?)
    Else (must be Date de creation)

    Date = whatever follows
    EndIf
    Case First tab is 1.21

    If there is a tab at 4.71

    English = English & text between tab at 1.21 and next tab
    French = French & text after tab at 4.71
    Else

    English = English & text between tab at 1.21 and next tab
    EndIf
    Case (only other) first tab at 4.71

    French = French & text after tab at 4.71
    EndIf
    Next paragraph
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  5. #5
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    This is an interesting one.

    Hmmmmmmmmmm.

  6. #6
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location
    Thanks for the tip about Selection.Information Tony. I was able to use it in a bit of code together to identify the accurate location for a given position.
    [VBA]Sub ListTheTabStop()
    Dim wc
    Dim tsInch As String

    wc = Selection.Information(wdHorizontalPositionRelativeToTextBoundary)
    tsInch = str(wc / 72)
    If Len(tsInch) > 6 Then
    tsInch = Left(tsInch, 6)
    End If

    Selection.TypeText Text:=LTrim(str(wc)) + "pt;" + tsInch + "in"

    End Sub
    [/VBA]

    However, I can see that the tab settings are often rounded off: in my sample data, a tab set at 0.17" in the Tabs dialog is actually 0.1625" or 11.7pt. This may not be an issue if I use the Selection.Information though, since I assume it would report the definitive value and the dialog would just be rounding up to fit the current preference settings.

    Also Tony, thanks for the pseudo code outline. It is pretty much how I was planning to approach it, although I hadn't thought of using Case.

    Per Malcolm, I have attached a small sample of records exported, with some notes at the end identifying the exact tab locations. As you'll see, there is actually more in each record, but for the purposes of my forum query, I didn't bother including stuff I would just ignore.

    On a bit of a side track, how would I go about compensating people for assistance in coming up with a VBA solution? I do not have this as an assigned task right now, but if it looks like I can find a workable solution, I would certainly charge them for it. For other work I do, I pay people who assist me (subcontract) but I've never encountered a situation where subcontractors would be out in cyberspace somewhere! Have any of you collaborated in this way? And, if so, how do you determine a fee, and how would payment be made?
    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

  7. #7
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    Yikes Eric! Hmmmm. We are here for free, although there are places here that people artvertise as contractors. I would say that this would only be an issue if you were getting assistance - agreed upon assistance - outside the scope of the forum.

    Thoughts anyone?

  8. #8
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    If you can trust the tabs, this will do it (I think - it's not fully tested). It should at least give you a working base to build on.

    A couple of points:


    • It has a special check for end of data to make it work with the posted sample
    • It extracts almost everything it recognises from the file - but only writes out terms, abbreviations and the date.
    • You'll have to change the output file name, of course.
    • Payment is not wanted.
    [vba]Option Explicit

    Private Type Bilingual
    English As String
    French As String
    End Type

    Private Type TermElement
    Keyword As String
    Output As Boolean
    Value As Bilingual
    End Type

    Private Type DateElement
    Keyword As String
    Output As Boolean
    Value As String
    End Type

    Private Type ClientElement
    Keyword As String
    Output As Boolean
    Value As String
    End Type

    Private TermElements() As TermElement
    Private DateElements() As DateElement
    Private ClientElements() As ClientElement

    Private RecordCount As Long

    Private ndxTerm As Long

    Private Sub ExtractForEric()

    Const RecordHeader As Currency = 1.88
    Const TermLine As Currency = 0.17
    Const TermContinued As Currency = 1.21
    Const TermContinuedFrench As Currency = 4.71
    Const ClientLine As Currency = 0.5

    Dim Para As Word.Paragraph

    Call InitialiseElements
    RecordCount = -1

    For Each Para In ActiveDocument.Paragraphs

    ' FOR TESTING ONLY =========================================
    If Left(Para.Range.Text, 13) = "Notes for VBA" Then Exit For
    ' ==========================================================

    If Trim(Replace(Replace(Para.Range.Text, vbTab, ""), vbCr, "")) = "" Then
    ' Empty paragraph - ignore
    Else

    ' ASSUMPTION: Every (non-empty) paragraph has _
    ' (and starts with) a custom tab stop
    Select Case Round(PointsToInches(Para.TabStops(1).Position), 2)

    Case RecordHeader

    Call WriteRecord

    Case TermLine

    Call ExtractTermLine(Para)

    Case TermContinued

    Call ExtractContinuedTerm(Para)

    Case TermContinuedFrench

    Call ExtractContinuedTermFrench(Para)

    Case ClientLine

    Call ExtractClientLine(Para)

    Case Else

    MsgBox "Tab stop at " _
    & PointsToInches(Para.TabStops(1).Position) _
    & """"

    End Select

    End If

    Next

    Call WriteRecord(FinalCall:=True)

    End Sub

    Private Sub ExtractTermLine(Para As Word.Paragraph)

    ' FORMATS ASSUMED:
    ' Tab-separated term-keyword, star, English, star, French
    ' Tab-separated date-keyword, date, more data

    Dim Breakdown
    Dim ndx As Long

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 6)

    For ndx = 1 To UBound(TermElements)
    If TermElements(ndx).Keyword = UCase(Breakdown(1)) Then
    ndxTerm = ndx
    If UBound(Breakdown) > 2 Then _
    TermElements(ndx).Value.English = Breakdown(3)
    If UBound(Breakdown) > 4 Then _
    TermElements(ndx).Value.French = Breakdown(5)
    Exit Sub
    End If
    Next

    For ndx = 1 To UBound(DateElements)
    If DateElements(ndx).Keyword = UCase(Breakdown(1)) Then
    If UBound(Breakdown) > 1 Then _
    DateElements(ndx).Value = Breakdown(2)
    Exit Sub
    End If
    Next

    MsgBox "Data Type not recognised:" _
    & vbCr & Para.Range.Text _
    & vbCr & "Ignoring line"

    End Sub

    Private Sub ExtractContinuedTerm(Para As Word.Paragraph)

    ' FORMAT ASSUMED: Tab-separated English, <French>

    Dim Breakdown

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

    TermElements(ndxTerm).Value.English _
    = TermElements(ndxTerm).Value.English & " " & Breakdown(1)

    If UBound(Breakdown) > 1 Then
    TermElements(ndxTerm).Value.French _
    = TermElements(ndxTerm).Value.French & " " & Breakdown(2)
    End If


    End Sub

    Private Sub ExtractContinuedTermFrench(Para As Word.Paragraph)

    ' FORMAT ASSUMED: Tab, French

    Dim Breakdown

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 2)

    TermElements(ndxTerm).Value.French _
    = TermElements(ndxTerm).Value.French & " " & Breakdown(1)


    End Sub

    Private Sub ExtractClientLine(Para As Word.Paragraph)

    ' FORMAT ASSUMED: Tab-separated keyword, value

    Dim Breakdown
    Dim ndx As Long

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

    For ndx = 1 To UBound(ClientElements)
    If ClientElements(ndx).Keyword = UCase(Breakdown(1)) Then
    If UBound(Breakdown) > 1 Then _
    ClientElements(ndx).Value = Breakdown(2)
    Exit Sub
    End If
    Next

    MsgBox "Data Type not recognised:" _
    & vbCr & Para.Range.Text _
    & vbCr & "Ignoring line"

    End Sub

    Private Sub WriteRecord(Optional FinalCall As Boolean)

    Dim OutputString As String
    Dim ndx As Long

    If FinalCall Then
    Close #1
    Exit Sub
    End If

    If RecordCount = -1 Then
    Open "C:\Documents and Settings\Tony\Desktop\Eric.txt" For Output As #1
    RecordCount = RecordCount + 1
    Exit Sub
    End If

    OutputString = ""

    For ndx = 1 To UBound(TermElements)
    With TermElements(ndx)
    ' OutputString = OutputString & .Keyword & vbCr
    If .Output Then
    If Trim(.Value.English) <> "" Then _
    OutputString = OutputString & .Value.English & vbTab
    If Trim(.Value.French) <> "" Then _
    OutputString = OutputString & .Value.French & vbTab
    End If
    .Value.English = ""
    .Value.French = ""
    End With
    Next

    For ndx = 1 To UBound(DateElements)
    With DateElements(ndx)
    If .Output Then
    OutputString = OutputString & .Value
    End If
    .Value = ""
    End With
    Next

    ' For ndx = 1 To UBound(ClientElements)
    ' With ClientElements(ndx)
    ' OutputString = OutputString & .Keyword & vbCr
    ' OutputString = OutputString & .Value & vbCr & vbCr
    ' .Value = ""
    ' End With
    ' Next

    ' MsgBox OutputString
    Print #1, OutputString

    RecordCount = RecordCount + 1

    End Sub

    Private Sub InitialiseElements()

    Dim ndx As Long

    ReDim TermElements(0)
    ndx = UBound(TermElements)

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "TERME"
    TermElements(ndx).Output = True

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "ABR?VIATIONS"
    TermElements(ndx).Output = True

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "SYNONYMES"

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "CONTEXTE"

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "SOURCE"

    ReDim DateElements(0)
    ndx = UBound(DateElements)

    ndx = ndx + 1
    ReDim Preserve DateElements(ndx)
    DateElements(ndx).Keyword = "DATE DE CR?ATION"
    DateElements(ndx).Output = True

    ReDim ClientElements(0)
    ndx = UBound(ClientElements)

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "CLIENT"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "DOMAINE"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "PROJET"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "AUTEUR"

    End Sub
    [/vba]

    A final question for Gerry - is the line length OK in this post? I tried to keep it short for you. What resolution screen are you running with?
    Last edited by TonyJollans; 11-27-2005 at 12:09 PM. Reason: Noticed the VBA formatter doesn't cope well with continued comments
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  9. #9
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    Going back and rereading your post, I think I should add a little more.

    My posted code checks the tab stops that are set, which are far easier to work with than the position on the page. The position on the page is subject to what I can only call a bug in that it is sometimes affected by the zoom percentage making it difficult to check a precise position of a single element, such as a tab stop - what it can do effectively, and what I have used it for is to compare two different elements to see if they line up,
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  10. #10
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location
    Wow Tony... that was a real "tour de force" on your part!

    I spent quite a while studying a printout of it (warmer by the woodstove than at the computer!) to get an idea of what you were doing. Your code nicely introduces me to several VBA features that I know I will find very useful (Split, Replace) -- and tantalizes me with some I can vaguely understand but will need to study some more (the Private Type definitions, how you initialized elements)!

    I tested it and managed to make a few minor changes to address a couple of glitches (it missed the last term because the "FinalCall" exit was processed before it wrote the data, and empty abbreviations dropped an output tab). I also added a message to let me know when it finished after stupidly staring at the screen and not realizing that it had finished! (Being used to working with recorded code I expected it to take longer. I note Gerry's comments elsewhere about Object model vs Selection and realize this is obviously the way to go for an exercise like this one...)

    I then gave it the acid test and ran it against my 12MB RTF file with 6,676 terms: it took a bit longer, but it worked perfectly! No need for you to try testing for position on the page; the tab stop appoach worked fine.

    As I made my minor changes, I added more comments (for my own edification mostly) and removed the break in several lines for readability. I was going to post the new version, but I noted your final comment about readability of the code, and wasn't sure if what I have now will be too wide for display (the longest line is now ~112 characters).

    Your code appeared fully within the width of my 1024-wide window, but I have a 2-monitor setup so I can go much wider if needed. (However, since it was within a nested scrolling window in Firefox, I just copied it out into VBA to examine it in any case.) Would you like me to post the revised code? (I'll be away Monday so it might not show up until late.)
    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

  11. #11
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    The FinalCall error was sloppy - sorry. I did know the logic for adding the tabs to the output file wasn't quite right but as I write this I realise that I made it overcomplicated (as you can probably tell from the commented code I accidentally left in I had tested with a Msgbox display and just added the file write at the end)

    I was a bit inconsistent with module-level variables. I should either have had none or all (probably in this case it makes most sense to make Para global along with everything else). As I had several I just made everything Private - it just limits the scope to the module rather than the otherwise default of project. Split and Replace, for anybody else reading this, are very useful but were new in 2000 (so won' work in 97). My array initializations are, perhaps, a slight abuse in that I create, but don't use, a zeroth element - for it to work as is it requires an implicit (default) or explicit "Option Base 0".

    I rarely have problems with the display myself (1268 wide) but Gerry has complained before (and I'm sure others suffer in silence) so I deliberately split some lines to keep them short; I just wondered how successful I'd been.

    Yes, it would be good to post corrected/amended code for those who come along later. In your own time, of course.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  12. #12
    VBAX Wizard
    Joined
    May 2004
    Posts
    6,713
    Location
    I am running 1024 and no, the code does not fit for me. Almost, but not quite. The line Trim (Replace (Replace..... does not make it.

    Is there any way to decrease the tab stops within the VBA window? Because they sure do take up a chunk of real estate.

    Thanks Tony, I do appreciate the effort. Yeah, I know I am a complainer.....

    Eric, I find it interesting that you removed breaks. I find breaks make code easier to read. Not really here, as you can't get the tab stops right, but in the VBE I find it much better. But then, I tend to not use maximized code windows.

    Back to Tony - wow, you put some work into that puppy. I am going to pull it out and really look at it.

  13. #13
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Thanks, Gerry, that gives me a guide.

    I did look at that line and decided breaking it would be more confusing than not.

    No, I don't believe we have any control over the formatting of VBA tagged code - except maybe font size, I haven't tried that and am not sure I would want to make it smaller anyway.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  14. #14
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location
    Freezing rain here this morning so I'm GLAD to be back early to put this up! Here is my slightly modified version of the VBA Tony posted. As you'll see, some lines extend beyond the VBA window because I removed the breaks. Sorry Gerry... I find code with breaks harder to follow -- possibly because I'm less used to the syntax and don't recognize the broken items as single statements! As well, I tend to use Ctrl-scrollwheel to zoom the font size out so I can see the full width and navigate within it, then zoom back in to focus on details. Alternatively, I'll extend the window to another monitor and get twice the width to work with.

    I've also added a lot more comments; mostly for my own reference as I worked through understanding Tony's elegant code, but also to alert myself to things I need to follow up on. I thought I may as well leave them in here... (Comments starting with '-- are mine: an old habit from PL/1 programming I think!)

    If anyone has further ideas about where I might look for learning more about items I've flagged, I'd love to hear more.

    And now, the code:

    [VBA]
    Option Explicit

    '-- Investigate this: why is this preferred over other methods of declaring variables?
    Private Type Bilingual
    English As String
    French As String
    End Type

    Private Type TermElement
    Keyword As String
    Output As Boolean
    Value As Bilingual
    End Type

    Private Type DateElement
    Keyword As String
    Output As Boolean
    Value As String
    End Type

    Private Type ClientElement
    Keyword As String
    Output As Boolean
    Value As String
    End Type

    Private TermElements() As TermElement
    Private DateElements() As DateElement
    Private ClientElements() As ClientElement

    Private RecordCount As Long

    Private ndxTerm As Long

    Private Sub ExtractForEric()

    '-- These are the tab stops set for the various lines in the term database
    Const RecordHeader As Currency = 1.88 '-- Anglais (start of a new term defn)
    Const TermLine As Currency = 0.17 '-- subcomponent names
    Const TermContinued As Currency = 1.21 '-- continuation of an English subcomponent
    Const TermContinuedFrench As Currency = 4.71 '-- French continuation
    Const ClientLine As Currency = 0.5 '-- start posn for Client etc (not needed for this)

    Dim Para As Word.Paragraph

    Call InitialiseElements
    RecordCount = -1

    For Each Para In ActiveDocument.Paragraphs

    ' FOR TESTING ONLY (Uses the sample file I posted to VBAX)==
    If Left(Para.Range.Text, 13) = "Notes for VBA" Then Exit For
    ' ==========================================================

    If Trim(Replace(Replace(Para.Range.Text, vbTab, ""), vbCr, "")) = "" Then
    ' Empty paragraph - ignore
    Else

    ' ASSUMPTION: Every (non-empty) paragraph has (and starts with) a custom tab stop
    Select Case Round(PointsToInches(Para.TabStops(1).Position), 2)
    '-- Brilliant! The value for case is determined by the 1st tab stop defined.

    Case RecordHeader '-- new term definition (starts with Anglais at tab 1.88")
    Call WriteRecord '-- writes out collected data (from previous term)

    Case TermLine '-- parses English and/or French from line using the keyword intab 0.17"
    Call ExtractTermLine(Para)

    Case TermContinued '-- tab starting at 1.21" is English continuation & may have French
    Call ExtractContinuedTerm(Para)

    Case TermContinuedFrench '-- tab at 4.71" is always just a French continuation
    Call ExtractContinuedTermFrench(Para)

    Case ClientLine '-- lines starting at 0.5" have data not needed for now
    Call ExtractClientLine(Para) '-- but this would enable the data to be extracted

    Case Else '-- Report an unrecognized tab stop (and ignore it)
    MsgBox "Tab stop at " & PointsToInches(Para.TabStops(1).Position) & """"

    End Select
    End If
    Next

    Call WriteRecord(FinalCall:=True) '-- the final record won't have a new term to start it so write the data

    End Sub

    Private Sub ExtractTermLine(Para As Word.Paragraph)
    ' FORMATS ASSUMED:
    ' Tab-separated term-keyword, star, English, star, French
    ' Tab-separated date-keyword, date, more data
    '-- This handles the term's main named sub-components (but not the Client etc. parts at end)

    Dim Breakdown
    Dim ndx As Long

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 6)
    '-- Nice! Breakdown will now be an array of the substrings that were separated by a tab within
    '-- the line. The sub-component name will be #1; #3 will be English; #4, French if non-blank.

    For ndx = 1 To UBound(TermElements) '-- this parses out the named sub-components (except Date)
    If TermElements(ndx).Keyword = UCase(Breakdown(1)) Then
    ndxTerm = ndx
    If UBound(Breakdown) > 2 Then TermElements(ndx).Value.English = Breakdown(3)
    If UBound(Breakdown) > 4 Then TermElements(ndx).Value.French = Breakdown(5)
    Exit Sub
    End If
    Next

    For ndx = 1 To UBound(DateElements) '-- this parses out the date
    If DateElements(ndx).Keyword = UCase(Breakdown(1)) Then
    If UBound(Breakdown) > 1 Then DateElements(ndx).Value = Breakdown(2)
    Exit Sub
    End If
    Next

    '-- If there isn't a match on the sub-component name, it reports the line & ignores it
    MsgBox "Data Type not recognised:" & vbCr & Para.Range.Text & vbCr & "Ignoring line"

    End Sub

    Private Sub ExtractContinuedTerm(Para As Word.Paragraph)
    ' FORMAT ASSUMED: Tab-separated English, <French>

    Dim Breakdown

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

    '-- If there is a tab at 1.21", there will be at least English
    TermElements(ndxTerm).Value.English = TermElements(ndxTerm).Value.English & " " & Breakdown(1)

    If UBound(Breakdown) > 1 Then '-- & if there was a 2nd tab, there was French too
    TermElements(ndxTerm).Value.French = TermElements(ndxTerm).Value.French & " " & Breakdown(2)
    End If

    End Sub

    Private Sub ExtractContinuedTermFrench(Para As Word.Paragraph)
    ' FORMAT ASSUMED: Tab, French
    '-- If the line only had French, it would always be at tab 4.71"

    Dim Breakdown
    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 2)
    TermElements(ndxTerm).Value.French = TermElements(ndxTerm).Value.French & " " & Breakdown(1)

    End Sub

    Private Sub ExtractClientLine(Para As Word.Paragraph)
    ' FORMAT ASSUMED: Tab-separated keyword, value
    '-- These are currently ignored but this code would parse the Client etc. sub-components
    '-- (but ".Output=True" would need to be set in the InitialiseElements function for each)

    Dim Breakdown
    Dim ndx As Long

    Breakdown = Split(Replace(Para.Range.Text, vbCr, ""), vbTab, 3)

    For ndx = 1 To UBound(ClientElements)
    If ClientElements(ndx).Keyword = UCase(Breakdown(1)) Then
    If UBound(Breakdown) > 1 Then ClientElements(ndx).Value = Breakdown(2)
    Exit Sub
    End If
    Next
    '-- Put up message if the keyword is not recognized & ignore line
    MsgBox "Data Type not recognised:" & vbCr & Para.Range.Text & vbCr & "Ignoring line"

    End Sub

    Private Sub WriteRecord(Optional FinalCall As Boolean)
    '-- Writes out the data collected for a term

    Dim OutputString As String
    Dim ndx As Long

    If RecordCount = -1 Then
    '-- On 1st entry, open the output file (***WORK OUT HOW TO SPECIFY THIS WITH A DIALOG***)
    Open "C:\Documents and Settings\Tony\Desktop\Eric.txt" For Output As #1
    RecordCount = RecordCount + 1
    Exit Sub
    End If

    OutputString = ""

    For ndx = 1 To UBound(TermElements) '-- add each of the E&F sub-components to the output string
    With TermElements(ndx)
    ' OutputString = OutputString & .Keyword & vbCr
    If .Output Then
    '-- This next code adds the E&F components but when there is no abbreviation,
    '-- it didn't add the tab so I've commented it out to see if I can tweak it
    ' If Trim(.Value.English) <> "" Then _
    ' OutputString = OutputString & .Value.English & vbTab
    ' If Trim(.Value.French) <> "" Then _
    ' OutputString = OutputString & .Value.French & vbTab
    '-- This will now put out just a tab if a value had not been detected for a term
    '-- (*** There may be a more elegant way to do this***)
    If Trim(.Value.English) <> "" Then
    OutputString = OutputString & .Value.English & vbTab
    Else
    OutputString = OutputString & vbTab
    End If
    If Trim(.Value.French) <> "" Then
    OutputString = OutputString & .Value.French & vbTab
    Else
    OutputString = OutputString & vbTab
    End If

    End If
    .Value.English = ""
    .Value.French = ""
    End With
    Next

    For ndx = 1 To UBound(DateElements) '-- add the date to the output string
    With DateElements(ndx)
    If .Output Then
    OutputString = OutputString & .Value
    End If
    .Value = ""
    End With
    Next


    Print #1, OutputString '-- write the output string to the file

    RecordCount = RecordCount + 1

    '-- This closes the output file after the final record data was written out. FinalCall gets
    '-- set at the end of the main routine when no additional term was found
    If FinalCall Then
    Close #1
    MsgBox "Completed with" & str(RecordCount) & " records processed"
    Exit Sub
    End If

    End Sub

    Private Sub InitialiseElements()

    Dim ndx As Long

    '-- This defines an array for each of the term subcomponents using the name that precedes
    '-- each one. (The name is in LC but TJ has probably set them in UC here so he can use UCase
    '-- to make them consistent if some were entered with different case.)
    '-- The ".Output" item is set True if that element is needed in the output string: interesting
    '-- idea and *** learn more about this!

    ReDim TermElements(0)
    ndx = UBound(TermElements)

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "TERME"
    TermElements(ndx).Output = True

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "ABR?VIATIONS"
    TermElements(ndx).Output = True

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "SYNONYMES"

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "CONTEXTE"

    ndx = ndx + 1
    ReDim Preserve TermElements(ndx)
    TermElements(ndx).Keyword = "SOURCE"

    ReDim DateElements(0)
    ndx = UBound(DateElements)

    ndx = ndx + 1
    ReDim Preserve DateElements(ndx)
    DateElements(ndx).Keyword = "DATE DE CR?ATION"
    DateElements(ndx).Output = True

    ReDim ClientElements(0)
    ndx = UBound(ClientElements)

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "CLIENT"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "DOMAINE"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "PROJET"

    ndx = ndx + 1
    ReDim Preserve ClientElements(ndx)
    ClientElements(ndx).Keyword = "AUTEUR"

    End Sub
    [/VBA]

    By the way Tony, I tested this with some extra lines with invalid names and non-standard tabs and it caught them (and ignored them) as expected. Running this on my 12MB RTF file with 6,767 definitions took only a couple of minutes and processed every record with only one alert on the first page title (the only non-standard line). Very nice... I suspect my original approach of modifying recorded code (Selection method?) would have ground away for hours!

    Thanks for the help; I've learned a lot from this!
    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

  15. #15
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    Glad it all (almost) worked.

    One quick note. The mistake I made with the missing output tabs ..
    [vba]
    If .Output Then
    '-- This next code adds the E&F components but when there is no abbreviation,
    '-- it didn't add the tab so I've commented it out to see if I can tweak it
    ' If Trim(.Value.English) <> "" Then _
    ' OutputString = OutputString & .Value.English & vbTab
    ' If Trim(.Value.French) <> "" Then _
    ' OutputString = OutputString & .Value.French & vbTab
    '-- This will now put out just a tab if a value had not been detected for a term
    '-- (*** There may be a more elegant way to do this***)
    If Trim(.Value.English) <> "" Then
    OutputString = OutputString & .Value.English & vbTab
    Else
    OutputString = OutputString & vbTab
    End If
    If Trim(.Value.French) <> "" Then
    OutputString = OutputString & .Value.French & vbTab
    Else
    OutputString = OutputString & vbTab
    End If
    End If [/vba]

    Can be more simply done just with ..[vba]If .Output Then
    '-- This next code adds the E&F components but when there is no abbreviation,
    '-- it didn't add the tab so I've commented it out to see if I can tweak it
    ' If Trim(.Value.English) <> "" Then _
    ' OutputString = OutputString & .Value.English & vbTab
    ' If Trim(.Value.French) <> "" Then _
    ' OutputString = OutputString & .Value.French & vbTab
    '-- This will now put out just a tab if a value had not been detected for a term
    '-- (*** There may be a more elegant way to do this***)
    OutputString = OutputString & .Value.English & vbTab
    OutputString = OutputString & .Value.French & vbTab
    End If[/vba]

    When .Value.language is empty, it simply writes a zero length string (i.e. nothing) as output followed by the tab.

    I actually did this to start with and then changed my mind - I'm not sure why, now, when I think about it!
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  16. #16
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location
    That was what I tried first Tony, but then it leaves a single space in the tab position when an abbreviation is not there. The test for non-blank ovecromes that.

    The other thing I noticed while doing this exercise was how the database uses the foot symbol (') instead of an apostrophe (i.e. can't, n'est instead of can?t, n?est). I run into this sort of thing a lot, and usually deal with it using Find & Replace with "smart quotes" set to on. However, is this the sort of thing that the Replace function could handle? And would it work for an entire object (like the current document) or is there a limit to the string's length?

    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

  17. #17
    VBAX Master TonyJollans's Avatar
    Joined
    May 2004
    Location
    Norfolk, England
    Posts
    2,291
    Location
    Hi Eric,

    Try ...[vba]OutputString = OutputString & Trim(.Value.English) & vbTab
    OutputString = OutputString & Trim(.Value.French) & vbTab[/vba]

    I don't think there's any special limit on the length of string Replace can handle so it should work on Document.Content.Text - and yes, it could replace quotes with smart quotes - but smart quotes are language-dependent so you might need to do some extra checks if changing, say, double quotes to double chevrons in French text.
    Enjoy,
    Tony

    ---------------------------------------------------------------
    Give a man a fish and he'll eat for a day.
    Teach him how to fish and he'll sit in a boat and drink beer all day.

    I'm (slowly) building my own site: www.WordArticles.com

  18. #18
    VBAX Regular
    Joined
    Aug 2004
    Location
    On a 100 acre hobby farm in beautiful west Quebec.
    Posts
    87
    Location
    Thanks Tony. After I sent that post I tried it (similar to what you just posted here) and it works fine for the apostrophes. You're right of course about other kinds of quotes needing >1 character (i.e. "this" becomes ? this ? in French). Your use of the Replace function was a revelation to me and is prompting me to consider how I can use it elsewhere!
    Eric

    Experience is not what happens to a man; it is what a man does with what happens to him. ? Aldous Huxley

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •