Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 21

Thread: A (fiendly) Challenge to brettdj

  1. #1
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,446
    Location

    A (fiendly) Challenge to brettdj

    In the few weeks that I have visiting this forum, I have seen a number of responses from brettdj advocating the use of Regular Expressions.I have a problem with RegExp, so I thought I would throw it out as a challenge to see if anyone can take what I have done to its conclusion. I have addressed this post at brettdj, but I would love to hear suggestions from anyone. What's in it for you? Okay, a virtual bottle of Lagavulin to the winner (virtual means that, you win it, but as I can't deliver, I will drink it for you - I will let you know what it tastes like though ).

    As a quick aside, I tend not to advocate the use of RegExp for problem solving. Not because they are not good, they are great, and are the ideal tool for most (all?) text pasring problems. I don't advocate them on the basis of future maintainability. IMO, RegExp are not intuitive and not something that you use every day. RegExp to me seem somthing for the real obscure language function geek, or the super brains like Harlan Grove, not for us lesser mortals. Couple that with the fact that they are not like the high-level languages that we mainly use nowadays, this means that when you want to change a RegExp, finding the skills is difficult, so we either pay big bucks for a RegExp geek, or we do it another way. I know which way my vote goes.

    Anyway I digress, and onto the problem. I am sure you have come across as many questions as I have regarding splitting a name pair (surname & forename, or vices versa) into separate components. This has always seemed an ideal candidate for RegExp to me (although I never offer it in my responses to such questions for the reasons stated above), but when I was buiding my sort add-in I decided to use RegExp in the function to sort by name. Obviously I need to cater for all varities of name, such as
    - Mr Alan Jones
    - Ian St John
    - Peter McDougall
    - Peter MacDougall
    - Baron von Richtofen
    etc. etc.

    The problem is, with my RegExp skills (or lack of such), I couldn't get a RegExp to do it all. I had to add some code to pre-process the data before I passed it to my RegExp routine. The things I couldn't manage were the Mc, Mac issues (they need to be processed as the first M names, and St, also as the first S, and ensuring that the Fornename and Surname are capitalized (so that Jim jones doesn't get sorted after Alan Minter), and treating von and Von, van and Van as the same.

    This is the RegExp parser, nothing unusal here

    [VBA]
    '---------------------------------------------------------------------
    Private Function RESubString(Inp As String, _
    Pattern As String, _
    Optional N As Long = 0) As String
    '---------------------------------------------------------------------
    Dim oRegExp As Object, m As Object
    On Error GoTo RE_error
    Set oRegExp = CreateObject("VBScript.RegExp")
    oRegExp.Pattern = Pattern
    oRegExp.Global = True
    Set m = oRegExp.Execute(Inp)
    RESubString = IIf(m.Count > 0, m(N).Value, "")
    GoTo RE_Exit
    RE_error:
    RESubString = "RE Error"
    RE_Exit:
    Set oRegExp = Nothing
    On Error GoTo 0
    End Function
    [/VBA]

    This is the original calling routine which sets up the parsing string (you can see one of my other attempts)

    [VBA]
    '---------------------------------------------------------------------
    Private Function LastName(nme As String)
    '---------------------------------------------------------------------
    ' Function: Extracts the last name from a names string
    ' Synopsis: Calls 2 functions:-
    ' ReFornmat - frigs the data for St, Mc and Mac, and
    ' capitalizesc all names
    ' RESubString - runs a regular expression to return the
    ' lats name
    '---------------------------------------------------------------------
    Dim sREgExp As String

    ' sREgExp = "\b([a-z]+ +)*(O'|Mc|Mac)?[A-Z](\w+\S?)*(-[A-Z](\w+\S?)*)?\b(?=(( +)(Sr\.?|Jr\.?|[IVX][IVX]*))|,|\s*$)"
    sREgExp = "\b([a-z]+\s+)*[A-Z](\w+\S?)*([-'][A-Z](\w+\S?)*)?\b(?=(\s+([JS]r\.?|[IVX]+))?\s*$|,)"
    LastName = RESubString(ReFormat(nme), sREgExp)

    End Function
    [/VBA]

    And these are all my pre-processing routines which I wrote to make up for the deficiencies of my RegExp parsing expression.

    [VBA]
    Private Function ReFormat(Name As String)

    ReFormat = Capitalize(AdjustNamePreps(Name))

    End Function


    '---------------------------------------------------------------------------
    Public Function Capitalize(Name As String)
    '---------------------------------------------------------------------------
    Dim aParts
    Dim i As Long

    aParts = pzSplit(LCase(RemoveMultipleSpaces(Name)), " ")
    For i = LBound(aParts, 1) To UBound(aParts, 1)
    aParts(i) = UCase(Left(aParts(i), 1)) & Right(aParts(i), Len(aParts(i)) - 1)
    Next i
    Capitalize = pzJoin(aParts, " ")

    End Function


    '---------------------------------------------------------------------------
    Public Function AdjustNamePreps(Name As String)
    '---------------------------------------------------------------------------
    Dim i As Long, j As Long

    For i = 1 To Len(Name)
    If Mid(Name, i, 1) = "S" Then
    If Mid(Name, i + 1, 1) = "t" And Mid(Name, i + 2, 1) = " " Then
    Name = Left(Name, i) & Chr(65) & _
    Mid(Name, i + 1, 1) & Mid(Name, i + 3, 255)
    Exit For
    End If
    End If
    Next i

    For i = 1 To Len(Name)
    If Mid(Name, i, 1) = "M" Then
    If Mid(Name, i + 1, 1) = "c" Then
    Name = Left(Name, i) & Chr(65) & Mid(Name, i + 1, 255)
    Exit For
    ElseIf Mid(Name, i + 1, 1) = "a" And Mid(Name, i + 2, 1) = "c" Then
    Name = Left(Name, i) & Chr(65) & Mid(Name, i + 1, 255)
    Exit For
    End If
    End If
    Next i

    AdjustNamePreps = Name

    End Function
    [/VBA]

  2. #2
    VBAX Tutor
    Joined
    Mar 2005
    Posts
    268
    Location
    Nice one xld! I've never investigated RegExp, and I have to congratulate you on giving me a one-line reason for my aversions:

    [vba]sREgExp = "\b([a-z]+\s+)*[A-Z](\w+\S?)*([-'][A-Z](\w+\S?)*)?\b(?=(\s+([JS]r\.?|[IVX]+))?\s*$|,)" [/vba]
    I wish you luck in your quest!

  3. #3
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Quote Originally Posted by xld
    . In the few weeks that I have visiting this forum, I have seen a number of responses from brettdj advocating the use of Regular Expressions
    You must have come accross some of my older posts as I haven't been hugely active lately. But yes, I am RegExp fan.
    Quote Originally Posted by xld
    . I had to add some code to pre-process the data before I passed it to my RegExp routine. The things I couldn't manage were the Mc, Mac issues (they need to be processed as the first M names, and St, also as the first S, and ensuring that the Fornename and Surname are capitalized (so that Jim jones doesn't get sorted after Alan Minter), and treating von and Von, van and Van as the same
    Yep, RegExp is fantastic when parsing/seraching a particular patterm

    When the rules have a number of exceptions or checks then its a choice of using multiple RegExps and/or doing "pre or post processing" as you did above.

    For example, when I wrote code to change cell references from absolute to relative or vice versa, the patten I used for the cell address was
    (\$?)([A-Z]{1,2})(\$?)(\d{1,5})
    The code then has a step to check that any five digit number is less than 65537, and that the next character is not numeric, ie that the numeric portion is a valid cell reference

    A longer pattern could have done the job with the RegExp alone, with an additional check that the next character neccesitates a word break
    (\$?)([A-Z]{1,2})(\$?)(\d{1,4}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d{1}|6553[0-6])\b

    But I figured it was more transparent to work with a shorter pattern and then post process in the code.

    When I've parsed street addresses I've had to do similar pre-work to catch the "inconsistencies" in how the data may be presented.

    So its a choice between RegEx world and "normal" coding, ie in your code below you could use a RegExp Replace to change your case from lower to upper case rather than Ucase.

    One of VBscripts problems is that is doesn't have the lookahead or lookbehind capability of RegExp, this adds to the prework needed.

    Anyhow ......... is there more info on your sort program?

    I'll have a crack at this with as much RegExp as possible but I don' tthink it will be pretty.

    Cheers

    Dave

  4. #4
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    how about a series of RegExps, ie

    [vba]
    Option Explicit
    Function RevisedLastName(Source As String)
    Dim Regex As Object, UpCase As String, StrMan As String
    Set Regex = CreateObject("vbscript.regexp")
    'remove all "."
    Source = Replace(Source, ".", "")
    'append Saint or St as SAaint or SAt
    With Regex
    .ignorecase = True
    .Global = False
    .Pattern = ".+\b(S)(aint|t|)\b\s+(\b[a-z]+$)"
    Source = Regex.Replace(Source, "$1A$2$3")
    End With
    ' Replace Mac or Mc at the start of the last word if the next letter is capitalised
    ' Note that this will miss lower case Mac's but there is no foolprooh test
    ' ie Macarthur will not be changed, MacMillan will be
    With Regex
    .ignorecase = False
    .Pattern = ".+\b(M)(ac|c)([A-Z]{1}[a-z]+$)"
    Source = Regex.Replace(Source, "$1A$2$3")
    End With
    'Capitalise string properly for last word.
    ' Note, can't use ApplicationProper in case name is like MacDougall.
    With Regex
    .ignorecase = False
    .Pattern = "(^.+\b)([a-z])([a-z]*$)"
    UpCase = UCase(Regex.Replace(Source, "$2"))
    Source = .Replace(Source, "$1" & UpCase & "$3")
    End With
    'return last names
    With Regex
    .ignorecase = True
    .Global = False
    .Pattern = "(^.+)(V(o|a)n)(\b\s+\b[a-z]+)$"
    'Check for von or van
    If .Test(Source) = True Then
    StrMan = .Replace(Source, "$2$4")
    'Ensure that Von or Van is also capitalised
    RevisedLastName = UCase(Left(StrMan, 1)) & Right(StrMan, Len(StrMan) - 1)
    Else
    'No von or van so just return last name
    .Pattern = "^.+\b\s+(\b[a-z]+)$"
    RevisedLastName = .Replace(Source, "$1")
    End If
    End With
    Set Regex = Nothing
    End Function
    [/vba]

  5. #5
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,446
    Location
    Quote Originally Posted by brettdj
    You must have come accross some of my older posts as I haven't been hugely active lately. But yes, I am RegExp fan.
    Yep, RegExp is fantastic when parsing/seraching a particular patterm
    Ir wasn't necessarily solutions in RegExp that I referred to, but comments made. remember, I have only been on this board a month.

    Quote Originally Posted by brettdj
    But I figured it was more transparent to work with a shorter pattern and then post process in the code.
    Take you point, but it would still be nice to see a complete RegExp solution, if only for the satisfaction of doing so. In many ways this is an intellectual exercise, as my code does what I want (within the limits of my testing thus far), but that doesn't mean I like it.

    Quote Originally Posted by brettdj
    One of VBscripts problems is that is doesn't have the lookahead or lookbehind capability of RegExp, this adds to the prework needed.
    Don't quote me, check it yourself, but I think that VBScript 5.5 introduced look-ahead!

    Quote Originally Posted by brettdj
    I'll have a crack at this with as much RegExp as possible but I don' tthink it will be pretty.
    That will be fine, no RegExp is pretty IMO.

  6. #6
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Have you tried the UDF above?

    Quote Originally Posted by xld
    Don't quote me, check it yourself, but I think that VBScript 5.5 introduced look-ahead!
    Unfortunately not

    From http://aspnet.4guysfromrolla.com/articles/022603-1.aspx

    Lookaround is a feature that is partially implemented in JScript but not in VBScript. There are two directions of lookaround - lookahead and lookbehind - and two flavors of each direction - positive assertion and negative assertion. The syntax for each is:

    (?=...) - Positive lookAHEAD
    (?!...) - Negative lookAHEAD
    (?<=...) - Positive lookBEHIND
    (?<!...) - Negative lookBEHIND

    Cheers

    Dave

  7. #7
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,446
    Location
    Dave,

    Finally created some test data for it (my data was more of a demo and didn't cover enough conditions).

    So far it looks good. Major fault with it at present is that it doesn't handle Jr and Snr correctly, so Bill Oakes Jr somes out as Jr. I think it should parse Jr, Jr., Jr, Junior, with or without capitals as Jr. Similalrly for Sr, Sr., Snr, Snr., and Senior to Snr, but of course retaing the surname.

    I'll keep on testing some more.

    Cheers Bob

  8. #8
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Bob,

    So it needs a search for the Jr or Sr variations as the last word, and if this word exists it should be appended to the last name(s) as Jr. or Snr. ??

    Cheers

    Dave

  9. #9
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,446
    Location
    Quote Originally Posted by brettdj
    So it needs a search for the Jr or Sr variations as the last word, and if this word exists it should be appended to the last name(s) as Jr. or Snr. ??
    Yeah, you've got it.

  10. #10
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    how about

    [vba]
    Option Explicit
    Function RevisedLastName(Source As String)
    Dim Regex As Object
    Dim AmericanName As String
    Dim UpCase As String, StrMan As String
    Set Regex = CreateObject("vbscript.regexp")
    'remove all "."
    Source = Replace(Source, ".", "")
    'Find and remove for now any variation of Senior or Junior at the end of the string
    'Store this value in AmericanName
    With Regex
    .ignorecase = True
    .Global = False
    .Pattern = "(.+\b)\s+(Senio|S|Junio|J)(r)$"
    If Regex.test(Source) = True Then
    AmericanName = " " & Regex.Replace(Source, "$2$3") & "."
    Source = Regex.Replace(Source, "$1")
    End If
    End With
    'append Saint or St as SAaint or SAt
    With Regex
    .ignorecase = True
    .Pattern = ".+\b(S)(aint|t|)\b\s+(\b[a-z]+$)"
    Source = Regex.Replace(Source, "$1A$2$3")
    End With
    ' Replace Mac or Mc at the start of the last word if the next letter is capitalised
    ' Note that this will miss lower case Mac's but there is no foolproof test
    ' ie Macarthur will not be changed, MacMillan will be
    With Regex
    .ignorecase = False
    .Pattern = ".+\b(M)(ac|c)([A-Z]{1}[a-z]+$)"
    Source = Regex.Replace(Source, "$1A$2$3")
    End With
    'Capitalise string properly for last word.
    ' Note, can't use ApplicationProper in case name is like MacDougall.
    With Regex
    .ignorecase = False
    .Pattern = "(^.+\b)([a-z])([a-z]*$)"
    UpCase = UCase(Regex.Replace(Source, "$2"))
    Source = .Replace(Source, "$1" & UpCase & "$3")
    End With
    'return last names
    With Regex
    .ignorecase = True
    .Global = False
    .Pattern = "(^.+)?(V(o|a)n)(\b\s+\b[a-z]+)$"
    'Check for von or van
    If .test(Source) = True Then
    StrMan = .Replace(Source, "$2$4")
    'Ensure that Von or Van is also capitalised
    RevisedLastName = UCase(Left(StrMan, 1)) & Right(StrMan, Len(StrMan) - 1) & AmericanName
    Else
    'No von or van so just return last name
    .Pattern = "^.+\b\s+(\b[a-z]+)$"
    RevisedLastName = .Replace(Source, "$1") & AmericanName
    End If
    End With
    Set Regex = Nothing
    End Function
    [/vba]

  11. #11
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Change this line from
    AmericanName = " " & Regex.Replace(Source, "$2$3") & "."
    to
    AmericanName = " " & UCase(Left(Regex.Replace(Source, "$2"), 1)) & "r."

    to force the Jr. rather than Junior.

    Cheers

    Dave

  12. #12
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Do I get the Lagavulin?

  13. #13
    Administrator
    VP-Knowledge Base
    VBAX Guru MOS MASTER's Avatar
    Joined
    Apr 2005
    Location
    Breda, The Netherlands
    Posts
    3,281
    Location
    Wow missed this topic somewhere I think.

    Excellent material looks nice Dave. I do hope you make it a kb entry? (Or Bob's sollution if he wins...or better yet both)

    Now for the contest who won Bob? Better yet who's drinking "Lagavulin" (If I'd only knew what that was...)
    _________
    Groetjes,

    Joost Verdaasdonk
    M.O.S. Master

    Mark your thread solved, when it has been, by hitting the Thread Tools dropdown at the top of the thread.
    (I don't answer questions asked through E-mail or PM's)

  14. #14
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,476
    Location
    sl?inte (slan?tchuh): A drinking toast
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  15. #15
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,446
    Location
    Quote Originally Posted by MOS MASTER
    Excellent material looks nice Dave. I do hope you make it a kb entry? (Or Bob's sollution if he wins...or better yet both)
    In reality it is being won at the moment because I was looking for a single regular expresssion, and we currently have a series. Also there are still a few anomalies that I need to get back to Dave on, but I haven't yet gotten around to. SO I still owe Dave a response.

    Quote Originally Posted by MOS MASTER
    Better yet who's drinking "Lagavulin" (If I'd only knew what that was...)
    Well I am for sure! As for not knowing what Lagavulin is Joost, don't they educate you in your country? Lagavulin is one of the finest examples of God's nectar, the real ambrosia of the Gods, the water of life, the reason for living ... single malt Scotch Whisky... phew, I feel better now!

  16. #16
    Administrator
    VP-Knowledge Base VBAX Guru MOS MASTER's Avatar
    Joined
    Apr 2005
    Location
    Breda, The Netherlands
    Posts
    3,281
    Location
    Quote Originally Posted by xld
    Well I am for sure! As for not knowing what Lagavulin is Joost, don't they educate you in your country? Lagavulin is one of the finest examples of God's nectar, the real ambrosia of the Gods, the water of life, the reason for living ... single malt Scotch Whisky... phew, I feel better now!
    Indeed Bob, we are a dumb uneducated mob over here. (not worthy of God's nectar) How did I ever get by not knowing this?

    We seldom get out of our cave and if we do we look for female companions who are dumb enough to come home with us. (and enjoy a simple beer)

    But again being ignorant has the benefit of not having to carry al that weight in knowledge!

    Let the games continue! (Hope you find that single line sollution)
    _________
    Groetjes,

    Joost Verdaasdonk
    M.O.S. Master

    Mark your thread solved, when it has been, by hitting the Thread Tools dropdown at the top of the thread.
    (I don't answer questions asked through E-mail or PM's)

  17. #17
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Quote Originally Posted by xld
    In reality it is being won at the moment because I was looking for a single regular expresssion, and we currently have a series. Also there are still a few anomalies that I need to get back to Dave on, but I haven't yet gotten around to. SO I still owe Dave a response.
    If would be the mother of all RegExp's to do this in one hit -my question would be "why?"

    Bob, let me know what the anomalies are and I'll tidy them up

    Cheers

    Dave

  18. #18
    VBAX Mentor Sir Babydum GBE's Avatar
    Joined
    Mar 2005
    Location
    Cardiff, UK
    Posts
    499
    Location
    Bob, Can I have some of your Lagavulin for being so lovely to you in your first few months?
    If so, enjoy it and think of me!
    Have a profound problem? Need a ridiculous solution? Post a question in Babydum's forum

  19. #19
    VBAX Expert brettdj's Avatar
    Joined
    May 2004
    Location
    Melbourne
    Posts
    649
    Location
    Quote Originally Posted by brettdj View Post
    Have you tried the UDF above?

    Unfortunately not

    From http://aspnet.4guysfromrolla.com/articles/022603-1.aspx

    Lookaround is a feature that is partially implemented in JScript but not in VBScript. <snip>

    Cheers

    Dave
    I note that this is incorrect and Bob was right, lookahead - but not lookbehind - is available

  20. #20
    Moderator VBAX Wizard Aussiebear's Avatar
    Joined
    Dec 2005
    Location
    Queensland
    Posts
    5,449
    Location
    Left field post or was it the Lagavulin??
    Remember To Do the Following....
    Use [Code].... [/Code] tags when posting code to the thread.
    Mark your thread as Solved if satisfied by using the Thread Tools options.
    If posting the same issue to another forum please show the link

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •