PDA

View Full Version : Solved: RegExp pattern string help



matthewspatrick
01-24-2006, 10:01 AM
I use the following function to put a nice little wrapper on RegExp.Find:


Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos)
' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
' pattern (PatternStr). Use Pos to indicate which match you want:
' Pos omitted : function returns a zero-based array of all matches
' Pos = 0 : the last match
' Pos = 1 : the first match
' Pos = 2 : the second match
' Pos = <positive integer> : the Nth match
' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
' returns an empty string. If no match is found, the function returns an empty string

' If you use this function in Excel, you can use range references for any of the arguments.
' If you use this in Excel and return the full array, make sure to set up the formula as an
' array formula. If you need the array formula to go down a column, use TRANSPOSE()

Dim re As Object
Dim TheMatches As Object
Dim Answer() As String
Dim Counter As Long

' Evaluate Pos. If it is there, it must be numeric and converted to Long
If Not IsMissing(Pos) Then
If Not IsNumeric(Pos) Then
RegExpFind = ""
Exit Function
Else
Pos = CLng(Pos)
End If
End If

' Create instance of RegExp object
Set re = CreateObject("VBScript.RegExp")
With re
.Pattern = PatternStr
.Global = True
End With

' Test to see if there are any matches
If re.test(LookIn) Then

' Run RegExp to get the matches, which are returned as a zero-based collection
Set TheMatches = re.Execute(LookIn)

' If Pos is missing, user wants array of all matches. Build it and assign it as the
' function's return value
If IsMissing(Pos) Then
ReDim Answer(0 To TheMatches.Count - 1) As String
For Counter = 0 To UBound(Answer)
Answer(Counter) = TheMatches(Counter)
Next
RegExpFind = Answer

' User wanted the Nth match (or last match, if Pos = 0). Get the Nth value, if possible
Else
Select Case Pos
Case 0 ' Last match
RegExpFind = TheMatches(TheMatches.Count - 1)
Case 1 To TheMatches.Count ' Nth match
RegExpFind = TheMatches(Pos - 1)
Case Else ' Invalid item number
RegExpFind = ""
End Select
End If

' If there are no matches, return empty string
Else
RegExpFind = ""
End If

' Release object variables
Set re = Nothing
Set TheMatches = Nothing

End Function



I tried to use it on a string like this; I want to grab just the www.msn.com (http://www.msn.com) part:

http://www.msn.com/home.asp

In Excel:
=RegExpFind(A1,"//[^/]*/",1)

That gets me:
//www.msn.com/

I can kill the slashes this way:
=SUBSTITUTE(RegExpFind(A1,"//[^/]*/",1),"/","")

However, I would rather do it by setting up the pattern string correctly. Can someone help? :help

Patrick

mvidas
01-24-2006, 10:14 AM
Hey Patrick,

First off, short answer: nope, can't be done well. Sure, you could do something like
=RegExpFind(A1,"\w+\.\w+\.\w+",1)

But if a1 had a link like http://www.state.ny.us/ then you'd only be returning the www.state.ny (http://www.state.ny)
Also, if the link was http://msn.com/home.asp it would not return anything.

Also, FWIW, your current pattern could be:
=RegExpFind(A1,"//.+?/",1)
A couple characters shorter :P

Looks like its time for you to make a new function! One that includes submatches.. I'll let you program the logic (though I'll gladly help if you'd like), but one that allows you to use a formula like:

=RegExpFindSub(A1,"//(.+?)/",1,1)

As I said, let me know if you need/want any help with this!
Matt

matthewspatrick
02-05-2006, 03:57 PM
Matt,

Thanks! BTW, I have a KB idea in the hopper using a slightly refined version of that UDF and another one based on RegExp.Replace. (The change: an optional argument allowing the user to toggle the IgnoreCase property.)

Patrick