[SOLVED] Regexp question



mvidas
02-02-2005, 11:02 AM
Hi all,

Looking to see how to create a pattern for regular expressions to say "does not include the string ____"

More specifically, I'm looking for a single pattern to say "starts with 'abc', ends with 'ghi', but does not contain the string 'def'".
Something along the lines of "abc.*^(def).*ghi", which obviously won't work.

Where "abcdefghi" will not be a match, but "abcfedghi" would

Any ideas?
Matt

Aaron Blood
02-02-2005, 02:48 PM
Pretty easy to do in VBA using the LIKE operator.

You would just have to test two conditions instead of one.


Sub test()
    Dim txt$, result As Boolean
    txt$ = ActiveCell.Value
    If txt$ Like "abc*ghi" And Not txt$ Like "*def*" Then
        result = True
    Else
        result = False
    End If
    MsgBox result
End Sub



If you wanted to do it in a cell formula you could probably make it work with FIND or SEARCH in conjunction with the LEFT and RIGHT functions. Or you could wrap the LIKE operator as a UDF like so...



Function TextLike(Text As String, Filter As String) As Boolean
    TextLike = Text Like Filter
End Function


Then the cell formula would be something like...


=textlike(A1,"abc*ghi")*NOT(textlike(A1,"*def*"))

...which would return 1 for true or 0 for false.

mvidas
02-02-2005, 02:59 PM
Thanks Aaron,
I do understand that, but I'm trying to find a one-line regex pattern that can do this. The user needs to use regexp, and can't loop through or use Like or anything.
The actual use is to import a webpage as a string, and find tags starting with <img and ending in > without alt in the middle anywhere. I'm just having trouble finding the exact pattern to use

Aaron Blood
02-02-2005, 03:30 PM
OIC... so this isn't an Excel question. Haven't used it... but I imagine there's a regexp forum somewhere you can post to.

This site seems to have some syntax listed.
http://www.greenend.org.uk/rjk/2002/06/regexp.html

Good luck.

mvidas
02-02-2005, 03:34 PM
Thanks. I know it's not really an Excel question, but since I'm using Excel for it, and so many people here are very smart, I thought it might be worth a shot :)
Personally, I don't think it can be done, but I'm often outsmarted by regex.
Thanks again!

Aaron Blood
02-02-2005, 03:40 PM
Only seen a few scattered posts on the topic... but not too much on the XL boards.

Maybe one of the VB boards.

brettdj
02-03-2005, 12:43 AM
Hi Matt,

Matching a "not string", as opposed to a "not character", is a problem with VBScript regexp. Perl offers a negative lookahead which lets you do this. I've posted the Perl example below, which is likely to annoy you once you see it is exactly what you want but VBScript won't let you do it.

"(?!pattern)"
A zero-width negative look-ahead assertion. For example
"/foo(?!bar)/" matches any occurrence of "foo" that isn't
followed by "bar".

"(?<!pattern)"
A zero-width negative look-behind assertion. For example
"/(?<!bar)foo/" matches any occurrence of "foo" that does not
follow "bar". Works only for fixed-width look-behind.
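
For reference, a minimal sketch of how the negative look-ahead quoted above answers the original question in an engine that accepts this syntax. Python's re module is used here purely as an assumed stand-in for a Perl-style engine; the function names and the sample HTML are made up for illustration. The trick is to apply the look-ahead before every character that gets consumed, so "def" can never start anywhere between "abc" and "ghi".

```python
import re

# Consume one character at a time, but only at positions where "def" does not start.
NO_DEF = re.compile(r"^abc(?:(?!def).)*ghi$")

# Same idea for the <img> case: stay inside the tag ([^>]) and refuse
# any position where "alt" starts.
IMG_NO_ALT = re.compile(r"<img(?:(?!alt)[^>])*>", re.IGNORECASE)


def starts_abc_ends_ghi_without_def(text):
    """True if text starts with abc, ends with ghi and never contains def."""
    return NO_DEF.match(text) is not None


def img_tags_without_alt(html):
    """Return every <img ...> tag in html that has no 'alt' inside it."""
    return IMG_NO_ALT.findall(html)


if __name__ == "__main__":
    print(starts_abc_ends_ghi_without_def("abcfedghi"))  # True
    print(starts_abc_ends_ghi_without_def("abcdefghi"))  # False
    print(img_tags_without_alt('<img src="a.png"><img src="b.png" alt="b">'))
```

Note that the <img> pattern rejects any tag containing the letters "alt" anywhere, so a tag such as <img src="salt.png"> would be skipped as well. Tightening it to the attribute name only is left as an exercise.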

But back to VBscript

if you try this
^(abc).+[^def].+ghi$
the regular expression is actually looking for
start string.....abc...anything but d or e or f....ghi...end string
not
start string.....abc...anything but def....ghi...end string

and if you try making the string a submatch, you will find the regexp
merely adds "(" and ")" to the set of characters it refuses to match
^(abc).+[^(def)].+ghi$
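
The character-class behaviour described above is not specific to one engine, so it can be checked with a short sketch. Python's re module is assumed here only as a convenient way to run the two patterns; the helper names are hypothetical. The point is that [^def] consumes exactly one character that is not d, e or f, so the pattern both rejects strings it should accept and accepts strings that contain "def".

```python
import re

# [^def] is a one-character class: "any single character except d, e or f".
CLASS_PATTERN = re.compile(r"^(abc).+[^def].+ghi$")

# Parentheses inside [...] are literal, so this class also excludes "(" and ")".
PAREN_PATTERN = re.compile(r"^(abc).+[^(def)].+ghi$")


def class_pattern_matches(text):
    return CLASS_PATTERN.match(text) is not None


def paren_pattern_matches(text):
    return PAREN_PATTERN.match(text) is not None


if __name__ == "__main__":
    # Wanted as a match, but rejected: the middle character "e" is in the class.
    print(class_pattern_matches("abcfedghi"))    # False
    # Contains "def", yet accepted: "X" satisfies [^def].
    print(class_pattern_matches("abcdefXYghi"))  # True
    # "(" is allowed by the first class but excluded by the second.
    print(class_pattern_matches("abc1(2ghi"))    # True
    print(paren_pattern_matches("abc1(2ghi"))    # False
```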

so you could use two regexps, ie



Sub TestNoDef()
    Dim RegEx As Object
    Dim TestStr As String
    'TestStr = "abcdeftmeghi" ' invalid: contains "def"
    TestStr = "abcdeghi" ' valid
    'TestStr = "abcghi" ' invalid as there must be at least one character between abc and ghi
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        .Pattern = "(.+)(def)(.+)"
        .Global = False
        .MultiLine = True
        'first test: look for "def" with at least one character on either side
        '("def" cannot start or finish the string)
        If .test(TestStr) = False Then
            'no "def" was found, so this Replace leaves TestStr unchanged
            TestStr = .Replace(TestStr, "$1$3")
            'second test: "abc" at front, and "ghi" at end
            .Pattern = "^(abc).+ghi$"
            MsgBox "String test is " & .test(TestStr)
        Else
            MsgBox "string not tested as it contains ""def"" somewhere between the first and last characters"
        End If
    End With
End Sub

Now, gimme points :)

Cheers

Dave

mvidas
02-03-2005, 08:31 AM
Thanks Dave,

I was trying to use Excel to do this, to test different patterns for the user. Makes sense as to why I couldn't do it, too bad we can't use Perl syntax in Excel :) The user is going to be using an editing program that allows different syntaxes (but doesn't allow scripting for whatever reason), so I'm gonna go see if that will work or not. Personally I'd be glad to be rid of this whole situation, so I'm really hoping it will work!

I've got some time tonight to play around with your code, and have been paying attention to the refedit thread. I'll see if I can't find some of the more obscure errors and possible workarounds for you :)


"Now, gimme points :)"
Where am I, the lounge?

brettdj
02-03-2005, 05:58 PM
Matt,

You might want to try

^(abc).*ghi$
rather than
^(abc).+ghi$

if you wanted a string such as
"abcghi"
to be passed

^(abc).+ghi$
requires
abcXghi
where X is one or more characters to be passed
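
A quick way to see the difference between the two quantifiers, sketched with Python's re module (an assumption made only so the patterns can be run; the function names are hypothetical):

```python
import re

STAR = re.compile(r"^(abc).*ghi$")  # zero or more characters between abc and ghi
PLUS = re.compile(r"^(abc).+ghi$")  # at least one character between abc and ghi


def passes_star(text):
    return STAR.match(text) is not None


def passes_plus(text):
    return PLUS.match(text) is not None


if __name__ == "__main__":
    print(passes_star("abcghi"), passes_plus("abcghi"))    # True False
    print(passes_star("abcXghi"), passes_plus("abcXghi"))  # True True
```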

Thanks for looking at the code and refedit :)

Cheers

Dave