PDA

View Full Version : Grouping with Regular expressions in EXCEL 2010 VBA



jspattavina
01-19-2014, 01:09 PM
I want to Use grouping with Regular Expressions in EXCEL 2010 and VBA to get the selected parts of a string. I wrote the following function which implements a Get Regular expression (see FUNCTION below).

1. Given a string "123 ABC 308" and a pattern = " \d+". If I specify occurrence n = 0 it returns the first occurrence "123". Likewise if I specify occurrence n = 1 it returns the second occurrence "308". This works fine.

2. I want to use the group feature to return the username and host separately (i.e., alice-b or google.com). I cannot figure out how to do this!

The group feature of a regular expression allows you to pick out parts of the matching text. Given the string "purple alice-b@google.com monkey dishwasher". Suppose I want to extract the username and host separately. To do this, I add parenthesis ( ) around the username and host in the pattern, like this: ([\w.-]+)@([\w.-]+)'. In this case, the parenthesis establish logical "groups" inside of the match text. However, if I set n = 0 or 1 it returns the entire pattern match "alice-b@google.com". How do I use groupings specifically in VBA for EXCEL to do this. I am not so much interested in this particular example, but want to know how to use groupings inside an EXCEL VBA function.




---------------------------------------------------------------------
FUNCTION

Function RegGet1()
'Initialize
Dim str As String
Dim pattern As String
Dim n As Integer
n = 0
str = "purple alice-b@google.com monkey dishwasher"
pattern = "([\w.-]+)@([\w.-]+)"
'str = "123 ABC 308"
'pattern = "\d+"
'Define RegEx Object
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.multiline = False
.IgnoreCase = False
.pattern = pattern
' Get RegEx result
out = .Execute(str)(n)
Debug.Print out
End With
End Function
---------------------------------------------------------------------
Regards,Jeffrey Pattavina
jspattavina@comcast.net

snb
01-19-2014, 03:44 PM
Pleas use code tags !!


sub M_snb()
c00="purple alice-b@google.com monkey dishwasher"
msgbox split(Split(c00,"@")(0))(1)
msgbox split(split(c00,"@")(1))(0)
end sub

jspattavina
01-19-2014, 04:58 PM
I know how to do it using split. I am more interested in how to use RegEx. In particular using parenthesis for grouping and capturing text.
I figured it out myself.

snb
01-20-2014, 03:07 AM
So why not sharing your solution ?
This is a forum after all.

lecxe
01-28-2014, 05:24 AM
Hi Jeffrey

Not sure this is what you found, but your pattern defines 1 match with 2 submatches.

If you want, for example the first submatch of match n, you can get it using


out = .Execute(str)(n).Submatches(0)



Now this is what I don't understand. You say:

I figured it out myself.


You ask for help and expect someone to share the solution with you, then you find the solution and you don't share.

Really strange.

jspattavina
08-04-2015, 09:21 AM
Hi Jeffrey

Not sure this is what you found, but your pattern defines 1 match with 2 submatches.

If you want, for example the first submatch of match n, you can get it using



out = .Execute(str)(n).Submatches(0)



Now this is what I don't understand. You say:

I figured it out myself.


You ask for help and expect someone to share the solution with you, then you find the solution and you don't share.

Really strange.

Sorry for the delay. I had personal matters that came up and was not keeping up with my post for a long while.

Here is the solution I found:
-----------------------------------------------------------------------------------------------------------
SOLUTION:
Given string ="purple alice-b@google.com monkey dishwasher" then the regex pattern ="([\w\s-]+)[^@]([\w]+"
will do the job. Extracting the firs match gives "purple alice-b" and the second match gives "google.com"

-----------------------------------------------------------------------------------------------------------