PDA

View Full Version : RegExp help!



mvidas
08-26-2004, 09:01 AM
Hi everyone,

Two questions. The second one may not need to be answered, depending on the answer to the first. My situation is that I made a subroutine to parse a string based on the pattern, and it separates the text around the regexp matches into each cell in a given row (it's to parse a string based on delimiters chosen at runtime). Here is the code:

Sub ProcessRow(SH As Worksheet, RW As Long, PAT As String, STG As String)
Dim RGX, MAT, MATS, II As Long, JJ As Long, BOOL As Boolean
Set RGX = New RegExp
On Error Resume Next
II = 1
JJ = 1
RGX.Pattern = PAT
RGX.MultiLine = True
RGX.IgnoreCase = True
RGX.Global = True
BOOL = RGX.Test(STG)
If Err.Number = 0 Then
Set MATS = RGX.Execute(STG)
For Each MAT In MATS
SH.Cells(RW, II) = Mid(STG, JJ, MAT.FirstIndex - JJ + 1)
JJ = MAT.FirstIndex + MAT.Length + 1
II = II + 1
Next
End If
On Error GoTo 0
SH.Cells(RW, II) = Mid(STG, JJ)
End Sub

My first question is, the above uses early binding (I think), which requires a reference to regexp. I can incorporate the reference setting into the code, but if the user has office xp it will cause that error to the user. How can I accomplish the same thing using late binding?

My second question is, if I need to use early binding, how can I speed up the above sub? I'm testing it by importing text from a text file, and parsing it as the line is input. If I just import the text right into column A without parsing, it takes only 3 seconds. But when I import it then process it using the above, it takes about 2:10. I know it's not going to be close to 3 seconds by parsing it (as it uses a for loop, and my test workbook is parsed into 11 columns), but I'm hoping there is something I can do to speed it up.

By the way--thanks Dave/brettdj for the regexp file in the KB! Helped a lot!

Thanks
Matt

byundt
08-26-2004, 07:46 PM
Matt,
It may be as simple as
'Set RGX = New RegExp
Set RGX = CreateObject("VBScript.RegExp")
Brad

brettdj
08-26-2004, 08:26 PM
Matt,

I think that Brad has sorted you out with the late binding

Note that you can set the reference with

Sub Make_VBS_Ref()
On Error Resume Next
'in case ref exists
ActiveWorkbook.VBProject.References.AddFromGuid "{3F4DACA7-160D-11D2-A8E9-00104B365C9F}", 5, 5
End Sub


What is your concern about early binding and XP?

Cheers

Dave

mvidas
08-26-2004, 08:44 PM
Hey guys,

I don't have XP so I can't tell you for sure, but I was under the impression that when you have XP and you try and add a reference via VBA that there was an error. I thought something had to be added in the options to have VBA control the references like that. I'm just trying to make sure that the end user won't have to do anything special for this, other than just install the add-in.

Do either of you know if there is any validity to that?

BTW-I'll try the late binding tomorrow, glad to see it's as simple as just setting the RGX variable!

Matt

byundt
08-26-2004, 08:50 PM
Apparently, the RegExp.dll is installed as part of Microsoft Internet Explorer 5.0 and later. So the ubiquity of IE means the code ought to run for everyone.

Daniel Klann
08-26-2004, 11:51 PM
Hey guys,

I don't have XP so I can't tell you for sure, but I was under the impression that when you have XP and you try and add a reference via VBA that there was an error. I thought something had to be added in the options to have VBA control the references like that. I'm just trying to make sure that the end user won't have to do anything special for this, other than just install the add-in.

Do either of you know if there is any validity to that?

BTW-I'll try the late binding tomorrow, glad to see it's as simple as just setting the RGX variable!

Matt
G'day Matt,

You're correct. In order for this code to correctly add the reference the user would need to have ticked the 'Trust access to Visual Basic Project' option. This can be found in the 'Trusted Sources/Publishers' tab in the macro security dialog of XP/2003 (not sure about 2000 as I don't have it).

Regards,
Dan

mvidas
08-27-2004, 07:51 AM
Thanks Dan, after a bit more research it looks like its version 2002+ that has that option. It actually seems to run a few seconds faster with the late binding anyways, so I'm just going to keep it that way. When I'm done with this project, I hope you'll be around to test it on the newer version.

Everyone,
I put a counter in the macro, and am curious if the behavior I'm noticing is normal. With my 14600 line test file, it runs slower as it progresses. The first half of my file is processed within the first 30 seconds, and then its over a minute and a half for the second half. Looking more into it, it processes about 350 lines per second in the beginning, and as it progresses it processes less and less, ending at 32-33 lines per second near the end. I don't think it has anything to do with the Line Input lines, as without any manipulation it only takes a couple seconds to input the whole thing. I noticed this when I made a progress indicator, which only adds 2-3 seconds to the whole runtime. The text file itself is 11 columns x 14600 rows of =ADDRESS(ROW(),COLUMN(),4), copy/paste values, exported as a tab-delimited file.

Any ideas as to the slowdown, is this normal for regex?

Matt

brettdj
08-28-2004, 08:09 AM
Thats very interesting Matt, maybe you could send me the file for testing?

I haven't seen noticeable peformace degradation with RegExp before. I've used it to parse every formula on a large workbook and the similar sized worksheets took similar times

I assume nothing whacky is happening with free memory etc?

Cheers

Dave

mvidas
08-28-2004, 06:44 PM
Hi Dave,

It's possible, I had email the files to my home address yesterday, and just tried running them here at home. I got it done in 1:33, but it was similar (processed ~300 at first, ~60 at the end). I sent you an email with a zip file containing the text file im using and the unfinished add-in that has the macro in it. in the add-in is a module called "deleteme", and it's the ToTest subroutine in there. It will run a little bit faster without the progress bar, but its not a huge difference.

Thanks!
Matt