PDA

View Full Version : Parse SAS LOG using RegExp



xluser2007
05-16-2009, 01:53 AM
Hi All,

In a recent thread (http://www.vbaexpress.com/forum/showthread.php?t=26696) (which is still being used), VBAXer p45cal helped me to IMPORT a SAS LOG into Excel.

Although there are still some finer details still refined in that thread, I have a parallel question which concerns the parsing of the SAS LOG for various key information.

I have attached a sample workbook of an IMPORTED SAS log. In it I have have 2 worksheets:
"Regex Patterns" which contain various Regex patterns that are required to parse the LOG in a named range called "RegexPatternTable", and;
"IMPORTLOG" - which contains a named range "WholeLogRng" which is effectively the range to parse.In "IMPORTLOG" I have in columns C, D and E the output ranges for the SAS LOG parsing.

Basically:

In column C, I have a list of the Key Parsing word that i have stored in the range "RegexPatternTable".

I would like to parse the LOG using the keyword in ColumnC and the corresponding Regex pattern in "RegexPatternTable" to do the following:


In column D, I would like to output the number of times that Keyword i.e. Regex pattern for the word, is matched in the LOG.
In column E, I would like to copy and paste all the examples of the regex match i.e. should go down by the number of matches noted in column E. As such this column dynamically changes with the number of matches found, so not sure how to keep inserting the matched line examples here).Any help is appreciated.

Kind regards

Bob Phillips
05-16-2009, 02:53 AM
This should do the first part, not really sure what you mean by the second.

Private RegEx As Object

Public Sub ProcessData()
Dim i As Long, j As Long
Dim LastRow As Long
Dim LastData As Long
Dim NextPattern As String
Dim shReg As Worksheet
Dim tmp As Long

Set RegEx = CreateObject("VBScript.RegExp")

Set shReg = Worksheets("Regex Patterns")

With Worksheets("IMPORTLOG")

LastData = .Cells(.Rows.Count, "A").End(xlUp).Row
LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
For i = 2 To LastRow

If .Cells(i, "C").Value <> "" Then

NextPattern = shReg.Cells(Application.Match(Cells(i, "C").Value, shReg.Columns("A"), 0), "B").Value
tmp = 0
For j = 2 To LastData

tmp = tmp - RegexMatch(.Cells(j, "A").Value, NextPattern)
Next j

.Cells(i, "D").Value = tmp
End If
Next i

End With

Set RegEx = Nothing
End Sub

Function RegexMatch(ByVal InText As Variant, _
ByVal Pattern As Variant) As Boolean
RegEx.Pattern = Pattern
On Error Resume Next
RegexMatch = RegEx.Test(InText)
End Function

xluser2007
05-16-2009, 03:43 AM
Hi Bob,

Fantastic code, It does the the job of counting, but the Regex patterns may have an error in them, as some of the output doesn't quite reconcile. I sourced my Regex patterns form here (http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/0bd5dc8639cca3fa?pli=1) (they are in XML form for C#, but I extracted the relevant patterns which should work for VBA shouldn't they?),

I believe there is a problem with the Regex pattern for ERRORS and warnings, as they are showing up as 0, but there are clearly 5 "ERROR:" terms visible from a using a sinple Find, for example. Any ideas why ther aren't matching, row 34 and 46 are examples?

Also, as another example, for the INFILE keyword, I counted the following matches:

Rows Examples
83 NOTE: There were 1 observations read from the data set WORK.TEMP.
111 NOTE: There were 1 observations read from the data set
WORK.TEMP.
177 NOTE: There were 3 observations read from the data set
241 NOTE: There were 3 observations read from the data set WORK.AGE.

As for what i meant in the second part was that once we have a count of the Regex matches, I would like to list the matches in column E (this part is new), i would like to add a row number of each match, and in Column F I would like to set paste the relevant line (or set the value) next to the corresponding row in column E (I've attached a sample workbook below to show what I mean).

In some ways this will test how the patterns are matching, and will show each specific line that is being matched. As the log can vary, so can the number of matches, and the number of lines o match and insert for each keyword.

As such all of the rows need to be populated in one macro. You have already elegantly showed how to do the first part, i was wondering if you could help extend it to do this part.

many thanks as always.

xluser2007
05-16-2009, 04:38 PM
Hi Bob,

Did you get a chance to test the code, with the revised workbook in the previous post.

Any ideas why some of the Regex's may not be matching?

kind regards,

Kenneth Hobs
05-18-2009, 10:42 AM
Your match problem is due to your patterns. Try something like:
^WARNING:\s*
^ERROR:\s*

xluser2007
05-18-2009, 05:12 PM
Hi Ken,

may thanks for your reply.

I've tested and it appears to count properly now.

As such, are you able to assist me in the second part of my query, where the counted terms are listed dynamically (as the number of terms will vary with each different log) This is as as shown in attached in columns E and F?

This will aslo be a direct test of how the Regex patterns are matching,

Any help appreciated.

Kenneth Hobs
05-19-2009, 05:44 AM
After reviewing your details for step 2, it appears that you may need to consider some issues. As you may have noticed, Bob's code finds 3 for the first match. Your manual markup for step 2, shows 4. This is because row 177 is a partial match. There are some other cases where your SAS log is wrapping the lines.

We can modify Bob's code to include your step 2. Its first task would be to delete the data from C2:Fx where x is the last row with data in any of those columns.

Before I give you a solution for steps 1 and 2 in one macro, we need to resolve the partial match issue that I explained in the first paragraph. You may want to add the wild card as I did earlier to get the partial matches if you can't fix the import. The code for the step 2 part would then need to check the next row in the match and if it has a prefix of 7 space characters then trim 6 and concatenate with the row above for your column F cells. Of course it would be easier if you can fix your log input in the import phase but if not, we can use the method that I just explained.

I guess you know that the filter for Infile=InputDataSet and File=OutputDataSet.

Kenneth Hobs
05-19-2009, 06:10 PM
I had some time so I did an example using the assumptions that I discussed earlier. Comment out the SpeedOn and SpeedOff lines if you don't want to install the Module in the link. As always, backup your file before trying this.

Public Sub ProcessData()
Dim i As Long, j As Long
Dim LastRow As Long
Dim LastData As Long
Dim NextPattern As String
Dim shReg As Worksheet
Dim tmp As Long
Dim tf As Boolean
Dim aRange As Range
Dim nr As Long
Dim r As Range
Dim aString As String

On Error GoTo EndNow
SpeedOn 'Add module from http://vbaexpress.com/kb/getarticle.php?kb_id=1035
Set RegEx = CreateObject("VBScript.RegExp")
Set shReg = Worksheets("Regex Patterns")

With Worksheets("IMPORTLOG")
'Delete data if it exists to prepare for new data
.Range("C2", .Range("C" & .Rows.Count).End(xlUp)).ClearContents
.Range("D2", .Range("D" & .Rows.Count).End(xlUp)).ClearContents
.Range("E2", .Range("E" & .Rows.Count).End(xlUp)).ClearContents
.Range("F2", .Range("F" & .Rows.Count).End(xlUp)).ClearContents
LastData = .Cells(.Rows.Count, "A").End(xlUp).Row
For i = 2 To shReg.Range("B1").End(xlDown).Row
NextPattern = shReg.Range("B" & i).Value
nr = .Range("F" & .Rows.Count).End(xlUp).Row + 1
.Range("C" & nr).Value = shReg.Range("A" & i).Value
tmp = 0
For j = 2 To LastData
Set aRange = .Cells(j, "A")
tf = RegexMatch(aRange.Value, NextPattern)
If tf = True Then
tmp = tmp + 1
aString = aRange.Value
If Left(aRange.Offset(1, 0).Value, 7) = " " Then
aString = aString & " " & Trim(aRange.Offset(1, 0).Value)
End If
.Range("E" & (nr + tmp - 1)).Value = aRange.Row
.Range("F" & (nr + tmp - 1)).Value = aString
End If
Next j
.Cells(nr, "D").Value = tmp
Next i
End With

EndNow:
On Error Resume Next
SpeedOff
Set RegEx = Nothing
End Sub

xluser2007
05-20-2009, 06:52 PM
Hi Ken,

Many thanks for your great code, really does work well in the example provided :thumb.

I've been trying to write back sooner, but kept getting an "Adminstrator" error when trying to logon at various times. possibl;y due to maintenance done by the VBAX administrators.

I particulalrly liked your new Regex patterns that capture the ERROR: and WARNING better. The Regex patterns were sourced from the site mentioned above, but as you have demonstrated they need more tweaking for parsing-matching purposes.

I am going to experiment with the great efforst you have provided and see new ways to build on it.

The hardest part of the SAS LOG parsing yourself and Bob have actually accomplished :clap:.

The next aprt will likely be to highlight the relevant matched pattern in the imported log, based on a simple selection_change event based on selecting the relevant Regex match in column F. But I will post back with any queries with regards to this.

Thanks again.

Kenneth Hobs
05-21-2009, 07:00 AM
Not sure what you mean by highlight. In this example, if you select any cell in columns E or F, it will set the interior.colorindex to red for the row number in that cell for column A and set all other cells in column A to the automatic interior.colorindex.

Right click your IMPORTLOG sheet, View Code, and paste:
Private Sub Worksheet_SelectionChange(ByVal Target As Range)
Range("A1", Range("A" & Rows.Count).End(xlUp)).Interior.ColorIndex = xlColorIndexAutomatic
If Target.Column < 5 Or Target.Column > 6 Then Exit Sub
Range("A" & Range("E" & Target.Row).Value).Interior.ColorIndex = xlcired
End Sub

xluser2007
05-22-2009, 08:38 PM
HI Ken,

That's pretty much what I was thinking about - great stuff!

I'm going to test this out with otehr LOGS and let you know how it goes. Looks promising though as the Frront End for this internal use tool is now largely built thanks to your efforts.

Will post back with with any additional features.

Thanks