PDA

View Full Version : Solved: Faster way to Trim() ???



Paul_Hossler
02-21-2009, 08:27 AM
One of the users is loading 100K+ records from a mainframe.

I did a little macro with a For Each loop to go through the cells in the columns that could not have trailing spaces, since that would cause some VLookup()'s later to fail

It is taking a fairly long time to run through the 100K+ rows for each of the 3-4 columns

Question: is there a less brute force way to do something like this that would decrease the run time, hopefully significantly?

The 3 different things i've tried so far --


Option Explicit

Sub ForNextLoop()
Dim i As Long
Dim v As Variant
Dim rData As Range, rCell As Range

Set rData = ActiveSheet.Cells(1, 1).CurrentRegion.Columns(1)

For Each rCell In rData.Cells
rCell.Value = Trim(rCell.Value)
Next

Stop
End Sub

Sub UsingArray()
Dim i As Long
Dim v As Variant
Dim rData As Range
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion.Columns(1)

v = rData.Value
For i = LBound(v, 1) To UBound(v, 1)
v(i, 1) = Trim(v(i, 1))
Next i
rData.Value = v
Stop
End Sub

Sub UsingTempCol()
Dim i As Long
Dim v As Variant
Dim rData As Range, rTemp As Range, rCurrSelection As Range

Set rCurrSelection = Selection
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion.Columns(1)

'use col E as scratch col (probally use other WS)
Set rTemp = ActiveSheet.Cells(1, 100).Resize(rData.Rows.Count, rData.Columns.Count)

rTemp.FormulaArray = "=TRIM(" & rData.Address & ")"
rTemp.Parent.Calculate

Call rTemp.Copy

rData.Select

Selection.PasteSpecial (xlPasteValues)

rTemp.EntireColumn.Delete

rCurrSelection.Select
Stop
End Sub




Thanks

Paul

Kenneth Hobs
02-21-2009, 09:21 AM
You must be using Excel 2007 since 2003 only has 65k rows for the records.

For subs 1 and 2, be sure to add something like what I did in: http://vbaexpress.com/kb/getarticle.php?kb_id=1035 (http://vbaexpress.com/forum/../kb/getarticle.php?kb_id=1035)

Paul_Hossler
02-21-2009, 10:05 AM
Thanks Ken,

yes it's 2007

And yes I will. But it's still all that looping that takes time

The 3rd was is the best I could come up with, since it does not loop and it used more built-in Excel that VBA, but that's still a lot of records

Paul

Kenneth Hobs
02-21-2009, 11:29 AM
Not sure how you are getting your records. Maybe you can do it then?

AMontes
02-21-2009, 01:44 PM
Try this:


Sub trim()
With ActiveSheet.UsedRange
.Value = Evaluate("if(" & .Address & "<>"""",trim(" & .Address & "),"""")")
End With
End Sub

Paul_Hossler
02-21-2009, 05:31 PM
Well, that was interesting.

I created a test column of 100,000 records and ran the 4 different techniques with a Timer



Sub UsingEvaluate()
Dim i As Long
Dim v As Variant
Dim rData As Range, rTemp As Range, rCurrSelection As Range
Application.ScreenUpdating = False

Set rCurrSelection = Selection
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion.Columns(1)

Call TimerStart

With rData
.Value = Application.Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")
End With

Call TimerEnd

rCurrSelection.Select
Application.ScreenUpdating = True
Call TimerShow
End Sub


Using Evaluate, .6520 sec

Using the Temp Col, .6352 sec

Using the VBA array, .2940 sec

Using the For/Next loop, 10.2206 sec

.Evaluate is about 16x faster that the For/Next.

I knew For/Next would be the slowest, but I was surprised by how slow it was.

The VBA array was the fastest, even though it also had a loop in it


Question: what is the purpose of the IF when you constructed the string to .Evaluate ?? It only seems to look for a non-empty address?

Thanks

Paul

Kenneth Hobs
02-21-2009, 05:47 PM
Welcome to the forum AMontes!

I also did some time tests too Paul. Did you check that AMontes method worked? Here is what I did but AMontes method in Sub aTrim did not work right.

I used the code below to average 5 runs for one column of 65536 rows. The s at end of time is with the speed routine and x without.

0.04s FillA (Fill column A before each run with "Test ")
0.04x FillA
0.03s sTrim
0.14x sTrim
0.75s ForNext
3.48x ForNext
0.25s UsingArray
0.25x UsingArray
0.37s UsingTempCol
0.40x UsingTempCol


Add these and the Subs first posted to test in a new workbook to a Module and play.
Sub TestTrimSubs()
Dim a, e, i As Integer

'Clear sheet2 to store test run Sub's stats
With Sheet2
.UsedRange.ClearContents
.Range("A1").Value = "Sub"
.Range("B1").Value = "Time (s)"
.Range("C1").Value = "Note"
End With

'Run the test subs and fill test run info
Sheet1.Activate
a = Array("sTrim", "UsingArray", "ForNextLoop", "UsingTempCol")
For i = 1 To 5
For Each e In a
SubStats "FillA"
SubStats CStr(e)
SpeedOn 'http://vbaexpress.com/kb/getarticle.php?kb_id=1035
SubStats "FillA", "SpeedOn"
SubStats CStr(e), "SpeedOn"
SpeedOff
Next e
Next i

'Sort and AutoFit Sheet2
With Sheet2
.UsedRange.Columns.AutoFit
.UsedRange.Sort Key1:=.Range("A2"), Order1:=xlAscending, _
Key2:=.Range("B2"), Order2:=xlAscending, _
Key3:=.Range("C2"), Order3:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, MatchCase:=False, Orientation:= _
xlTopToBottom, DataOption1:=xlSortNormal, DataOption2:=xlSortNormal, _
DataOption3:=xlSortNormal
.Activate
End With
End Sub

Sub SubStats(sSub As String, Optional sNote As String = "")
Dim d As Double, dTime As Double, rRow As Long, r As Range
d = Timer
Application.Run (sSub)
dTime = Timer - d
With Sheet2
Set r = .Range("A" & Rows.Count).End(xlUp).Offset(1)
r.Value = sSub
rRow = r.Row
.Range("B" & rRow).Value = dTime
.Range("C" & rRow).Value = sNote
End With
End Sub


Sub FillA() 'Fill column A with a trailing space in all cells
Sheet1.Range("A:A").Value = "Test "
End Sub


Sub sTrim()
With ActiveSheet.UsedRange
.Value = Evaluate("if(" & .Address & "<>"""",trim(" & .Address & "),"""")")
End With
End Sub

Paul_Hossler
02-21-2009, 06:28 PM
Hi Ken -- very though job

I want/need to determine the most effecient way to go through a range, not only for TRIM() but I have to do similar things such as UCase, etc. With the million row 2007, people are using large sheets

I'm using 2007, and AMontes' .Evaluate method worked for me.

I did notice that .Evaluate does not seem to work for ranges with more than 1 area. I was experimenting with a WS OnChange event handler to Upper Case text, and I happend to have a multiple area range.

You had timing results, what problem did you have with AMontes?

Paul

Kenneth Hobs
02-21-2009, 06:49 PM
As you can see, I use 2003. The aTrim sub changed "Test " to #NUM!.

This method is the slowest yet. Dictionary's can be handy sometimes though.

0.96s Dict
1.08x Dict

Sub Dict()
Dim d As Object, c As Range, i As Long
Set d = CreateObject("Scripting.Dictionary")
With ActiveSheet
For Each c In .UsedRange
i = i + 1
d.Add i, trim(CStr(c))
Next c
.UsedRange.Value = d.items
End With
Set d = Nothing
End Sub
Another method might be to use ADO.

The worksheet Change event might be worth exploring. I am thinking that it would be similar to your ForNext method.

Paul_Hossler
02-22-2009, 08:07 AM
The WS change handler is pretty basic, but like I said, the .Evalalute does not seem to like multiple areas. But taking that into account, it seems to work fine.



Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)

Dim rData As Range, rTemp As Range, rArea As Range

Set rData = Nothing
On Error Resume Next
Set rData = Target.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

If rData Is Nothing Then Exit Sub

Application.ScreenUpdating = False
Application.EnableEvents = False
For Each rArea In rData.Areas
With rArea
.Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")
End With
Next

Application.EnableEvents = True
Application.ScreenUpdating = True

End Sub


Seems a little slower that the Range --> Array and Array --> Range, but does seem simpler, and by wrapping a Sub around it I can IMHO make it more portable




Sub ChangeValues(R As Range, Optional Operation As String = "UPPER")

Dim rData As Range, rTemp As Range, rArea As Range

Set rData = Nothing
On Error Resume Next
Set rData = R.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

If rData Is Nothing Then Exit Sub

Application.ScreenUpdating = False
Application.EnableEvents = False
For Each rArea In rData.Areas
With rArea
.Value = Application.Evaluate("IF(" & .Address & "<>""""," & Operation & "(" & .Address & "),"""")")
End With
Next

Application.EnableEvents = True
Application.ScreenUpdating = True

End Sub

Sub UsingEvaluateSub()
Call TimerStart
Call ChangeValues(ActiveSheet.Cells(1, 1).CurrentRegion.Columns(1), "TRIM")

Call TimerEnd

Call TimerShow
End Sub



Paul

Kenneth Hobs
02-22-2009, 09:42 AM
Paul, I doubt that using the IF in the Evaluate method speeds anything. It might even slow it since it does one more step. I would have to run some tests to say for sure. Also, note that it returns #NUM! when the range is the full column. Some commands fail like that but few would ever use all rows in a column. It is usually not an issue other than when we code for full columns when it is not needed.

There is a limit on the number of areas but for your case using just a few areas, it should not be an issue.

Example to see the whole column problem using Evaluate.
Sub t()
Dim Operation As String, s As String
Operation = "Trim"

ActiveSheet.UsedRange.ClearContents

With Range("A1:A10")
.Value = " Test "
.Value = Application.Evaluate("IF(" & .Address & "<>""""," & Operation & "(" & .Address & "),"""")")
End With
With Range("B1:B10")
.Value = " Test "
.Value = Application.Evaluate(Operation & "(" & .Address & ")")
End With

'Result of #NUM! since whole column was set
With Range("C:C")
.Value = " Test "
.Value = Application.Evaluate("IF(" & .Address & "<>""""," & Operation & "(" & .Address & "),"""")")
End With
With Range("D:D")
.Value = " Test "
.Value = Application.Evaluate(Operation & "(" & .Address & ")")
End With

'Ok since full column was not set
s = "E1:E" & Rows.Count - 1
With Range(s)
.Value = " Test "
.Value = Application.Evaluate("IF(" & .Address & "<>""""," & Operation & "(" & .Address & "),"""")")
End With
s = "F1:F" & Rows.Count - 1
With Range(s)
.Value = " Test "
.Value = Application.Evaluate(Operation & "(" & .Address & ")")
End With

'Result of #NUM! since whole column was set
With Columns("G")
.Value = " Test "
.Value = Application.Evaluate("IF(" & .Address & "<>""""," & Operation & "(" & .Address & "),"""")")
End With
With Columns("H")
.Value = " Test "
.Value = Application.Evaluate(Operation & "(" & .Address & ")")
End With
End Sub

Paul_Hossler
02-22-2009, 10:51 AM
Must be a difference between 2003 and 2007 in the way evaluate works

Sub UsingEvaluate fills Col A with 1,0485,576 cells and then does the UPPER (I got tired of looking for trailing spaces:rotlaugh:)

I deleted the IF and the time dropped from 5.70 sec to just 1.91 secs


Sub UsingEvaluate2 uses the Selection, and if I select multiple ranges, I get a #VALUE, a selection wth one area works fine (basically the first case)


Paul



Sub UsingEvaluate()

Dim rData As Range
Application.ScreenUpdating = False

Set rData = ActiveSheet.Columns(1)

rData.Value = "asdfasdfasdf"

MsgBox Format(rData.Cells.Count, "#,###")

Call TimerStart

With rData
' .Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")
.Value = Application.Evaluate("UPPER(" & .Address & ")")
End With
Call TimerEnd

Application.ScreenUpdating = True
Call TimerShow
End Sub

Sub UsingEvaluate2()

Dim rData As Range
Application.ScreenUpdating = False

Set rData = Selection

rData.Value = "asdfasdfasdf"

MsgBox Format(rData.Cells.Count, "#,###")



Call TimerStart

With rData
' .Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")
.Value = Application.Evaluate("UPPER(" & .Address & ")")
End With
Call TimerEnd

Application.ScreenUpdating = True
Call TimerShow
End Sub

GTO
02-22-2009, 11:10 PM
Greetings Paul,

RE:


I deleted the IF and the time dropped from 5.70 sec to just 1.91 secs

Greetings Paul,

If I understand your comments, you are saying that (while using Excel2007) the Evaluate statement will work on the entire column {like: Range("A:A")}? If so, this must be different, as I was following the thread yesterday and had noted where Keneth's examples had a slight glitch in using Evaluate.

I had noted similar to Keneth's observations that (for instance) I could use evaluate and get a good array of returning values if the range was A1:A65535 or less. Conversely, setting the range to either ("A1:A65536") or ("A:A") would return the error.

I realize that part has been discussed, but I noted something else today. All of the testing and conversation had been based on testing with an array of cells, where said cell array was initially assigned one value - be it a string with a space, string in lowercase, or whatever.

Now while maybe a silly exercise (after all, how many times are you going to need to trim an entire column), I had already been figuring how to split the column if the range included the entire column and one was using pre-2007. Anyways, I started testing with two and three column ranges just to see runtime length changes.

Ahem... "We have a problem Houston..."

At first, I thought I must be doing something wrong, when I was also trying to test Evaluate w/o the IF. So - I tested against of your latest snippets (modified slightly).

Sub UsingEvaluate2_Test_Without_IF()

Dim rData As Range
Application.ScreenUpdating = False

Set rData = Range("A1:A10")

rData.Value = "asdfasdfasdf"

Set rData = Range("A11:A20")

rData.Value = "SomethingElse"

Set rData = ActiveSheet.UsedRange

MsgBox Format(rData.Cells.Count, "#,###")

' Call TimerStart

With rData
' .Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")
.Value = Application.Evaluate("UPPER(" & .Address & ")")
End With
'Call TimerEnd

Application.ScreenUpdating = True
' Call TimerShow
End Sub
If you step thru it, you will see that both strings are initially set correctly, but when Evaluate executes, well... they all go to the first string's value :( (allbeit uppercase).

Bummer, eh? Now hopefully AMontes or anyone with a better grasp on Evaluate than I can correct me or better explain this, but basically my current belief is that with the inclusion of the IF test, Evaluate is returning an array of values, based on an array of IF tests, returning the value from the correct element of the same array listed in the Trim.

But I think that when trying to use Evaluate with only the Trim, the result is just from the first cell, as I do not believe Trim can return an array itself.

I could not find away around this myself, but I may well have missed a trick.

Hope this helps,

Mark

Kenneth Hobs
02-23-2009, 11:33 AM
Glad you piped in GTO. I found the same problem with the Evaluate method using the adjusted value in the first cell for the whole range.

So, it appears that the Array method is the fastest.

I was thinking of another route using R but I doubt that it would be faster than the Array method. Plus, installing R on everyone's computer might be a problem for some.

PS The forum has not been sending Instant Email Notifications to me so I might miss a reply every so often.

mdmackillop
02-23-2009, 11:53 AM
Hi Kenneth
Click on Search/Subscribed threads to see what's been reponded to.
Regards
Malcolm

GTO
02-23-2009, 02:49 PM
@Kenneth:

Goodness, my apologies for the mis-spells :-( sorry about that!

BTW, while certainly not "prize winning" or anything, here is what I came up with to avoid the "if the used range is the full column" issue...

Sub QuickTest()
Dim _
rngUsed As Range, _
rngOne As Range, _
rngTwo As Range

Set rngUsed = ActiveSheet.UsedRange

With rngUsed

If rngUsed.Rows.Count = Rows.Count _
Or rngUsed.Row + rngUsed.Rows.Count = Rows.Count Then

'Set rngOne = .Range(.Cells(.Row, .Column), _
.Cells(Rows.Count \ 2, .Column + .Columns.Count - 1))
Set rngOne = .Resize(Rows.Count \ 2)
Set rngTwo = .Range(.Cells((Rows.Count \ 2) + 1, .Column), _
.Cells(.Row + .Rows.Count - 1, _
.Column + .Columns.Count - 1))

With rngOne
.Value = Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")
End With

With rngTwo
.Value = Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")
End With
Else

.Value = Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")

End If


End With
End Sub

As noted, certainly not 'inspired', but seems to work.

A great day to you both,

Mark

Paul_Hossler
02-23-2009, 06:41 PM
Ken / Mark

Thanks for your interest in something that I thought would be a pretty streight-forward question



Bummer, eh? Now hopefully AMontes or anyone with a better grasp on Evaluate than I can correct me or better explain this, but basically my current belief is that with the inclusion of the IF test, Evaluate is returning an array of values, based on an array of IF tests, returning the value from the correct element of the same array listed in the Trim.


That would agree with the results. I never tried .Evaluate using 2003 except for a few individual cells at a time. 2007 is different.



Sub UsingEvaluate3()

Dim rData As Range
Application.ScreenUpdating = False

ActiveSheet.Columns(1).Value = "aaaa"
ActiveSheet.Columns(2).Value = "bbbb"
ActiveSheet.Columns(3).Value = "cccc"
ActiveSheet.Columns(4).Value = "dddd"

ActiveSheet.Cells(2, 1).Value = "not aaaa"


Set rData = ActiveSheet.Cells(1, 1).CurrentRegion

MsgBox Format(rData.Cells.Count, "#,###")

Call TimerStart

With rData

'leaving the IF in does it correctly. but takes longer
.Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")

'this will fill the range wth the same value from A1
' .Value = Application.Evaluate("UPPER(" & .Address & ")")
End With
Call TimerEnd

Application.ScreenUpdating = True
Call TimerShow
End Sub




I had noticed the "fill everything with the first value" but tried to work around it by only doing 1 column at a time. The correllary to this is that when I did a multiple select and tried to TRIM the selection, I got #VALUE errors. I had to loop Selection.Areas each time


Sub UsingEvaluate5()

Dim rData As Range, rArea As Range
Application.ScreenUpdating = False


'control-click 3 areas

Set rData = Selection

rData.Areas(1).Value = "aaaaa"
rData.Areas(2).Value = "bbbbb"
rData.Areas(3).Value = "cccca"

MsgBox Format(rData.Cells.Count, "#,###")

Call TimerStart

For Each rArea In rData.Areas
With rArea

'leaving the IF in does it correctly. but takes longer (has the advantage of being correct)
' .Value = Application.Evaluate("IF(" & .Address & "<>"""",UPPER(" & .Address & "),"""")")

'this will fill the range wth the same value from A1
.Value = Application.Evaluate("UPPER(" & .Address & ")")
End With
Next
Call TimerEnd

Application.ScreenUpdating = True
Call TimerShow
End Sub



Not too sure what this all means, but now that I understand a little bit better what's happening,

If there's one column, I can leave the IF off and gain some speed
If more that one column, then I can do one column at a time IF-less, or leave the IF on and do all at once
If more that one area, then I can loop through the .Areas


Paul

GTO
02-23-2009, 09:11 PM
Hi Paul,

RE:

If there's one column, I can leave the IF off and gain some speed

Nooooo!

Leastwise not according to what I found, and specifically to the issue for your wb, which is trimmimg spaces.

Even if its just one column, leaving th IF out results in all cells in the range being assigned the (trimmed) value of the first cell. Let's say:

A1="12345 "
A2="23456 "
A3="34567 "

After the Evaluate W/O the IF, I believe you will have:

A1="12345"
A2="12345"
A3="12345"

Try the snippet at (if I am counting posts correctly) #13 to see if you agree.

Mark

Paul_Hossler
02-24-2009, 05:46 AM
Nooooo!


Yep, I agree -- I was getting tangled up in my test cases, as well as the apparent differences between 2003 and 2007

I suspect I would have discovered my error as soon as the users started complaining about missing data :blush

Paul