PDA

View Full Version : Solved: Replace without looping??



Paul_Hossler
11-05-2011, 07:24 AM
Doing a 'data mining' project at work in Excel 2007, almost 1M lines

I've got a whole series of macros to do specific tasks

One seems to take a long time (or at longer than I think it should)


Sub Rows_Delete_If_Prefix(sPrefix As String, nCol As Long)
Dim rData As Range, rCell As Range

Application.ScreenUpdating = False
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion
Set rData = Intersect(rData, rData.Parent.Columns(nCol))

For Each rCell In rData.Cells
If UCase(Left(rCell.Value, Len(sPrefix))) = UCase(sPrefix) Then rCell.Value = True
Next

' this just sorts by nCol to group True, and uses
' ActiveSheet.Columns(nCol).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
' to delete one large Area
' Call Rows_Delete_If_TRUE(nCol)

Application.StatusBar = False
Application.ScreenUpdating = True

End Sub


I can do a .Replace on a entire column, but if I only want to replace the value with True if the string if it occurs at the beginning of the Value, I loop through the entire column one cell at a time, test each value, and then 'mark' that row so I can delete them all as a single group

I did think about RegEx (plenty of good example here), but the all seem to do it one cell at a time.

I was looking for a fast way to get them all at once ... if possible

Paul

mikerickson
11-05-2011, 08:08 AM
Have you tried Replace using wildcards
Range("B:B").Replace What:="cat*", Replacement:="true", LookAt:=xlWhole, _
SearchOrder:=xlByRows, MatchCase:=False

Paul_Hossler
11-05-2011, 02:02 PM
Have you tried Replace using wildcards

Obviously not :doh:

I NEVER thought about doing it that way, or even trying it :banghead: :banghead: :banghead: :banghead:

(That's a '4-banger')
Thanks for leading me back to the path of good VBA :beerchug:

That will make a lot of things easier and faster

Now ... is there a similar way to make a whole range Upper Case??

Paul

Aflatoon
11-07-2011, 05:46 AM
You can use Evaluate:
With ActiveSheet.Cells(1, 1).CurrentRegion
.Value = Evaluate("INDEX(UPPER(" & .Address & "),0,0)")
for example. No idea if that will actually be faster than dropping into an array, converting, then dropping back into the range.

Paul_Hossler
11-07-2011, 08:30 AM
You can use Evaluate:


Thanks -- I'll give it a shot

The latest partial 'dump' of data to mine thorough has 990K+ lines, and that only because we had to do each year seperately :(

Anything to speed up cranking through the data in this monster project is welcome

Paul

GTO
11-07-2011, 09:51 AM
...
Now ... is there a similar way to make a whole range Upper Case??

Paul

Hi Paul,

I am probably not grasping the obvious and do not wish to slow you down, so just if convenient/easy:

Is the "whole range" part or whole of just the one column we were replacing partials in, or, part or whole of usedrange?

Are we UCASE(ing) just for looks, or to another end?

Mark

Paul_Hossler
11-07-2011, 05:06 PM
Hi Mark

I have something like A2:F900000 with mostly text data that comes in from an external system.


To another end (mostly),

1. There's a lot of downstream processing, and it'll be a lot easier if everything is in UC for testing and things.

2. The end result would look better to management (and we all know how important that is) if the data were not UPPER CASE, Mixed Case, and the ocassional lower case all jumbled

Right now I brute force my way down the rows in a (long running) loop with

For Each .....
rCell.Value = UCase (rCell.value)
Next


Aflatoon has an interesting idea. It sounded familiar some how, so maybe I read about it here

Paul

Paul_Hossler
11-07-2011, 07:49 PM
Follow up

I did little timing test using .Evaluate and using the For loop


Option Explicit
Sub test()
Dim sFormula As String
Dim rData As Range
Dim t1 As Single, t2 As Single
Dim i As Long


ActiveSheet.Columns(5).Value = "aaaa"

Set rData = ActiveSheet.Cells(1, 5).CurrentRegion

sFormula = "=UPPER(" & rData.Address & ")"

t1 = Timer
rData.Value = Application.Evaluate(sFormula)
t2 = Timer
MsgBox (t2 - t1)

rData.Value = "aaaaa"

t1 = Timer
For i = 1 To rData.Cells.Count
rData.Cells(i).Value = UCase(rData.Cells(i).Value)
Next i
t2 = Timer

MsgBox (t2 - t1)

End Sub


I figured the loop would be slow, but 245 sec vs 1.6?!?! That's impressive

And if I use .Evaluate I can probably speed up some things other than just UCase

Thanks

Paul

mikerickson
11-08-2011, 07:40 AM
Using bulk read/write to cells, this loop takes only 8.36 sec (vs. 2.26 for Evaluate)
Dim outArray as Variant
t1 = Timer
outArray = rData.Value
For i = 1 To rData.Cells.Count
outArray(i, 1) = UCase(outArray(i, 1))
Next i
rData.Value = outArray
t2 = Timer
MsgBox (t2 - t1)

GTO
11-08-2011, 09:41 AM
I only got to run twice on a good PC (hectic night), but using the string "aaaa" (just shoved over to Col A), I got:

Evaluate:
0.390625
0.328125

Plunk array/loop/plunk range:
0.203125
0.21875

test code:


Option Explicit

Sub UCaseTheWorks()
Dim rData As Range
Dim StartHack As Double, EndHack As Double
Set rData = ActiveSheet.Range("A1").CurrentRegion

StartHack = Timer
fix1 rData
EndHack = Timer

Debug.Print EndHack - StartHack
End Sub

Sub fix2(r As Range)
Dim x As Long, y As Long
Dim ary

ary = r.Value
For x = 1 To UBound(ary, 1)
For y = 1 To UBound(ary, 2)
ary(x, y) = UCase(ary(x, y))
Next
Next
r.Value = ary
End Sub

Sub fix1(r As Range)
With r
.Value = Evaluate("INDEX(UPPER(" & .Address(0, 0, , -1) & "),0,0)")
End With
End Sub

Aflatoon
11-09-2011, 01:42 AM
Try using:
r.worksheet.evaluate rather than application.evaluate - it is usually considerably faster.

GTO
11-09-2011, 04:30 AM
Hi Aflatoon,

On the same PC, file on flash, I ran ea of the below, with the indicated results.


Option Explicit

Sub UCaseTheWorks()
Dim rData As Range
Dim StartHack As Double, EndHack As Double

Cells(1, 1).Resize(Rows.Count).Value = "aaaa"
Set rData = ActiveSheet.Range("A1").CurrentRegion
DoEvents

StartHack = Timer
fix3 rData
EndHack = Timer

Debug.Print EndHack - StartHack

End Sub

Sub fix3(r As Range)
With r
'.Value = .Parent.Evaluate("INDEX(UPPER(" & .Address(0, 0, , -1) & "),0,0)")
.Value = .Worksheet.Evaluate("INDEX(UPPER(" & .Address(0, 0, , -1) & "),0,0)")
End With
End Sub

Sub fix1(r As Range)
With r
.Value = Evaluate("INDEX(UPPER(" & .Address(0, 0, , -1) & "),0,0)")
End With
End Sub

Sub fix2(r As Range)
Dim x As Long, y As Long
Dim ary

ary = r.Value
For x = 1 To UBound(ary, 1)
For y = 1 To UBound(ary, 2)
ary(x, y) = UCase(ary(x, y))
Next
Next
r.Value = ary
End Sub


...so thus far at least, it would seem to me that looping an array still edges out.

Aflatoon
11-09-2011, 05:11 AM
The array being faster does not really surprise me, but the performance of Evaluate versus Worksheet.Evaluate does (especially after reading some of Charles Williams' recent blogs). I confess I tend to use Worksheet.Evaluate in preference anyway as it saves worrying about including sheet names or which sheet is active.
However, unless I were needing to use worksheet functions for calculation, I would tend to use the array and loop, not least because it tends to be more legible.

Paul_Hossler
11-09-2011, 06:27 AM
However, unless I were needing to use worksheet functions for calculation, I would tend to use the array and loop, not least because it tends to be more legible.


Are there memory impacts from Dim-ing a 900,000 element variant array in order to loop the array?

I do like the suggestion to use



r.worksheet.evaluate


Paul

Paul_Hossler
11-09-2011, 09:44 AM
I just noticed that I did something REALLY dumb when I was testing:banghead:

I used "aaaa" as a single test value and it appeared to work OK, but with real data, all I did was copy the upper case of the first value to all the cells.:banghead: :banghead: Double dumb


This was when I realized my mistake


.Worksheet.Evaluate("INDEX(UPPER(" & .Address(0, 0, , -1) & "),0,0)")


:beerchug: to both of you

Paul