PDA

View Full Version : Solved: Delete 0 length strings



Paul_Hossler
02-04-2010, 06:26 PM
A lot of times on imported data, there is a 0 length string in cells.

I need to make sure that such 'empty appearing' cells are truly Empty.

I've been brute forcing my way through with something like this.


Option Explicit
Sub drv()
Dim rCell As Range, rData As Range
Set rData = Nothing
On Error Resume Next
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion.SpecialCells(xlCellTypeConstants, xlTextValues)
On Error GoTo 0

If Not rData Is Nothing Then
For Each rCell In rData.Cells
If Len(rCell.Value) = 0 Then rCell.ClearContents
Next
End If

End Sub




Is there a better (i.e. faster) way to avoid all that looping through possibly 5000 rows x 200 columns = 1M cells?


Paul

Oorang
02-04-2010, 07:59 PM
Hi Paul,
I see three opportunities for speed boost here.

The biggest one is removing the SpecialCells property. Invoking the special cells property will cause two things to happen. First the property will have to enumerate all the cells in the parent range and evaluate them for the criteria specified. Secondly the matching cells are then built into a complex range. While the property code is compiled and will therefore enumerate faster than a VBA enumeration, it can't enumerate all the cells and build the complex range faster. So it will generally be faster to just skip that part.

The next thing is to disable events, any time an individual cell is changed the worksheet changed event fires and the worksheet also recalculates. Turning off Events will prevent these two things and speed your macro.

Finally, enumerating using indexes is slightly faster than using For Each. The net result is:
Public Sub RemoveZLS()
Dim ws As Excel.Worksheet
Dim rDta As Excel.Range
Dim rCll As Excel.Range
Dim lRow As Long, lRowL As Long, lRowR As Long
Dim lCol As Long
Excel.Application.EnableEvents = False
Set ws = ActiveSheet
Set rDta = ws.UsedRange
With rDta
lRowL = .Row
lRowR = .Rows.Count - lRowL + 1&
For lCol = .Column To .Columns.Count - .Column + 1&
For lRow = lRowL To lRowR
Set rCll = ws.Cells(lRow, lCol)
If LenB(rCll.Value) = 0& Then
rCll.Value = Empty
End If
Next
Next
End With
Excel.Application.EnableEvents = True
End Sub

Paul_Hossler
02-05-2010, 02:13 PM
Aaron -- thanks. That was very interesting.

I would have thought that looping through SpecialCells ( ... ) would be less work ( = faster), but what you said makes sense.

Events are not a problem in the real app, since they're already not Enabled.

The index looping being faster than For Each is surprising.

Paul

Oorang
02-05-2010, 09:38 PM
Well, that's just general experience talking... So it might be wrong in certain scenarios etc. and so forth:) if it's something that is going to see serious use, I would create several test worksheets, time your original, then introduce each method above to see what difference (if any) it makes. The first one for example, will be sensitve to the data. The more complex a range it has to build the longer special cells will take (until it will eventually break if there are too many sub ranges).

Just remember premature optimization is the root of all evil. Don't spend 5 hours to squeeze out 2 more seconds. You will never see a return on that investment;) Here is a simple test harness have fun!
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long

Public Sub Test()
Dim lStart As Long
Dim lEnd As Long
lStart = GetTickCount
'Your procedure call here:
MyProcedure
lEnd = GetTickCount
MsgBox Format$((lEnd - lStart) / 1000&, "0.000 ""Seconds"""), vbInformation, "Time Elapsed"
End Sub

Public Sub MyProcedure()
Dim i As Long
For i = 1 To 100000000
Next
End Sub

Paul_Hossler
02-06-2010, 07:49 AM
Performance and 'not prematurely optimized' technique being dependent on the data makes sense.

In general, this comes in from an external system and might easily have 25,000 rows, and 100 columns = 2,500,000 (or more) cells

Some columns will be almost all 0 len strings, and some columns will be mostly non-0 len strings.

Thanks for the benchmarking code

Paul

Oorang
02-06-2010, 07:03 PM
Another approach that might be fast is this: Excel's sort feature is really fast. You could try this: Sort the first column so all the blanks are at the top and then walk down the column until you hit the first non-blank. Go to the next column and sort, then walk down to the first non-blank and so on. Because the data is sorted, the number of cells you have to check is greatly reduced.

It's a counter intuitive approach, but it exploits the fact that VBA is slow, while compiled code is fast. I suspect the time you lose by sorting will be offset by the time you gain by minimizing the cells you need to check. (No guarantees.)

I can't promise it will be faster, just something fun to check.

One drawback to this approach would be that it will not leave your data in place. I suspect that won't matter, but if it does, then don't do it:)

Paul_Hossler
02-07-2010, 08:32 PM
The only other approach I've managed to come up with is this over the range of interest



With ActiveSheet.UsedRange
.Replace What:="", Replacement:="true", LookAt:=xlWhole, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
.Replace What:="true", Replacement:="", LookAt:=xlWhole, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
End With



which assumes that there are no booleans in the data. Since this 'clean up' needs to be done because the data is imports, it's a fairly safe assumption

For some reason known only to Excel, the "" (emprty string) for the second .Replace leaves the cell Empty instead of putting a 0 length string in

Paul

Oorang
02-07-2010, 09:31 PM
Hmmm, that gives me an idea... This seems to work:
Activesheet.UsedRange.Value = Activesheet.UsedRange.Value
... Bet it runs faster too;)


Just be aware it will overwrite all your formulas. It's like doing copy/paste special values.

Paul_Hossler
02-08-2010, 05:15 PM
Yes, and yes. Only down side is that some of the strings consist of 16 - 20 numeric characters, and so they get rounded and displayed in scientic notation. But I can use your technique on a column by column basis for most of the data, and do a very few selected columns (those with the long character numbers) the slower way.

Definately an improvement

Thanks again

Paul