PDA

View Full Version : Solved: Duplicates removals!



james123
01-11-2007, 03:21 PM
Hiya,

Not even sure if this is possible, Basically I have a spreadsheet with 48 columns, and over 30000 rows.

The rows of data do have duplicates in them but some of the duplicate rows have more information then the others. I want to get rid of the rows with the least information.

So basically is it possible to make a macro that finds duplicates and compares them, deleting the row with the least information?
I have included a small sample spreadsheet same heading etc but no real data (and only 5 rows). The column that it needs to check for similarities is "Q" named "REG NR"

Please let me know if this is possible and how I would go about doing something like this?

Thanks guys.

Ken Puls
01-11-2007, 04:15 PM
I have not looked at your data, but...

What I tend to do when I need to delete duplicates is use a variation of this routine from my site (http://www.excelguru.ca/node/24).

It works like this:


A column of index numbers is created after the last column of information in the spreadsheet. This is to preserve the original order.
The data is then sorted based on the specified column, with the secondary sort key being the original order. (This ensures that the first instance of a record will be maintained.)
A column of formulas is inserted at the end of the sheet, comparing the Target Column's data for that row against the previous row. If it does not match, it is a new record, if it does match, then the record is a duplicate.
Using the autofilter, all data that has been flagged as a duplicate is deleted.
The data is sorted back into the original order and the "helper columns" are deletedThe key for you would be to make that second formula robust enough to check if the row is a duplicate, and if so, is is the shortest of the bunch. (And then, of course, implement that formula in the VBE.)

Does that make sense?

mdmackillop
01-11-2007, 05:50 PM
Hi James
Here's some code to try. Tried to minimise looping 30000 rows, but couldn't avoid it completely!
Sub DelDups()
Dim Rw As Long, Col As Range
Application.ScreenUpdating = False
Rw = Cells(Rows.Count, 1).End(xlUp).Row
Set Col = Range("AX2:AX" & Rw)

'Titles
[AX1] = "Index"
[AY1] = "Registration"
[AZ1] = "Count"


'Create index
Col.Formula = "=ROW()"
Col.Copy
Col.PasteSpecial xlPasteValues

'Copy registration
Set Col = Col.Offset(, 1)
Col.FormulaR1C1 = "=RC[-34]"
Col.Copy
Col.PasteSpecial xlPasteValues

'Count data cells
Set Col = Col.Offset(, 1)
Col.FormulaR1C1 = "=COUNTA(RC[-51]:RC[-4])"
Col.Copy
Col.PasteSpecial xlPasteValues

'Sort Data
Columns("A:AZ").Select
Selection.Sort Key1:=Range("AY2"), Order1:=xlAscending, Key2:=Range("AZ2" _
), Order2:=xlDescending, Header:=xlYes, OrderCustom:=1, MatchCase:=False _
, Orientation:=xlTopToBottom, DataOption1:=xlSortNormal, DataOption2:= _
xlSortNormal
[AX1].Select

'Clear dup. count cells
For i = 2 To Rw
If Range("AY" & i) = Range("AY" & i - 1) Then Range("AZ" & i).ClearContents
Next

'Filter for blanks
With Columns("A:AZ")
.AutoFilter Field:=52, Criteria1:="="
Rows("2:" & Rw).ClearContents
.AutoFilter
End With
'Restore order
Columns("A:AZ").Sort Key1:=Range("AX2"), Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal

[AX1].Select
Application.ScreenUpdating = True
End Sub

james123
01-11-2007, 06:28 PM
Stops with 1004 error

"aplication-defined or object-defined error" debug highlights this code:

Selection.Sort Key1:=Range("AY2"), Order1:=xlAscending, Key2:=Range("AZ2" _
), Order2:=xlDescending, Header:=xlYes, OrderCustom:=1, MatchCase:=False _
, Orientation:=xlTopToBottom, DataOption1:=xlSortNormal, DataOption2:= _
xlSortNormal

Does this on both the one i downloaded from your post and the real sheet.

Ken Puls
01-11-2007, 06:36 PM
HI James,

What version of Excel are you running? Maybe try trimming out some of the (most likely) unnecessary arguments:

Selection.Sort Key1:=Range("AY2"), Order1:=xlAscending, Key2:=Range("AZ2"), _
Order2:=xlDescending, Header:=xlYes, Orientation:=xlTopToBottom

mdmackillop
01-11-2007, 06:42 PM
Sorry James,
I usually remember to delete these options which cause problems in pre 2003 Excel

james123
01-11-2007, 06:50 PM
yer im using 2000 :( bit rubish i know!

Can you repost the code with the bits taken out. Think i may be trying to cut out too much as not work at all now! :S

mdmackillop
01-11-2007, 06:55 PM
Modified for Office 2000
Sub DelDups()
Dim Rw As Long, Col As Range
Application.ScreenUpdating = False
Rw = Cells(Rows.Count, 1).End(xlUp).Row
Set Col = Range("AX2:AX" & Rw)

'Titles
[AX1] = "Index"
[AY1] = "Registration"
[AZ1] = "Count"


'Create index
Col.Formula = "=ROW()"
Col.Copy
Col.PasteSpecial xlPasteValues

'Copy registration
Set Col = Col.Offset(, 1)
Col.FormulaR1C1 = "=RC[-34]"
Col.Copy
Col.PasteSpecial xlPasteValues

'Count data cells
Set Col = Col.Offset(, 1)
Col.FormulaR1C1 = "=COUNTA(RC[-51]:RC[-4])"
Col.Copy
Col.PasteSpecial xlPasteValues

'Sort Data
Columns("A:AZ").Select
Selection.Sort Key1:=Range("AY2"), Order1:=xlAscending, Key2:=Range("AZ2" _
), Order2:=xlDescending, Header:=xlYes, OrderCustom:=1, MatchCase:=False _
, Orientation:=xlTopToBottom
[AX1].Select

'Clear dup. count cells
For i = 2 To Rw
If Range("AY" & i) = Range("AY" & i - 1) Then Range("AZ" & i).ClearContents
Next

'Filter for blanks
With Columns("A:AZ")
.AutoFilter Field:=52, Criteria1:="="
Rows("2:" & Rw).ClearContents
.AutoFilter
End With
'Restore order
Columns("A:AZ").Sort Key1:=Range("AX2"), Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom

[AX1].Select
Application.ScreenUpdating = True
End Sub

james123
01-11-2007, 07:04 PM
Thanks guys, Awsome :D

mdmackillop
01-11-2007, 07:06 PM
How long did it run?

james123
01-11-2007, 09:29 PM
took 60-70seconds! very good considering the age of the laptop work have given me. :D

Ken Puls
01-11-2007, 09:33 PM
For 30,000 rows? Wow, that's pretty good, actually. Nice work, Malcolm!