View Full Version : Removing duplicates from multiple columns
grose456
01-14-2017, 07:23 AM
I have a whole bunch of columns, 100+, and each one has many rows of data in it. (Numbers or text or maybe a combination of the two.) I want to delete the duplicate data, but I DON'T want to delete the rows. If you delete an entire row, that would delete data from other columns.
I need a macro that would look at each column independently. I want it to looks at the data in the first cell and then compare it to all of the data in that column in the rows below it. If it finds a match, it deletes that match. It then continues with the next item in that same column. It does this for every item I have in that column.
Then, once it is done with the last row in that column, a bunch of empty cells are left . The macro would then continue and sort the data, putting the empty cells below. The end result is that all of the duplicates are removed (deleted)
The macro then continues until each of the 100+ columns has been checked.
then transposes the data on a separate sheet
A generic code will be better so i can use it for few or many columns (totally new at writing code).
Is this possible?
gmaxey
01-14-2017, 09:11 AM
I'm going to have to assume that the data in adjacent columns are completely unrelated to each other.
Sub NoDupColumnData()
Dim oSheet As Worksheet
Dim oRng As Range
Dim varVals As Variant
ThisWorkbook.Sheets(1).UsedRange.Copy 'Whatever sheet has your 400 columns
Set oSheet = ThisWorkbook.Sheets.Add
oSheet.Name = "Filtered"
oSheet.Paste
Application.ScreenUpdating = False
For Each oRng In oSheet.UsedRange.Columns
varVals = fcnUnique(oRng)
oRng.Clear
oRng.Resize(UBound(varVals), 1).Value = WorksheetFunction.Transpose(varVals)
Next oRng
Application.ScreenUpdating = True
lbl_Exit:
Exit Sub
End Sub
Public Function fcnUnique(oColRng As Range) As Variant
Dim colUniques As New Collection
Dim oCell As Range
Dim lngIndex As Long
Dim vUnique As Variant
On Error Resume Next
For Each oCell In oColRng.Cells
If Len(CStr(oCell)) > 0 Then
colUniques.Add oCell, CStr(oCell)
End If
If Err.Number <> 0 Then
'colUniques.Add "", CStr(lngIndex)
'lngIndex = lngIndex + 1
End If
Next oCell
On Error GoTo 0
ReDim vUnique(1 To colUniques.Count)
For lngIndex = LBound(vUnique) To UBound(vUnique)
vUnique(lngIndex) = colUniques(lngIndex)
Next lngIndex
fcnUnique = vUnique
lbl_Exit:
Exit Function
End Function
grose456
01-14-2017, 06:40 PM
SEE SHEET 2.
mike7952
01-14-2017, 07:16 PM
How about this
Sub abc()
Dim arr, iRow As Long, iCol As Long, i As Long
arr = Range("a1").CurrentRegion
Worksheets.Add: i = 1
With CreateObject("scripting.dictionary")
For iCol = 1 To UBound(arr, 2)
For iRow = 1 To UBound(arr)
If Not .exists(arr(iRow, iCol)) And Trim$(arr(iRow, iCol)) <> "" Then
.Item(arr(iRow, iCol)) = Empty
End If
Next
Cells(i, "a").Resize(, .Count) = .keys
.RemoveAll: i = i + 1
Next
End With
Cells(1).CurrentRegion.Borders.LineStyle = xlContinuous
End Sub
grose456
01-14-2017, 07:39 PM
SORRY, EDITED MY PREVIOUS REPLY INSTEAD OF POSTING A NEW REPLY. SEE ABOVE ATTACHMENT. Visited and read your website. You have a very impressive background.
mike7952
01-15-2017, 04:30 AM
Does my solution in Post#4 not do what you want?
mikerickson
01-15-2017, 10:39 AM
Perhaps this will do what you want
Sub test()
Dim arrInput As Variant
Dim colNum As Long, rowNum As Long
Dim destinationRange As Range
Set destinationRange = Sheet3.Range("A1")
arrInput = Sheet1.Range("A1").CurrentRegion
colNum = 1
For colNum = 1 To UBound(arrInput, 2)
For rowNum = 1 To UBound(arrInput, 1)
If arrInput(rowNum, colNum) <> vbNullString Then
If rowNum <> Application.Match(arrInput(rowNum, colNum), Application.Index(arrInput, 0, colNum), 0) Then
arrInput(rowNum, colNum) = vbNullString
End If
End If
Next rowNum
Next colNum
Application.ScreenUpdating = False
With destinationRange
With .Resize(UBound(arrInput, 2), UBound(arrInput, 1))
.Value = Application.Transpose(arrInput)
.SpecialCells(xlCellTypeBlanks).Delete shift:=xlToLeft
End With
End With
Application.ScreenUpdating = True
End Sub
Paul_Hossler
01-15-2017, 10:58 AM
Maybe something like this
Some cells had a leading space so I had to TRIM them also to make certain that ZH9 and spaceZH9 would be de-dupped
Option Explicit
Sub ColumnDeDup()
Dim rCol As Range
Dim wsInput As Worksheet, wsOutput As Worksheet
Application.ScreenUpdating = False
Set wsInput = Worksheets("Input")
With wsInput.Cells(1, 1).CurrentRegion
'trim
.Value = Application.Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")
'remove empty cells
On Error Resume Next
.Cells.SpecialCells(xlCellTypeBlanks).Delete Shift:=xlUp
On Error GoTo 0
'remove dups in each col
For Each rCol In .Columns
Call rCol.RemoveDuplicates(1, xlYes)
Next
End With
'add new worksheet named 'Output'
Application.DisplayAlerts = False
On Error Resume Next
Worksheets("Output").Delete
On Error GoTo 0
Application.DisplayAlerts = True
Worksheets.Add.Name = "Output"
Set wsOutput = Worksheets("Output")
'copy and transpose to Output
wsInput.Cells(1, 1).CurrentRegion.Copy
wsOutput.Select
wsOutput.Cells(1, 1).Select
Selection.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, Transpose:=True
'make Output pretty
wsOutput.Cells(1, 1).CurrentRegion.EntireColumn.AutoFit
Application.ScreenUpdating = True
End Sub
All you need:
Sub M_snb()
Application.ScreenUpdating = False
Sheet3.UsedRange.ClearContents
For j = 1 To Sheet1.Cells(1).CurrentRegion.Columns.Count
Sheet1.Cells(1).CurrentRegion.Columns(j).AdvancedFilter 2, , Sheet3.Cells(1, j), True
Next
Sheet3.UsedRange.SpecialCells(4).Delete
sn = Application.Transpose(Sheet3.UsedRange)
Sheet3.UsedRange.ClearContents
Sheet3.Cells(1).Resize(UBound(sn), UBound(sn, 2)) = sn
End Sub
grose456
01-15-2017, 04:51 PM
gives me:run time error 424
grose456
01-15-2017, 04:53 PM
hi Paul.
i need the duplicate values to be gone, only the unique value to remain.
grose456
01-15-2017, 04:55 PM
Sorry Mike7952,
new at this so thought i was still replying to gmaxey. i want the duplicate values to go and only teh unique value to remain. see my attachment sheet one above.
In the attached workbook click the Run button on Sheet1 to create a list of required unique values in Sheet2
The code:
Sub ExtractUnique()
'ZVI:2017-01-16
'--> Settings, change to suit
Const Src = "Sheet1" ' Name of the source sheet
Const Dst = "Sheet2" ' Name of the destination sheet
' <-- End of settingw
Dim a
Dim Col As Range, Rng As Range
Dim i As Long, j As Long
Dim s As String
' Define source data range taking into the account possible empty rows among data rows
With Worksheets(Src)
Set Rng = Intersect(.UsedRange, .UsedRange.Cells(1).CurrentRegion.EntireColumn)
End With
' Clear the destination sheet
Sheets("Sheet2").UsedRange.ClearContents
' Main
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
' We need to go column by column to prevent overflow of amount values in a()
For Each Col In Rng.Columns
a = Col.Cells.Value
For i = 1 To UBound(a)
s = Trim(a(i, 1))
If Len(s) Then .Item(s) = vbNullString
Next
If .Count Then
j = j + 1
a = .Keys
Sheets(Dst).Cells(j, 1).Resize(, UBound(a)).Value = a
End If
.RemoveAll
Next
End With
' Fit widths and activate the destination sheet
Sheets(Dst).UsedRange.Columns.AutoFit
Sheets(Dst).Select
End Sub
gives me:run time error 424
Do not use VBA-code you don't understand.
Of course you have to add a sheet3 first (see attachment)
or use
Sub M_snb()
sn = Sheet1.Cells(1).CurrentRegion
With CreateObject("scripting.dictionary")
For jj = 1 To UBound(sn, 2)
For j = 1 To UBound(sn)
If .exists(sn(j, jj)) Or sn(j, jj) = "" Then
sn(j, jj) = ""
Else
x0 = .Item(sn(j, jj))
sn(.Count, jj) = sn(j, jj)
End If
Next
.RemoveAll
Next
Sheet2.Cells(1).Resize(UBound(sn, 2), UBound(sn)) = Application.Transpose(sn)
End With
End Sub
Paul_Hossler
01-16-2017, 07:39 AM
hi Paul.
i need the duplicate values to be gone, only the unique value to remain.
I thought they were
Can you provide an example?
In post #8 with the macro and the attachment, 'Original' is your data, 'Input' is a copy I made of 'Original' for testing
When I ran the macro on 'Input' it appeared to remove the duplicates on a column-by-column basis
18045
'Output' is the transpose you asked for using 'input' as an input
mike7952
01-16-2017, 11:04 AM
Paul
I believe I have a solution in post#4 also based on a column to column basis.
Sub abc()
Dim arr, iRow As Long, iCol As Long, i As Long
arr = Range("a1").CurrentRegion
Worksheets.Add: i = 1
With CreateObject("scripting.dictionary")
For iCol = 1 To UBound(arr, 2)
For iRow = 1 To UBound(arr)
If Not .exists(arr(iRow, iCol)) And Trim$(arr(iRow, iCol)) <> "" Then
.Item(arr(iRow, iCol)) = Empty
End If
Next
Cells(i, UBound(arr, 2) + 3).Resize(, .Count) = .keys
Cells(1, iCol).Resize(.Count) = Application.Transpose(.keys)
.RemoveAll: i = i + 1
Next
End With
Cells(1).CurrentRegion.Borders.LineStyle = xlContinuous
Cells(1, UBound(arr, 2) + 5).CurrentRegion.Borders.LineStyle = xlContinuous
End Sub
Paul_Hossler
01-16-2017, 02:14 PM
Well, since Excel provides a built in .RemoveDuplicates method, I figure it's easier and faster to just use it instead of rolling my own
grose456
01-16-2017, 02:15 PM
i went back and checked the file. when i say i want to delete duplicates i want only the unique value to remailn. for example in the attachment "original" the pink colored cell are duplicates which i need to disppear and only the green value to remain.
grose456
01-16-2017, 02:17 PM
i would have done that except i need it for one column at a time. done independently of each other. i tried doing it one column at a time but after 30 columns i figured there had to be an easier way.
grose456
01-16-2017, 02:21 PM
see post #3
grose456
01-16-2017, 02:41 PM
In the attached workbook click the Run button on Sheet1 to create a list of required unique values in Sheet2- I TRIED IT BUT DUPLICATE VALUE REMAIN I NEED ONLY UNIQUE VALUES TO REMAIN
mike7952
01-16-2017, 04:57 PM
These values are in columns Q:T.... There are 3 M's 3 10's 2 12's.... So from your example its not making any sense to me? What exactly are every solution that has been given doing wrong?
NIEVES, JORGE
CRUZ, ALBERTO
SUKHDEO, CHRISTOPHER
MUNIR, MASHRATOON
4I7
4W7
2X8
2W9
M
M
M
F
12
12
JIMENEZ NATIVIDAD
LEONARDO MARIA
VIEITEZ ELENA
TERRERO HILDA
HGS44QCX
HGS44QDX
EES86
8
10
ZH9
10
SES22QLR
SES22QLS
10
ZM2017
ZM2017
ZM2018
SES22QLX
ZMFOR
ZMINT
ZMJIME
ZH9
ZMTWIL
ZMVIEI
ZMLAW
ZM2019
ZMINT
ZMLEON
click the Run button on Sheet1 to create a list of required unique values in Sheet2
I TRIED IT BUT DUPLICATE VALUE REMAIN I NEED ONLY UNIQUE VALUES TO REMAIN
Could you please post some addresses of cells in Sheet2 with duplicates among any row after Run button clicking?
I have found nothing of them...
Paul_Hossler
01-16-2017, 07:06 PM
@grose456 ---
Is it correct to say that you only want the UNIQUE values in each column to be left?
Usually in Excel I think of 'Remove Duplicates' to mean to remove the extra entries, leaving the first occurrence
Question: So if you start with ...
Col A
A
B
C
D
Z
A
B
C
you want to end with ...
Col A
D
Z
since A, B, and C are NOT unique?
then do the same for column B
Powered by vBulletin® Version 4.2.5 Copyright © 2024 vBulletin Solutions Inc. All rights reserved.