PDA

View Full Version : Removing duplicates from multiple columns



grose456
01-14-2017, 07:23 AM
I have a whole bunch of columns, 100+, and each one has many rows of data in it. (Numbers or text or maybe a combination of the two.) I want to delete the duplicate data, but I DON'T want to delete the rows. If you delete an entire row, that would delete data from other columns.
I need a macro that would look at each column independently. I want it to looks at the data in the first cell and then compare it to all of the data in that column in the rows below it. If it finds a match, it deletes that match. It then continues with the next item in that same column. It does this for every item I have in that column.
Then, once it is done with the last row in that column, a bunch of empty cells are left . The macro would then continue and sort the data, putting the empty cells below. The end result is that all of the duplicates are removed (deleted)
The macro then continues until each of the 100+ columns has been checked.
then transposes the data on a separate sheet
A generic code will be better so i can use it for few or many columns (totally new at writing code).
Is this possible?

gmaxey
01-14-2017, 09:11 AM
I'm going to have to assume that the data in adjacent columns are completely unrelated to each other.


Sub NoDupColumnData()
Dim oSheet As Worksheet
Dim oRng As Range
Dim varVals As Variant
ThisWorkbook.Sheets(1).UsedRange.Copy 'Whatever sheet has your 400 columns
Set oSheet = ThisWorkbook.Sheets.Add
oSheet.Name = "Filtered"
oSheet.Paste
Application.ScreenUpdating = False
For Each oRng In oSheet.UsedRange.Columns
varVals = fcnUnique(oRng)
oRng.Clear
oRng.Resize(UBound(varVals), 1).Value = WorksheetFunction.Transpose(varVals)
Next oRng
Application.ScreenUpdating = True
lbl_Exit:
Exit Sub
End Sub
Public Function fcnUnique(oColRng As Range) As Variant
Dim colUniques As New Collection
Dim oCell As Range
Dim lngIndex As Long
Dim vUnique As Variant
On Error Resume Next
For Each oCell In oColRng.Cells
If Len(CStr(oCell)) > 0 Then
colUniques.Add oCell, CStr(oCell)
End If
If Err.Number <> 0 Then
'colUniques.Add "", CStr(lngIndex)
'lngIndex = lngIndex + 1
End If
Next oCell
On Error GoTo 0
ReDim vUnique(1 To colUniques.Count)
For lngIndex = LBound(vUnique) To UBound(vUnique)
vUnique(lngIndex) = colUniques(lngIndex)
Next lngIndex
fcnUnique = vUnique
lbl_Exit:
Exit Function
End Function

grose456
01-14-2017, 06:40 PM
SEE SHEET 2.

mike7952
01-14-2017, 07:16 PM
How about this


Sub abc()
Dim arr, iRow As Long, iCol As Long, i As Long

arr = Range("a1").CurrentRegion

Worksheets.Add: i = 1
With CreateObject("scripting.dictionary")
For iCol = 1 To UBound(arr, 2)
For iRow = 1 To UBound(arr)
If Not .exists(arr(iRow, iCol)) And Trim$(arr(iRow, iCol)) <> "" Then
.Item(arr(iRow, iCol)) = Empty
End If
Next
Cells(i, "a").Resize(, .Count) = .keys
.RemoveAll: i = i + 1
Next
End With
Cells(1).CurrentRegion.Borders.LineStyle = xlContinuous
End Sub

grose456
01-14-2017, 07:39 PM
SORRY, EDITED MY PREVIOUS REPLY INSTEAD OF POSTING A NEW REPLY. SEE ABOVE ATTACHMENT. Visited and read your website. You have a very impressive background.

mike7952
01-15-2017, 04:30 AM
Does my solution in Post#4 not do what you want?

mikerickson
01-15-2017, 10:39 AM
Perhaps this will do what you want

Sub test()
Dim arrInput As Variant
Dim colNum As Long, rowNum As Long
Dim destinationRange As Range

Set destinationRange = Sheet3.Range("A1")
arrInput = Sheet1.Range("A1").CurrentRegion

colNum = 1
For colNum = 1 To UBound(arrInput, 2)
For rowNum = 1 To UBound(arrInput, 1)
If arrInput(rowNum, colNum) <> vbNullString Then
If rowNum <> Application.Match(arrInput(rowNum, colNum), Application.Index(arrInput, 0, colNum), 0) Then
arrInput(rowNum, colNum) = vbNullString
End If
End If
Next rowNum
Next colNum

Application.ScreenUpdating = False
With destinationRange
With .Resize(UBound(arrInput, 2), UBound(arrInput, 1))
.Value = Application.Transpose(arrInput)
.SpecialCells(xlCellTypeBlanks).Delete shift:=xlToLeft
End With
End With
Application.ScreenUpdating = True
End Sub

Paul_Hossler
01-15-2017, 10:58 AM
Maybe something like this

Some cells had a leading space so I had to TRIM them also to make certain that ZH9 and spaceZH9 would be de-dupped






Option Explicit

Sub ColumnDeDup()
Dim rCol As Range
Dim wsInput As Worksheet, wsOutput As Worksheet

Application.ScreenUpdating = False

Set wsInput = Worksheets("Input")

With wsInput.Cells(1, 1).CurrentRegion

'trim
.Value = Application.Evaluate("IF(" & .Address & "<>"""",TRIM(" & .Address & "),"""")")

'remove empty cells
On Error Resume Next
.Cells.SpecialCells(xlCellTypeBlanks).Delete Shift:=xlUp
On Error GoTo 0

'remove dups in each col
For Each rCol In .Columns
Call rCol.RemoveDuplicates(1, xlYes)
Next
End With


'add new worksheet named 'Output'
Application.DisplayAlerts = False
On Error Resume Next
Worksheets("Output").Delete
On Error GoTo 0
Application.DisplayAlerts = True
Worksheets.Add.Name = "Output"

Set wsOutput = Worksheets("Output")

'copy and transpose to Output
wsInput.Cells(1, 1).CurrentRegion.Copy
wsOutput.Select
wsOutput.Cells(1, 1).Select
Selection.PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:=False, Transpose:=True


'make Output pretty
wsOutput.Cells(1, 1).CurrentRegion.EntireColumn.AutoFit


Application.ScreenUpdating = True
End Sub

snb
01-15-2017, 12:35 PM
All you need:


Sub M_snb()
Application.ScreenUpdating = False
Sheet3.UsedRange.ClearContents

For j = 1 To Sheet1.Cells(1).CurrentRegion.Columns.Count
Sheet1.Cells(1).CurrentRegion.Columns(j).AdvancedFilter 2, , Sheet3.Cells(1, j), True
Next
Sheet3.UsedRange.SpecialCells(4).Delete
sn = Application.Transpose(Sheet3.UsedRange)

Sheet3.UsedRange.ClearContents

Sheet3.Cells(1).Resize(UBound(sn), UBound(sn, 2)) = sn
End Sub

grose456
01-15-2017, 04:51 PM
gives me:run time error 424

grose456
01-15-2017, 04:53 PM
hi Paul.
i need the duplicate values to be gone, only the unique value to remain.

grose456
01-15-2017, 04:55 PM
Sorry Mike7952,
new at this so thought i was still replying to gmaxey. i want the duplicate values to go and only teh unique value to remain. see my attachment sheet one above.

ZVI
01-16-2017, 12:37 AM
In the attached workbook click the Run button on Sheet1 to create a list of required unique values in Sheet2
The code:


Sub ExtractUnique()
'ZVI:2017-01-16

'--> Settings, change to suit
Const Src = "Sheet1" ' Name of the source sheet
Const Dst = "Sheet2" ' Name of the destination sheet
' <-- End of settingw

Dim a
Dim Col As Range, Rng As Range
Dim i As Long, j As Long
Dim s As String

' Define source data range taking into the account possible empty rows among data rows
With Worksheets(Src)
Set Rng = Intersect(.UsedRange, .UsedRange.Cells(1).CurrentRegion.EntireColumn)
End With

' Clear the destination sheet
Sheets("Sheet2").UsedRange.ClearContents

' Main
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
' We need to go column by column to prevent overflow of amount values in a()
For Each Col In Rng.Columns
a = Col.Cells.Value
For i = 1 To UBound(a)
s = Trim(a(i, 1))
If Len(s) Then .Item(s) = vbNullString
Next
If .Count Then
j = j + 1
a = .Keys
Sheets(Dst).Cells(j, 1).Resize(, UBound(a)).Value = a
End If
.RemoveAll
Next
End With

' Fit widths and activate the destination sheet
Sheets(Dst).UsedRange.Columns.AutoFit
Sheets(Dst).Select

End Sub

snb
01-16-2017, 01:19 AM
gives me:run time error 424
Do not use VBA-code you don't understand.

Of course you have to add a sheet3 first (see attachment)

or use

Sub M_snb()
sn = Sheet1.Cells(1).CurrentRegion

With CreateObject("scripting.dictionary")
For jj = 1 To UBound(sn, 2)
For j = 1 To UBound(sn)
If .exists(sn(j, jj)) Or sn(j, jj) = "" Then
sn(j, jj) = ""
Else
x0 = .Item(sn(j, jj))
sn(.Count, jj) = sn(j, jj)
End If
Next
.RemoveAll
Next
Sheet2.Cells(1).Resize(UBound(sn, 2), UBound(sn)) = Application.Transpose(sn)
End With
End Sub

Paul_Hossler
01-16-2017, 07:39 AM
hi Paul.
i need the duplicate values to be gone, only the unique value to remain.

I thought they were

Can you provide an example?

In post #8 with the macro and the attachment, 'Original' is your data, 'Input' is a copy I made of 'Original' for testing

When I ran the macro on 'Input' it appeared to remove the duplicates on a column-by-column basis

18045

'Output' is the transpose you asked for using 'input' as an input

mike7952
01-16-2017, 11:04 AM
Paul
I believe I have a solution in post#4 also based on a column to column basis.


Sub abc()
Dim arr, iRow As Long, iCol As Long, i As Long

arr = Range("a1").CurrentRegion

Worksheets.Add: i = 1
With CreateObject("scripting.dictionary")
For iCol = 1 To UBound(arr, 2)
For iRow = 1 To UBound(arr)
If Not .exists(arr(iRow, iCol)) And Trim$(arr(iRow, iCol)) <> "" Then
.Item(arr(iRow, iCol)) = Empty
End If
Next
Cells(i, UBound(arr, 2) + 3).Resize(, .Count) = .keys
Cells(1, iCol).Resize(.Count) = Application.Transpose(.keys)
.RemoveAll: i = i + 1
Next
End With
Cells(1).CurrentRegion.Borders.LineStyle = xlContinuous
Cells(1, UBound(arr, 2) + 5).CurrentRegion.Borders.LineStyle = xlContinuous
End Sub

Paul_Hossler
01-16-2017, 02:14 PM
Well, since Excel provides a built in .RemoveDuplicates method, I figure it's easier and faster to just use it instead of rolling my own

grose456
01-16-2017, 02:15 PM
i went back and checked the file. when i say i want to delete duplicates i want only the unique value to remailn. for example in the attachment "original" the pink colored cell are duplicates which i need to disppear and only the green value to remain.

grose456
01-16-2017, 02:17 PM
i would have done that except i need it for one column at a time. done independently of each other. i tried doing it one column at a time but after 30 columns i figured there had to be an easier way.

grose456
01-16-2017, 02:21 PM
see post #3

grose456
01-16-2017, 02:41 PM
In the attached workbook click the Run button on Sheet1 to create a list of required unique values in Sheet2- I TRIED IT BUT DUPLICATE VALUE REMAIN I NEED ONLY UNIQUE VALUES TO REMAIN

mike7952
01-16-2017, 04:57 PM
These values are in columns Q:T.... There are 3 M's 3 10's 2 12's.... So from your example its not making any sense to me? What exactly are every solution that has been given doing wrong?



NIEVES, JORGE
CRUZ, ALBERTO
SUKHDEO, CHRISTOPHER
MUNIR, MASHRATOON


4I7
4W7
2X8
2W9


M
M
M
F


12
12
JIMENEZ NATIVIDAD
LEONARDO MARIA


VIEITEZ ELENA
TERRERO HILDA
HGS44QCX
HGS44QDX


EES86
8
10
ZH9


10
SES22QLR
SES22QLS
10


ZM2017
ZM2017
ZM2018
SES22QLX


ZMFOR
ZMINT
ZMJIME
ZH9


ZMTWIL
ZMVIEI
ZMLAW
ZM2019





ZMINT





ZMLEON

ZVI
01-16-2017, 05:42 PM
click the Run button on Sheet1 to create a list of required unique values in Sheet2
I TRIED IT BUT DUPLICATE VALUE REMAIN I NEED ONLY UNIQUE VALUES TO REMAIN
Could you please post some addresses of cells in Sheet2 with duplicates among any row after Run button clicking?
I have found nothing of them...

Paul_Hossler
01-16-2017, 07:06 PM
@grose456 ---

Is it correct to say that you only want the UNIQUE values in each column to be left?

Usually in Excel I think of 'Remove Duplicates' to mean to remove the extra entries, leaving the first occurrence


Question: So if you start with ...

Col A

A
B
C
D
Z
A
B
C


you want to end with ...

Col A

D
Z

since A, B, and C are NOT unique?


then do the same for column B