PDA

View Full Version : [SOLVED] De-Duplication & Merge with multiple criteria



vijyat
04-03-2014, 10:32 AM
Hi,

I am trying to modify an existing code that I found on Mr Excel, there are multiple instances of the same code when googled. I am not sure who the original author is so I can respect and thank him/her for the code. The code works great for what it was intended for however, I can't make heads or tail of the code for me to modify it. Hence I would appreciate any help on this. There have been earlier posts related to de-duplication however, none of them that I found compare multiple columns/attributes.

Excel Version : 2013
Problem : The current VBA code searches for any duplicates in ColA only and merges those duplicate entries into a single entry with its attributes in their respective columns. How ever, if I try to modify the code so that it searches for exact matches in Col A and Col B then it should de-duplicate and merge those entries. Need help with this part.

I am attaching 2 sheets : Vba2 & Vba3. The vba 2 file-sheet 1 has raw data and shows exactly what the current code does and it searches for dup's in Col A and merges attributes and the intended result is in sheet 2 of the file.
Now if I modify the file by adding another column(ColB) for it to verify for duplicate matches, it does not take the ColB into account, instead just uses ColA entries. I added a sample data into the sheet1 of Vba3 file and the intended solution is in Sheet 3 of file Vba2.

VB:

Sub combine()
Dim x, y(), s$, i&, j&, k&, n&
x = Sheets("Sheet1").Range("A1").CurrentRegion.Value
ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
With CreateObject("Scripting.Dictionary")
.CompareMode = 1
For i = 1 To UBound(x)
If .exists(x(i, 1)) Then
k = .Item((x(i, 1)))
For n = 1 To UBound(x, 2)
If IsEmpty(y(k, n)) Then
y(.Item((x(i, 1))), n) = x(i, n)
End If
Next n
Else
j = j + 1
.Item((x(i, 1))) = j
For k = 1 To UBound(x, 2)
y(j, k) = x(i, k)
Next k
End If
Next i
End With
With Sheets("Current Solution")
.UsedRange.ClearContents
.Range("A1").Resize(j, UBound(x, 2)).Value = y()
With .Range("A2").CurrentRegion
.Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlYes
End With
End With
End Sub


Can someone help me modify this please. Also I would appreciate if one can add comments to the code as I would like to understand what it's doing :bug:. I do not understand the Scripting.Dictionary and CompareMode functions. If it's not too much trouble to ask, how would we modify this even further if we were to compare ColA, ColB, and ColC...more than 3 attributes to find the exact match. ?

Thanks & Regards,
Vijyat

Bob Phillips
04-04-2014, 04:04 AM
Option Explicit
Option Compare Text ' makes user input case-insensitive, so it doesn't matter if user enters "P" or "p"

'De-dup and merge

Sub combine()
Dim lastrow As Long
Dim rng As Range

Worksheets("Current Solution").Range("A1").CurrentRegion.ClearContents
Worksheets("Sheet1").Range("A1").CurrentRegion.Copy Worksheets("Current Solution").Range("A1")

With Worksheets("Current Solution")

lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row

.Rows(1).Insert
.Columns(1).Insert
.Range("A1").Value = "tmp"
.Range("A2").Value = "FALSE"
.Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1"
Set rng = .Range("A1").Resize(lastrow + 1)
rng.AutoFilter Field:=1, Criteria1:="=TRUE"
On Error Resume Next
Set rng = rng.SpecialCells(xlCellTypeVisible)
On Error GoTo 0
If Not rng Is Nothing Then rng.EntireRow.Delete

.Columns(1).Delete

With .Range("A1").CurrentRegion

.Sort Key1:=.Range("A1"), Order1:=xlAscending, Key2:=.Range("B1"), Order2:=xlAscending, Header:=xlYes
End With
End With
End Sub

vijyat
04-04-2014, 06:18 AM
Hi xld,

Thanks for your solution. I tried the above, however it tends to miss out on some data points. In all from column C to L (Table 1-10) there are 15 data points, but when I use your code it shows me only 9 data points.(Missing cell values in reference to intended solution - Q8,R7,T4,U2,U9,V8) Not sure if its ignoring those values alltogether or deleting them.


From your code above VB:

With Worksheets("Current Solution")

lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row

.Rows(1).Insert
.Columns(1).Insert
.Range("A1").Value = "tmp"
.Range("A2").Value = "FALSE"
.Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1"
Set rng = .Range("A1").Resize(lastrow + 1)
rng.AutoFilter Field:=1, Criteria1:="=TRUE"
On Error Resume Next
Set rng = rng.SpecialCells(xlCellTypeVisible)
On Error Goto 0

]Sorry I am not great at programming, I may be wrong aswell, but do the above lines mean that the data sorting takes place in the "Current Solution" sheet and not sheet1 ? As per the original code data sorting takes place in sheet 1 and then the range gets paste over, wouldn't that make more sense logically ? If you can kindly add comments that'll be great.

Thanx,
Vijyat[

Bob Phillips
04-04-2014, 07:10 AM
Option Explicit
Option Compare Text ' makes user input case-insensitive, so it doesn't matter if user enters "P" or "p"

'De-dup and merge

Sub combine()
Dim lastrow As Long, lastcol As Long
Dim rng As Range
Dim i As Long, ii As Long

Worksheets("Current Solution").Range("A1").CurrentRegion.ClearContents
Worksheets("Sheet1").Range("A1").CurrentRegion.Copy Worksheets("Current Solution").Range("A1")

With Worksheets("Current Solution")

lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
lastcol = .Cells(1, .Columns.Count).End(xlToLeft).Column

For i = 2 To lastrow

For ii = 3 To lastcol

If .Cells(i, ii).Value = "" Then

.Cells(i, ii).FormulaR1C1 = "=SUMIFS(R" & i + 1 & "C:R" & lastrow & "C,R" & i + 1 & "C1:R" & lastrow & "C1,RC1,R" & i + 1 & "C2:R" & lastrow & "C2,RC2)"
.Cells(i, ii).Value = .Cells(i, ii).Value
End If
Next ii
Next i

.Rows(1).Insert
.Columns(1).Insert
.Range("A1").Value = "tmp"
.Range("A2").Value = "FALSE"
.Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1"
Set rng = .Range("A1").Resize(lastrow + 1)
rng.AutoFilter Field:=1, Criteria1:="=TRUE"
On Error Resume Next
Set rng = rng.SpecialCells(xlCellTypeVisible)
On Error GoTo 0
If Not rng Is Nothing Then rng.EntireRow.Delete

.Columns(1).Delete

With .Range("A1").CurrentRegion

.Sort Key1:=.Range("A1"), Order1:=xlAscending, Key2:=.Range("B1"), Order2:=xlAscending, Header:=xlYes

.NumberFormat = "General;General;"
End With
End With
End Sub

vijyat
04-06-2014, 06:43 AM
Hi Xld,

Thanks again. The code works perfectly :bow:. I am trying to modify it so it can consider multiple attributes at the same time based on user request. I shall mark this thread solved, if I have any further doubts/issues I shall get back to you.