Consulting

Results 1 to 5 of 5

Thread: De-Duplication & Merge with multiple criteria

  1. #1
    VBAX Regular
    Joined
    Mar 2014
    Posts
    25
    Location

    De-Duplication & Merge with multiple criteria

    Hi,

    I am trying to modify an existing code that I found on Mr Excel, there are multiple instances of the same code when googled. I am not sure who the original author is so I can respect and thank him/her for the code. The code works great for what it was intended for however, I can't make heads or tail of the code for me to modify it. Hence I would appreciate any help on this. There have been earlier posts related to de-duplication however, none of them that I found compare multiple columns/attributes.

    Excel Version : 2013
    Problem : The current VBA code searches for any duplicates in ColA only and merges those duplicate entries into a single entry with its attributes in their respective columns. How ever, if I try to modify the code so that it searches for exact matches in Col A and Col B then it should de-duplicate and merge those entries. Need help with this part.

    I am attaching 2 sheets : Vba2 & Vba3. The vba 2 file-sheet 1 has raw data and shows exactly what the current code does and it searches for dup's in Col A and merges attributes and the intended result is in sheet 2 of the file.
    Now if I modify the file by adding another column(ColB) for it to verify for duplicate matches, it does not take the ColB into account, instead just uses ColA entries. I added a sample data into the sheet1 of Vba3 file and the intended solution is in Sheet 3 of file Vba2.

    VB:
    Sub combine()
    Dim x, y(), s$, i&, j&, k&, n&
    x = Sheets("Sheet1").Range("A1").CurrentRegion.Value
    ReDim y(1 To UBound(x, 1), 1 To UBound(x, 2))
    With CreateObject("Scripting.Dictionary")
        .CompareMode = 1
        For i = 1 To UBound(x)
            If .exists(x(i, 1)) Then
                k = .Item((x(i, 1)))
                For n = 1 To UBound(x, 2)
                    If IsEmpty(y(k, n)) Then
                    y(.Item((x(i, 1))), n) = x(i, n)
                    End If
                Next n
            Else
                j = j + 1
                .Item((x(i, 1))) = j
                For k = 1 To UBound(x, 2)
                y(j, k) = x(i, k)
                Next k
            End If
        Next i
    End With
    With Sheets("Current Solution")
        .UsedRange.ClearContents
        .Range("A1").Resize(j, UBound(x, 2)).Value = y()
         With .Range("A2").CurrentRegion
            .Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlYes
        End With
    End With
    End Sub

    Can someone help me modify this please. Also I would appreciate if one can add comments to the code as I would like to understand what it's doing . I do not understand the Scripting.Dictionary and CompareMode functions. If it's not too much trouble to ask, how would we modify this even further if we were to compare ColA, ColB, and ColC...more than 3 attributes to find the exact match. ?

    Thanks & Regards,
    Vijyat
    Attached Files Attached Files
    Last edited by Bob Phillips; 04-04-2014 at 03:42 AM. Reason: Added VBA tags

  2. #2
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,453
    Location
    Option Explicit
    Option Compare Text ' makes user input case-insensitive, so it doesn't matter if user enters "P" or "p"
    
    'De-dup and merge
    
    Sub combine()
    Dim lastrow As Long
    Dim rng As Range
    
        Worksheets("Current Solution").Range("A1").CurrentRegion.ClearContents
        Worksheets("Sheet1").Range("A1").CurrentRegion.Copy Worksheets("Current Solution").Range("A1")
        
        With Worksheets("Current Solution")
        
            lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
            
            .Rows(1).Insert
            .Columns(1).Insert
            .Range("A1").Value = "tmp"
            .Range("A2").Value = "FALSE"
            .Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1"
            Set rng = .Range("A1").Resize(lastrow + 1)
            rng.AutoFilter Field:=1, Criteria1:="=TRUE"
            On Error Resume Next
            Set rng = rng.SpecialCells(xlCellTypeVisible)
            On Error GoTo 0
            If Not rng Is Nothing Then rng.EntireRow.Delete
            
            .Columns(1).Delete
            
             With .Range("A1").CurrentRegion
             
                .Sort Key1:=.Range("A1"), Order1:=xlAscending, Key2:=.Range("B1"), Order2:=xlAscending, Header:=xlYes
            End With
        End With
    End Sub
    ____________________________________________
    Nihil simul inventum est et perfectum

    Abusus non tollit usum

    Last night I dreamed of a small consolation enjoyed only by the blind: Nobody knows the trouble I've not seen!
    James Thurber

  3. #3
    VBAX Regular
    Joined
    Mar 2014
    Posts
    25
    Location
    Hi xld,

    Thanks for your solution. I tried the above, however it tends to miss out on some data points. In all from column C to L (Table 1-10) there are 15 data points, but when I use your code it shows me only 9 data points.(Missing cell values in reference to intended solution - Q8,R7,T4,U2,U9,V8) Not sure if its ignoring those values alltogether or deleting them.


    From your code above VB:
    With Worksheets("Current Solution") 
             
            lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row 
             
            .Rows(1).Insert 
            .Columns(1).Insert 
            .Range("A1").Value = "tmp" 
            .Range("A2").Value = "FALSE" 
            .Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1" 
            Set rng = .Range("A1").Resize(lastrow + 1) 
            rng.AutoFilter Field:=1, Criteria1:="=TRUE" 
            On Error Resume Next 
            Set rng = rng.SpecialCells(xlCellTypeVisible) 
            On Error Goto 0
    ]Sorry I am not great at programming, I may be wrong aswell, but do the above lines mean that the data sorting takes place in the "Current Solution" sheet and not sheet1 ? As per the original code data sorting takes place in sheet 1 and then the range gets paste over, wouldn't that make more sense logically ? If you can kindly add comments that'll be great.

    Thanx,
    Vijyat[
    Last edited by Bob Phillips; 04-04-2014 at 06:53 AM. Reason: Added VBA tags

  4. #4
    Distinguished Lord of VBAX VBAX Grand Master Bob Phillips's Avatar
    Joined
    Apr 2005
    Posts
    25,453
    Location
    Option Explicit
    Option Compare Text ' makes user input case-insensitive, so it doesn't matter if user enters "P" or "p"
     
     'De-dup and merge
     
    Sub combine()
        Dim lastrow As Long, lastcol As Long
        Dim rng As Range
        Dim i As Long, ii As Long
         
        Worksheets("Current Solution").Range("A1").CurrentRegion.ClearContents
        Worksheets("Sheet1").Range("A1").CurrentRegion.Copy Worksheets("Current Solution").Range("A1")
         
        With Worksheets("Current Solution")
        
            lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row
            lastcol = .Cells(1, .Columns.Count).End(xlToLeft).Column
        
            For i = 2 To lastrow
            
                For ii = 3 To lastcol
             
                    If .Cells(i, ii).Value = "" Then
                    
                        .Cells(i, ii).FormulaR1C1 = "=SUMIFS(R" & i + 1 & "C:R" & lastrow & "C,R" & i + 1 & "C1:R" & lastrow & "C1,RC1,R" & i + 1 & "C2:R" & lastrow & "C2,RC2)"
                        .Cells(i, ii).Value = .Cells(i, ii).Value
                    End If
                Next ii
            Next i
             
            .Rows(1).Insert
            .Columns(1).Insert
            .Range("A1").Value = "tmp"
            .Range("A2").Value = "FALSE"
            .Range("A3").Resize(lastrow - 1).Formula = "=SUMPRODUCT(--(B$3:B3=B3),--(C$3:C3=C3))>1"
            Set rng = .Range("A1").Resize(lastrow + 1)
            rng.AutoFilter Field:=1, Criteria1:="=TRUE"
            On Error Resume Next
            Set rng = rng.SpecialCells(xlCellTypeVisible)
            On Error GoTo 0
            If Not rng Is Nothing Then rng.EntireRow.Delete
             
            .Columns(1).Delete
             
            With .Range("A1").CurrentRegion
                 
                .Sort Key1:=.Range("A1"), Order1:=xlAscending, Key2:=.Range("B1"), Order2:=xlAscending, Header:=xlYes
            
                .NumberFormat = "General;General;"
            End With
        End With
    End Sub
    ____________________________________________
    Nihil simul inventum est et perfectum

    Abusus non tollit usum

    Last night I dreamed of a small consolation enjoyed only by the blind: Nobody knows the trouble I've not seen!
    James Thurber

  5. #5
    VBAX Regular
    Joined
    Mar 2014
    Posts
    25
    Location
    Hi Xld,

    Thanks again. The code works perfectly . I am trying to modify it so it can consider multiple attributes at the same time based on user request. I shall mark this thread solved, if I have any further doubts/issues I shall get back to you.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •