Consulting

Results 1 to 11 of 11

Thread: Detecting Duplicate entries

  1. #1
    VBAX Regular
    Joined
    Jul 2017
    Posts
    12
    Location

    Detecting Duplicate entries

    Dear Forum,

    I have a file with over 100,000 rows of excel data, what am trying to achieve is to detect duplicate entries from this file. The only way to detect true duplicate is to match duplicates in 2 separate columns. That is, if there is a duplicate vendor name and duplicate Reference number, then there is a duplicate.

    If there is a match, the status column should state “duplicate” otherwise “Single”

    See example below, I am open to ideas on how to go about this and possibly a VBA code that would help.

    SN Date Year Group Vendor name Doc Date Amount Reference No Posting Date Posted by Status
    John’s Ltd 00582671 Duplicate
    John’s Ltd 00582671 Duplicate
    Best Western Hotel 00225874 Duplicate
    Best Western Hotel 00225874 Duplicate

    Thanks

  2. #2
    VBAX Expert
    Joined
    May 2016
    Posts
    604
    Location
    try this , i have assumed the two columns are in A and B and then I output the results to Column C, if you want to use columns which are spaced apart just load two inarr arrays and concatenate them.

    Sub dupcheck()lastrow = Cells(Rows.Count, "A").End(xlUp).Row
    inarr = Range(Cells(1, 1), Cells(lastrow, 2))
    Range(Cells(1, 3), Cells(lastrow, 3)) = "Single"
    outarr = Range(Cells(1, 3), Cells(lastrow, 3))
    Dim conar() As Variant
    txt = "Duplicate"
    ReDim conar(1 To lastrow)
    
    
    For i = 1 To lastrow
    ' concatenate the two columns
     conar(i) = inarr(i, 1) & inarr(i, 2)
    Next i
     ' now search for duplicates
      For i = 1 To lastrow - 1
         If outarr(i, 1) <> txt Then
               For j = i + 1 To lastrow
                  If conar(i) = conar(j) Then
                  ' dup found
                  outarr(i, 1) = txt
                  outarr(j, 1) = txt
                  End If
               Next j
          End If
       Next i
                  
    Range(Cells(1, 3), Cells(lastrow, 3)) = outarr
    
    
    End Sub

  3. #3
    Administrator
    VP-Knowledge Base
    VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Another to try
    Sub Test()
    
    
    Dim arr, reslt
    Dim Combo As String
    Dim dic As Object
    Dim Col1, Col2, Col3
    
    Col1 = 5  'First data
    Col2 = 8   'Second Data
    Col3 = 10   'Result
    
    Set dic = CreateObject("Scripting.dictionary")
    arr = Cells(1, 1).CurrentRegion
    x = UBound(arr)
    reslt = Cells(1, Col3).Resize(x)
    For i = 2 To x
    Combo = arr(i, Col1) & "_" & arr(i, Col2)
    If Not dic.Exists(Combo) Then
        dic.Add Combo, i
        reslt(i, 1) = "Single"
    Else
        reslt(dic(Combo), 1) = "Duplicate"
        reslt(i, 1) = "Duplicate"
    End If
    Next
    Cells(1, Col3).Resize(x) = reslt
    End Sub
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  4. #4
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    A more flexible approach
    Option Explicit
    Sub Test()
        Dim Arr, Reslt
        Dim Combo As String
        Dim Dic As Object
        Dim Cols, c
        Dim i&, j&, x&
        Dim R As Range
        Dim RCol&
    
    
        'Columns to compare
        Cols = Array(3, 5, 7)
        'Results column
        RCol = 10
        
        Set Dic = CreateObject("Scripting.dictionary")
        Arr = Cells(1, 1).CurrentRegion
        x = UBound(Arr)
        Set R = Cells(1, RCol).Resize(x)
        Reslt = R
        For i = 2 To x
            Combo = ""
            For Each c In Cols
                Combo = Combo & Arr(i, c) & "||"
            Next c
            If Not Dic.Exists(Combo) Then
                Dic.Add Combo, i
                Reslt(i, 1) = "Single"
            Else
                Reslt(Dic(Combo), 1) = "Duplicate"
                Reslt(i, 1) = "Duplicate"
            End If
        Next
        R.Value = Reslt
    End Sub
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  5. #5
    VBAX Regular
    Joined
    Jul 2017
    Posts
    12
    Location
    Thanks for the your feedback's.

    The first 2 codes are bringing up 'Compile Error" while the 3rd code is considers only duplicate in Vendor name without matching with duplicate reference no. also the duplicate/single status has replaces the data on the amount column.

    What i am trying to achieve is that, duplicate vendor name should match with duplicate reference number and where there is a match the last column which is the status column should state either duplicate or single

    Thanks once again for your help

  6. #6
    Administrator
    VP-Knowledge Base VBAX Grand Master mdmackillop's Avatar
    Joined
    May 2004
    Location
    Scotland
    Posts
    14,489
    Location
    Post a sample workbook if you want code tested.
    MVP (Excel 2008-2010)

    Post a workbook with sample data and layout if you want a quicker solution.


    To help indent your macros try Smart Indent

    Please remember to mark threads 'Solved'

  7. #7
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,724
    Location
    Something simple using a collection

    You might have to adjust some cell references

    Capture.JPG


    Option Explicit
    
    Sub MarkDups()
        Dim r As Long, N As Long
        Dim K As String
        
        Dim C As Collection
        
        Set C = New Collection
        
        With ActiveSheet
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
                
                On Error Resume Next
                C.Add 0, K
                On Error GoTo 0
                
                N = C(K) + 1
                
                C.Remove (K)
                C.Add N, K
            Next r
        
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
                .Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
            Next r
        End With
    End Sub
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  8. #8
    VBAX Regular
    Joined
    Jul 2017
    Posts
    12
    Location
    Quote Originally Posted by Paul_Hossler View Post
    Something simple using a collection

    You might have to adjust some cell references

    Capture.JPG


    Option Explicit
    
    Sub MarkDups()
        Dim r As Long, N As Long
        Dim K As String
        
        Dim C As Collection
        
        Set C = New Collection
        
        With ActiveSheet
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
                
                On Error Resume Next
                C.Add 0, K
                On Error GoTo 0
                
                N = C(K) + 1
                
                C.Remove (K)
                C.Add N, K
            Next r
        
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
                .Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
            Next r
        End With
    End Sub




    Many thanks Paul, the code works like Magic.

    And thank you to everyone that gave this a shot, I really appreciate the efforts.

    Paul, please am learning vba, if it is not too much to ask, i was hoping if you can provide me with step by step explanation of what each line of code does?

    Thanks once again

  9. #9
    VBAX Regular
    Joined
    Jul 2017
    Posts
    12
    Location
    Dear Forum Members

    Please am new to VBA and i will appreciate if someone could help explain each line of this code for better understanding thanks

    Option Explicit

    Sub MarkDups()
    Dim r As Long, N As Long
    Dim K As String

    Dim C As Collection

    Set C = New Collection

    With ActiveSheet
    For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
    K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value

    On Error Resume Next
    C.Add 0, K
    On Error GoTo 0

    N = C(K) + 1

    C.Remove (K)
    C.Add N, K
    Next r

    For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
    K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
    .Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
    Next r
    End With
    End Sub

  10. #10
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,724
    Location
    Option Explicit 
     
    Sub MarkDups() 
        Dim r As Long, N As Long 
        Dim K As String 
    
    'Dim's a Collection object - check online help     
        Dim C As Collection 
    
    'Instantiates (sort of like 'creates')     
        Set C = New Collection 
    
    'use With ... End With for an object saves typing, but also make the logic more visible
        With ActiveSheet 
    
    'goes from row 2 to the number of rows in the block surrounding A1 = .Cells(1,1)
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count 
    
    'formats a 'Key' equals to Cell in col 5 + a # + the cell in col 8 since it's the combination of 5+8 that defines a duplicate
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value 
    
    
    'if there's an error ignore it - would occur if a Collection entry with the key K already exists
                On Error Resume Next 
    
    'if K exists in collection, it can't be added
    'if K does NOT exist, then add it with a data value = 0
                C.Add 0, K 
    'turn off the Ignore Errors
                On Error GoTo 0 
    
    'get the data value from the collection item K, put it in N and add 1             
                N = C(K) + 1 
    
    'we know that there's a K since we either added it above or we retrieved it and added 1
    'remove the old K and add a new one with data value = old value + 1
                C.Remove (K) 
                C.Add N, K 
            Next r 
    
    'again, go from row 2 to the end of data         
            For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count 
    'construct a temp string (just for ease)  = col 5 plus # plus col8
                K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value 
    
    'if the data value for K = 1, put Single in col 11, otherwise put Duplicate in col11
                .Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate") 
    'get the next row
            Next r 
        End With 
    End Sub
    Last edited by Paul_Hossler; 09-19-2017 at 06:24 AM.
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  11. #11
    VBAX Regular
    Joined
    Jul 2017
    Posts
    12
    Location
    Many Thanks Paul.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •