View Full Version : [SOLVED:] Detecting Duplicate entries
Stefan15
09-12-2017, 11:56 AM
Dear Forum,
I have a file with over 100,000 rows of excel data, what am trying to achieve is to detect duplicate entries from this file. The only way to detect true duplicate is to match duplicates in 2 separate columns. That is, if there is a duplicate vendor name and duplicate Reference number, then there is a duplicate.
If there is a match, the status column should state “duplicate” otherwise “Single”
See example below, I am open to ideas on how to go about this and possibly a VBA code that would help.
SN
Date
Year
Group
Vendor name
Doc Date
Amount
Reference No
Posting Date
Posted by
Status
John’s Ltd
00582671
Duplicate
John’s Ltd
00582671
Duplicate
Best Western Hotel
00225874
Duplicate
Best Western Hotel
00225874
Duplicate
Thanks :yes
offthelip
09-12-2017, 12:21 PM
try this , i have assumed the two columns are in A and B and then I output the results to Column C, if you want to use columns which are spaced apart just load two inarr arrays and concatenate them.
Sub dupcheck()lastrow = Cells(Rows.Count, "A").End(xlUp).Row
inarr = Range(Cells(1, 1), Cells(lastrow, 2))
Range(Cells(1, 3), Cells(lastrow, 3)) = "Single"
outarr = Range(Cells(1, 3), Cells(lastrow, 3))
Dim conar() As Variant
txt = "Duplicate"
ReDim conar(1 To lastrow)
For i = 1 To lastrow
' concatenate the two columns
conar(i) = inarr(i, 1) & inarr(i, 2)
Next i
' now search for duplicates
For i = 1 To lastrow - 1
If outarr(i, 1) <> txt Then
For j = i + 1 To lastrow
If conar(i) = conar(j) Then
' dup found
outarr(i, 1) = txt
outarr(j, 1) = txt
End If
Next j
End If
Next i
Range(Cells(1, 3), Cells(lastrow, 3)) = outarr
End Sub
mdmackillop
09-12-2017, 01:43 PM
Another to try
Sub Test()
Dim arr, reslt
Dim Combo As String
Dim dic As Object
Dim Col1, Col2, Col3
Col1 = 5 'First data
Col2 = 8 'Second Data
Col3 = 10 'Result
Set dic = CreateObject("Scripting.dictionary")
arr = Cells(1, 1).CurrentRegion
x = UBound(arr)
reslt = Cells(1, Col3).Resize(x)
For i = 2 To x
Combo = arr(i, Col1) & "_" & arr(i, Col2)
If Not dic.Exists(Combo) Then
dic.Add Combo, i
reslt(i, 1) = "Single"
Else
reslt(dic(Combo), 1) = "Duplicate"
reslt(i, 1) = "Duplicate"
End If
Next
Cells(1, Col3).Resize(x) = reslt
End Sub
mdmackillop
09-13-2017, 02:28 AM
A more flexible approach
Option Explicit
Sub Test()
Dim Arr, Reslt
Dim Combo As String
Dim Dic As Object
Dim Cols, c
Dim i&, j&, x&
Dim R As Range
Dim RCol&
'Columns to compare
Cols = Array(3, 5, 7)
'Results column
RCol = 10
Set Dic = CreateObject("Scripting.dictionary")
Arr = Cells(1, 1).CurrentRegion
x = UBound(Arr)
Set R = Cells(1, RCol).Resize(x)
Reslt = R
For i = 2 To x
Combo = ""
For Each c In Cols
Combo = Combo & Arr(i, c) & "||"
Next c
If Not Dic.Exists(Combo) Then
Dic.Add Combo, i
Reslt(i, 1) = "Single"
Else
Reslt(Dic(Combo), 1) = "Duplicate"
Reslt(i, 1) = "Duplicate"
End If
Next
R.Value = Reslt
End Sub
Stefan15
09-13-2017, 10:57 AM
Thanks for the your feedback's.
The first 2 codes are bringing up 'Compile Error" while the 3rd code is considers only duplicate in Vendor name without matching with duplicate reference no. also the duplicate/single status has replaces the data on the amount column.
What i am trying to achieve is that, duplicate vendor name should match with duplicate reference number and where there is a match the last column which is the status column should state either duplicate or single
Thanks once again for your help
mdmackillop
09-13-2017, 11:39 AM
Post a sample workbook if you want code tested.
Paul_Hossler
09-13-2017, 06:31 PM
Something simple using a collection
You might have to adjust some cell references
20336
Option Explicit
Sub MarkDups()
Dim r As Long, N As Long
Dim K As String
Dim C As Collection
Set C = New Collection
With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
On Error Resume Next
C.Add 0, K
On Error GoTo 0
N = C(K) + 1
C.Remove (K)
C.Add N, K
Next r
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub
Stefan15
09-14-2017, 08:15 AM
Something simple using a collection
You might have to adjust some cell references
20336
Option Explicit
Sub MarkDups()
Dim r As Long, N As Long
Dim K As String
Dim C As Collection
Set C = New Collection
With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
On Error Resume Next
C.Add 0, K
On Error GoTo 0
N = C(K) + 1
C.Remove (K)
C.Add N, K
Next r
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub
Many thanks Paul, the code works like Magic. :friends::friends:
And thank you to everyone that gave this a shot, I really appreciate the efforts.
Paul, please am learning vba, if it is not too much to ask, i was hoping if you can provide me with step by step explanation of what each line of code does?
Thanks once again
Stefan15
09-19-2017, 05:04 AM
Dear Forum Members
Please am new to VBA and i will appreciate if someone could help explain each line of this code for better understanding thanks
Option Explicit
Sub MarkDups()
Dim r As Long, N As Long
Dim K As String
Dim C As Collection
Set C = New Collection
With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
On Error Resume Next
C.Add 0, K
On Error GoTo 0
N = C(K) + 1
C.Remove (K)
C.Add N, K
Next r
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub
Paul_Hossler
09-19-2017, 05:35 AM
Option Explicit
Sub MarkDups()
Dim r As Long, N As Long
Dim K As String
'Dim's a Collection object - check online help
Dim C As Collection
'Instantiates (sort of like 'creates')
Set C = New Collection
'use With ... End With for an object saves typing, but also make the logic more visible
With ActiveSheet
'goes from row 2 to the number of rows in the block surrounding A1 = .Cells(1,1)
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
'formats a 'Key' equals to Cell in col 5 + a # + the cell in col 8 since it's the combination of 5+8 that defines a duplicate
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
'if there's an error ignore it - would occur if a Collection entry with the key K already exists
On Error Resume Next
'if K exists in collection, it can't be added
'if K does NOT exist, then add it with a data value = 0
C.Add 0, K
'turn off the Ignore Errors
On Error GoTo 0
'get the data value from the collection item K, put it in N and add 1
N = C(K) + 1
'we know that there's a K since we either added it above or we retrieved it and added 1
'remove the old K and add a new one with data value = old value + 1
C.Remove (K)
C.Add N, K
Next r
'again, go from row 2 to the end of data
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
'construct a temp string (just for ease) = col 5 plus # plus col8
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
'if the data value for K = 1, put Single in col 11, otherwise put Duplicate in col11
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
'get the next row
Next r
End With
End Sub
Stefan15
09-21-2017, 10:43 AM
Many Thanks Paul. :bow::bow:
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.