PDA

View Full Version : [SOLVED] Detecting Duplicate entries



Stefan15
09-12-2017, 11:56 AM
Dear Forum,

I have a file with over 100,000 rows of excel data, what am trying to achieve is to detect duplicate entries from this file. The only way to detect true duplicate is to match duplicates in 2 separate columns. That is, if there is a duplicate vendor name and duplicate Reference number, then there is a duplicate.

If there is a match, the status column should state “duplicate” otherwise “Single”

See example below, I am open to ideas on how to go about this and possibly a VBA code that would help.



SN
Date
Year
Group
Vendor name
Doc Date
Amount
Reference No
Posting Date
Posted by
Status






John’s Ltd


00582671


Duplicate






John’s Ltd


00582671


Duplicate






Best Western Hotel


00225874


Duplicate






Best Western Hotel


00225874


Duplicate



Thanks :yes

offthelip
09-12-2017, 12:21 PM
try this , i have assumed the two columns are in A and B and then I output the results to Column C, if you want to use columns which are spaced apart just load two inarr arrays and concatenate them.


Sub dupcheck()lastrow = Cells(Rows.Count, "A").End(xlUp).Row
inarr = Range(Cells(1, 1), Cells(lastrow, 2))
Range(Cells(1, 3), Cells(lastrow, 3)) = "Single"
outarr = Range(Cells(1, 3), Cells(lastrow, 3))
Dim conar() As Variant
txt = "Duplicate"
ReDim conar(1 To lastrow)


For i = 1 To lastrow
' concatenate the two columns
conar(i) = inarr(i, 1) & inarr(i, 2)
Next i
' now search for duplicates
For i = 1 To lastrow - 1
If outarr(i, 1) <> txt Then
For j = i + 1 To lastrow
If conar(i) = conar(j) Then
' dup found
outarr(i, 1) = txt
outarr(j, 1) = txt
End If
Next j
End If
Next i

Range(Cells(1, 3), Cells(lastrow, 3)) = outarr


End Sub

mdmackillop
09-12-2017, 01:43 PM
Another to try

Sub Test()


Dim arr, reslt
Dim Combo As String
Dim dic As Object
Dim Col1, Col2, Col3

Col1 = 5 'First data
Col2 = 8 'Second Data
Col3 = 10 'Result

Set dic = CreateObject("Scripting.dictionary")
arr = Cells(1, 1).CurrentRegion
x = UBound(arr)
reslt = Cells(1, Col3).Resize(x)
For i = 2 To x
Combo = arr(i, Col1) & "_" & arr(i, Col2)
If Not dic.Exists(Combo) Then
dic.Add Combo, i
reslt(i, 1) = "Single"
Else
reslt(dic(Combo), 1) = "Duplicate"
reslt(i, 1) = "Duplicate"
End If
Next
Cells(1, Col3).Resize(x) = reslt
End Sub

mdmackillop
09-13-2017, 02:28 AM
A more flexible approach


Option Explicit
Sub Test()
Dim Arr, Reslt
Dim Combo As String
Dim Dic As Object
Dim Cols, c
Dim i&, j&, x&
Dim R As Range
Dim RCol&


'Columns to compare
Cols = Array(3, 5, 7)
'Results column
RCol = 10

Set Dic = CreateObject("Scripting.dictionary")
Arr = Cells(1, 1).CurrentRegion
x = UBound(Arr)
Set R = Cells(1, RCol).Resize(x)
Reslt = R
For i = 2 To x
Combo = ""
For Each c In Cols
Combo = Combo & Arr(i, c) & "||"
Next c
If Not Dic.Exists(Combo) Then
Dic.Add Combo, i
Reslt(i, 1) = "Single"
Else
Reslt(Dic(Combo), 1) = "Duplicate"
Reslt(i, 1) = "Duplicate"
End If
Next
R.Value = Reslt
End Sub

Stefan15
09-13-2017, 10:57 AM
Thanks for the your feedback's.

The first 2 codes are bringing up 'Compile Error" while the 3rd code is considers only duplicate in Vendor name without matching with duplicate reference no. also the duplicate/single status has replaces the data on the amount column.

What i am trying to achieve is that, duplicate vendor name should match with duplicate reference number and where there is a match the last column which is the status column should state either duplicate or single

Thanks once again for your help

mdmackillop
09-13-2017, 11:39 AM
Post a sample workbook if you want code tested.

Paul_Hossler
09-13-2017, 06:31 PM
Something simple using a collection

You might have to adjust some cell references

20336




Option Explicit

Sub MarkDups()
Dim r As Long, N As Long
Dim K As String

Dim C As Collection

Set C = New Collection

With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value

On Error Resume Next
C.Add 0, K
On Error GoTo 0

N = C(K) + 1

C.Remove (K)
C.Add N, K
Next r

For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub

Stefan15
09-14-2017, 08:15 AM
Something simple using a collection

You might have to adjust some cell references

20336




Option Explicit

Sub MarkDups()
Dim r As Long, N As Long
Dim K As String

Dim C As Collection

Set C = New Collection

With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value

On Error Resume Next
C.Add 0, K
On Error GoTo 0

N = C(K) + 1

C.Remove (K)
C.Add N, K
Next r

For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub







Many thanks Paul, the code works like Magic. :friends::friends:

And thank you to everyone that gave this a shot, I really appreciate the efforts.

Paul, please am learning vba, if it is not too much to ask, i was hoping if you can provide me with step by step explanation of what each line of code does?

Thanks once again

Stefan15
09-19-2017, 05:04 AM
Dear Forum Members

Please am new to VBA and i will appreciate if someone could help explain each line of this code for better understanding thanks

Option Explicit

Sub MarkDups()
Dim r As Long, N As Long
Dim K As String

Dim C As Collection

Set C = New Collection

With ActiveSheet
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value

On Error Resume Next
C.Add 0, K
On Error GoTo 0

N = C(K) + 1

C.Remove (K)
C.Add N, K
Next r

For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
Next r
End With
End Sub

Paul_Hossler
09-19-2017, 05:35 AM
Option Explicit

Sub MarkDups()
Dim r As Long, N As Long
Dim K As String

'Dim's a Collection object - check online help
Dim C As Collection

'Instantiates (sort of like 'creates')
Set C = New Collection

'use With ... End With for an object saves typing, but also make the logic more visible
With ActiveSheet

'goes from row 2 to the number of rows in the block surrounding A1 = .Cells(1,1)
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count

'formats a 'Key' equals to Cell in col 5 + a # + the cell in col 8 since it's the combination of 5+8 that defines a duplicate
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value


'if there's an error ignore it - would occur if a Collection entry with the key K already exists
On Error Resume Next

'if K exists in collection, it can't be added
'if K does NOT exist, then add it with a data value = 0
C.Add 0, K
'turn off the Ignore Errors
On Error GoTo 0

'get the data value from the collection item K, put it in N and add 1
N = C(K) + 1

'we know that there's a K since we either added it above or we retrieved it and added 1
'remove the old K and add a new one with data value = old value + 1
C.Remove (K)
C.Add N, K
Next r

'again, go from row 2 to the end of data
For r = 2 To .Cells(1, 1).CurrentRegion.Rows.Count
'construct a temp string (just for ease) = col 5 plus # plus col8
K = .Cells(r, 5).Value & "#" & .Cells(r, 8).Value

'if the data value for K = 1, put Single in col 11, otherwise put Duplicate in col11
.Cells(r, 11).Value = IIf(C(K) = 1, "Single", "Duplicate")
'get the next row
Next r
End With
End Sub

Stefan15
09-21-2017, 10:43 AM
Many Thanks Paul. :bow::bow: