PDA

View Full Version : [SOLVED] Excel 2016 VBA remove duplicates by multiple AND/OR criteria



Edmon
07-12-2017, 04:35 AM
I’m just starting with VBA, but after spending a day looking at remove duplicates/values VBA tutorials I still cannot come up with a way to do this task.

I need to remove duplicates from a list based on several criteria.

First, identify duplicate record numbers (column A). Then delete the entire row of the duplicate(s) (there may be several instances of the same duplicate) except were the values at column S are different AND both (all) values belong to a predefined set, OR the values at column X are different (and the value of the duplicate is any value).

If the value of the duplicate is from set 1, and the value of its counterpart(s) is not from set 1, it is the latter that has to be deleted.

Not trying to delete both (all) duplicates! 1 value should remain.

Where the Location value is a town in England (London, Leeds), duplicates should be kept. Where the value in the Cost column is different, duplicates should be kept. If one of the duplicates has a Location value that is a town in England (London, Leeds), and the others do not, the others are the ones to be deleted.

A S X
No Location Cost
2 Glasgow 200
2 London 200
2 Leeds 200
2 Glasgow 100

returns

A S X
No Location Cost
2 London 200
2 Leeds 200
2 Glasgow 100

Thanks in advance for any suggestions!

mdmackillop
07-12-2017, 05:01 AM
Can you post a workbbok showing before and after and any clarification. Go Advanced/Manage Attachments

Edmon
07-13-2017, 04:08 AM
Let me try again, I have probably confused myself.


In the example attached Where the Location value is a town in England (London, Leeds), duplicates should be kept. Where the value in the Cost column is different, duplicates should be kept. If one of the duplicates has a Type value =Working, and the others do not, then the Working, and other duplicate do not, than the other duplicates are the ones to be deleted.

Please see attached workbook for a better illustration.

Thank you

mdmackillop
07-13-2017, 04:35 AM
Why delete Row 9 instead of 13?
Do you have a list of England towns in your workbook?

Edmon
07-13-2017, 06:05 AM
Why delete Row 9 instead of 13?
Do you have a list of England towns in your workbook?

Row 13 has a different cost value.

Also, the rows are longer than in the example, but the rest of the values in other columns do not matter for sorting purposes.

I can either have a list in the workbook itself, or type it into VBA...

mdmackillop
07-13-2017, 06:18 AM
Row 13 has a different cost value.
They are both different! You need rules. Is it the first, last, greatest or lowest? What if there are 2 @ 200 and 1 @ 100? A good sample of data should encompass your actual case to avoid misunderstanding.

Edmon
07-13-2017, 07:30 AM
Yes, sorry, they are, but row 9 is deleted because it 1) duplicates the record number 2) does not have a different location with a town in England 3) duplicates the cost value. Row 13 is kept because it does not satisfy 3).

First, duplicate record numbers are identified, and all subsequent operations are performed among duplicate record numbers only. Duplicates that are locations outside England are merged. If the duplicates have location values both in England and outside England, those NOT in England are deleted. If duplicates only have location values in England, they are merged if they do NOT have a different cost value. If the duplicates have a different type value, those NOT with working type value are deleted.

Do these resemble rules?

Thank you very much for your time

mdmackillop
07-13-2017, 08:06 AM
Duplicates that are locations outside England are merged.
I don't understand this. Please expand your sample to demonstrate.

Edmon
07-14-2017, 05:57 AM
Here I expanded it and tried putting it in some sort of logical structure.

Again, massive thank you for persistance.

Please see the file sample workbook.xlsx (32.6 KB), found a mistake in the other one.

mdmackillop
07-14-2017, 06:15 AM
Should Glasgow not show in the result?

Edmon
07-14-2017, 03:31 PM
Yes, it should not show, it is eliminated by rule 1). Belfast does show, however, since rule 1) does not apply to duplicate record number 5 - only rule 2) does.

mdmackillop
07-14-2017, 04:22 PM
Before coding is my interpretation correct? How about Row 18?

Edmon
07-15-2017, 12:51 AM
Please see attached for clarifications, thank you!

mdmackillop
07-15-2017, 03:42 AM
Try this

Edmon
07-15-2017, 10:49 AM
Runs perfectly on Windows, excellent, thank you very much!

On Mac it Returns error 429 'ActiveX component can't create object'. Debug points to code line Set dic = CreateObject("Scripting.Dictionary")

Any suggestions for Mac users?

Also, for learning purposes, which part of the code refers to finding duplicate record numbers?

mdmackillop
07-15-2017, 11:35 AM
This section of code creates range areas stored in a dictionary object. These areas are then passed to the other sub for processing.

For Each c In r
If Not dic.exists(c.Value) Then
dic.Add c.Value, c
Else
Set rTemp = dic(c.Value)
Set rTemp = Union(rTemp, c)
Set dic(c.Value) = rTemp
End If
Set rTemp = dic(c.Value)
'rTemp.Select
Next

Edmon
08-02-2017, 06:33 AM
Can't wrap my head around this bit


'Test 2
'Delete repeated occurences over 3 columns
For i = 1 To r.Rows.Count
For j = i + 1 To r.Rows.Count
r(i, 1).Interior.ColorIndex = 6 'debug
r(j, 1).Interior.ColorIndex = 7 'debug
If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"
Range("F:F").Interior.ColorIndex = xlNone 'debug
Next j
Next i
End Sub


What is the purpose of the .Interior.ColorIndex lines in this?

mdmackillop
08-02-2017, 06:52 AM
I commonly use colour to show me the cells being processed as the code runs and to demonstrate the affected cells. The lines marked "Debug" can be deleted.

Edmon
08-03-2017, 02:36 AM
According to rule 3: IF all duplicates have the same B (location) value, and the same D (cost) value, only the first record is kept (highest on the list).

Why is then that the line from the section above is


If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"


and not


If r(i, 1) & r(i, 2) & r(i, 4) = r(j, 1) & r(j, 2) & r(j, 4) Then r(j, 5) = "x"


Also, why is the result Then r(j, 5) = "x"? What is the meaning of cell address r(j, 5)?

mdmackillop
08-03-2017, 04:31 AM
Runs perfectly on Windows, excellent, thank you very much!
Has this stopped working?


Why is then that the line from the section above is
If r(i, 1) & r(i, 2) & r(i, 3) = r(j, 1) & r(j, 2) & r(j, 3) Then r(j, 5) = "x"and not
If r(i, 1) & r(i, 2) & r(i, 4) = r(j, 1) & r(j, 2) & r(j, 4) Then r(j, 5) = "x"
Because

Column E (time) represents other columns that are irrelevant for this sorting.


Also, why is the result Then r(j, 5) = "x"? What is the meaning of cell address r(j, 5)?
It provides data for a filter.

You should step through the code using Watch Values and Debug.Print to send values to the Immediate window. Add comments to the code that detail what each step does. These are the steps I have to follow to understand complex code submitted by posters in order to respond