How to remove duplicates within duplication sets based on criteria?

**maupinsmason** · 11-22-2016, 11:27 PM

I want to know how to filter the duplicate sets in the "m/z" column [highlighted yellow so that I can easily differentiate between the different duplication sets] based on:

Keep the row within the duplication set if (TO BE FOLLOWED IN CHRONOLOGICAL ORDER):
1.) N/O value <= 2 (denoted in green)
2.) (N+S)/O value <= 1 (also denoted in green)
3.) keep the row with the highest H/C value that is <= 2.25

What I need to do with these tests (or rules) is use them to compare each duplicate row within the duplication set to find one "Last man standing."

For example: look at duplication set of row 35, 36, and 37. All three pass the N/O test, so all three of the rows advance to the 2nd test. Now, row 35 and 36 fail the 2nd test due to being bigger than 1, and row 37 passes because it is <= 1; meaning that row 35 & 36 will get deleted and row 37 will stay as the "last man standing."

You see, I can't just filter every row that fails a test within the entire workbook because some rows that fail a test may still be the "winner" because it passed the most tests compared to the others within its duplication set.

For Example: look at row 18, 19, and 20. Because row 18 failed the N/O test, it gets deleted. But, 19 and 20 pass so they both move on to the (N+S)/O test. Since row 19 failed the 2nd test and row 20 passed, that makes it the only row left, and therefore, it is declared the "winner" and stays as the "unique" duplicate. However, notice that row 20 failed the 3rd test with having a value greater than 2.25. This is ignored because row 20 beat row 18 and 19 before it could "compete" in the 3rd test.

So if I were to just delete all of the rows that failed a test, row 20 would get deleted too, even though it was supposed to stay as the "unique" duplicate because it was the "last man standing."

In addition, if the duplications WITHIN the duplication set fails a test simultaneously to whereas there is not a single "winner", then delete every row WITHIN that duplication set.

Example 1: Take a look at row 4, 5, and 6. All three pass the first test, so all three advance on to the second test; now all three fail the second test so all three get deleted and there is no "winner."

Example 2: Take a look at row 13, 14, and 15. Row 13 fails the first test, so its automatically eliminated. Row 14 and 15 pass the first test so they advance to the 2nd test; however, row 14 and 15 both fail the second test simultaneously and both get deleted, therefore, there is not a single "winner" for this duplication set.

Im thinking this has to be achieved through some type of VBA code, but any suggestions are greatly appreciated, thanks!

Moderator's Note: To all, Please read the entire thread before answering. This is a Merged thread and there is additional information of the OP's situation below.

Thread: How to remove duplicates within duplication sets based on criteria?

Thread Tools

Display

Threaded View

How to remove duplicates within duplication sets based on criteria?

Posting Permissions