How to remove duplicates within duplication sets based on criteria?
I want to know how to filter the duplication sets in the "m/z" column [highlighted yellow so that I can easily tell the different duplication sets apart] based on the following rules:
Keep the row within the duplication set if (tests to be applied in order, one after another):
1.) its N/O value is <= 2 (denoted in green)
2.) its (N+S)/O value is <= 1 (also denoted in green)
3.) it has the highest H/C value that is <= 2.25 within the set
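To make the rules concrete, here is roughly what I mean in Python (just a sketch for illustration; the dictionary keys "N/O", "(N+S)/O", and "H/C" are placeholders standing in for my actual worksheet columns):

```python
# Each row is modeled as a dict keyed by placeholder column names.

def passes_test1(row):
    """Test 1: N/O must be <= 2."""
    return row["N/O"] <= 2

def passes_test2(row):
    """Test 2: (N+S)/O must be <= 1."""
    return row["(N+S)/O"] <= 1

def passes_test3(row):
    """Test 3: H/C must be <= 2.25 (the highest such H/C in the set wins)."""
    return row["H/C"] <= 2.25
```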
I need to use these tests (or rules) to compare the duplicate rows within each duplication set and find a single "last man standing."
For example: look at the duplication set of rows 36, 37, and 38. All three pass the N/O test, so all three advance to the 2nd test. Rows 36 and 37 fail the 2nd test because their (N+S)/O values are greater than 1, while row 38 passes because its value is <= 1; rows 36 and 37 therefore get deleted, and row 38 stays as the "last man standing."
You see, I can't just filter out every row that fails a test across the entire workbook, because a row that fails a test may still be the "winner" of its duplication set by surviving more tests than the others in it.
For example: look at rows 22, 23, and 24. Because all three pass the N/O test, they advance to the (N+S)/O test. Rows 22 and 23 fail it and row 24 passes, which makes row 24 the only row left; it is therefore declared the "last man standing" and stays as the "unique duplicate." Notice, however, that the winning row 24 fails the 3rd test, since its H/C value is greater than 2.25. This is ignored because row 24 beat rows 22 and 23 before it ever had to "compete" in the 3rd test.
So, if I were to simply delete every row that failed a test, row 24 would get deleted too, even though it should stay as the "unique duplicate" because it was the "last man standing."
In addition, if all of the remaining rows WITHIN a duplication set fail the same test simultaneously, so that there is no single "winner," then every row WITHIN that duplication set should be deleted.
Example 1: Take a look at rows 18 and 19. Both pass the first test, so both advance to the second test. Both then fail the second test, so rows 18 and 19 get deleted and there is no "winner."
Example 2: Take a look at rows 4, 5, and 6. Row 4 fails the first test, so it is eliminated immediately. Rows 5 and 6 pass the first test and advance to the 2nd test; however, they both fail the second test simultaneously and both get deleted, so there is no single "winner" for this duplication set.
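Putting the examples above together, the "last man standing" elimination for one duplication set could be sketched like this (again just an illustration with placeholder column names, not a finished solution):

```python
def eliminate(dup_set):
    """Run the three ordered tests over one duplication set.

    Returns the surviving rows: a single winner, several rows still tied
    on H/C (to be sent to the tiebreakers), or an empty list when every
    remaining row fails the same test simultaneously (delete them all).
    """
    survivors = list(dup_set)
    for test in (lambda r: r["N/O"] <= 2,          # Test 1
                 lambda r: r["(N+S)/O"] <= 1):     # Test 2
        passers = [r for r in survivors if test(r)]
        if len(passers) == 1:
            return passers        # sole survivor wins, skips later tests
        if not passers:
            return []             # simultaneous failure: no winner
        survivors = passers
    # Test 3: of the remaining rows, keep the highest H/C that is <= 2.25.
    eligible = [r for r in survivors if r["H/C"] <= 2.25]
    if not eligible:
        return []
    best = max(r["H/C"] for r in eligible)
    return [r for r in eligible if r["H/C"] == best]
```

With the rows 36–38 scenario, only row 38 survives test 2, so it wins immediately without facing test 3; with the rows 4–6 scenario, rows 5 and 6 both fail test 2 and the function returns an empty list.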
Moreover, some duplicates within a duplication set will pass all three tests and end up with the same H/C value.
In that event, I need to put them through 2 tiebreakers (also to be applied in order):
Tiebreakers:
1.) If the value in the "P" column of a duplicate is non-zero, then that duplicate must have an H/C value within the range 1.5 <= H/C <= 2.25 (if the duplicate has a zero in the "P" column, skip tiebreaker #1 for it and move straight to tiebreaker #2).
For example: take a look at the duplication set of rows 29 and 30. Both pass the 1st, 2nd, and 3rd tests while still having the same H/C value, but both have a zero in the "P" column, so tiebreaker #1 does not apply to them. Instead, they jump straight to tiebreaker #2:
2.) Keep the duplicate whose value in the "stddev2" column is closest to zero.
For example: using the same duplication set of rows 29 and 30, take a look at the "stddev2" column. Row 29 has -0.8 and row 30 has 1, so row 29 is declared the "winner," because -0.8 is closer to zero than 1 is.
However, if the duplicates that survived tiebreaker #1 (or jumped straight to tiebreaker #2) are still tied, being equally close to zero in the "stddev2" column, then completely delete all of the duplicates within that duplication set and move on to the next set.
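The tiebreakers could be sketched the same way (placeholder column names "P", "H/C", and "stddev2"; I'm reading "closest to zero" as the smallest absolute value):

```python
def break_tie(tied):
    """Apply the two ordered tiebreakers to rows tied on H/C.

    Returns a single winner, or an empty list when the tie cannot be
    broken (the whole duplication set should then be deleted).
    """
    # Tiebreaker 1: rows with a non-zero "P" must also have
    # 1.5 <= H/C <= 2.25; rows with P == 0 skip this check entirely.
    survivors = [r for r in tied
                 if r["P"] == 0 or 1.5 <= r["H/C"] <= 2.25]
    if len(survivors) == 1:
        return survivors
    if not survivors:
        return []
    # Tiebreaker 2: keep the row whose "stddev2" is closest to zero.
    closest = min(abs(r["stddev2"]) for r in survivors)
    winners = [r for r in survivors if abs(r["stddev2"]) == closest]
    # Still equally close to zero: delete the whole duplication set.
    return winners if len(winners) == 1 else []
```

On the rows 29/30 example, both have P = 0, so tiebreaker #1 is skipped, and tiebreaker #2 picks the row with stddev2 = -0.8 over the one with 1.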
I’m thinking this has to be achieved through some type of VBA code, but any suggestions are greatly appreciated, thanks!