PDA

View Full Version : How to remove duplicates within duplication sets based on criteria?



maupinsmason
11-22-2016, 11:27 PM
I want to know how to filter the duplicate sets in the "m/z" column [highlighted yellow so that I can easily differentiate between the different duplication sets] based on:

Keep the row within the duplication set if (TO BE FOLLOWED IN CHRONOLOGICAL ORDER):
1.) N/O value <= 2 (denoted in green)
2.) (N+S)/O value <= 1 (also denoted in green)
3.) keep the row with the highest H/C value that is <= 2.25

What I need to do with these tests (or rules) is use them to compare each duplicate row within the duplication set to find one "Last man standing."

For example: look at duplication set of row 35, 36, and 37. All three pass the N/O test, so all three of the rows advance to the 2nd test. Now, row 35 and 36 fail the 2nd test due to being bigger than 1, and row 37 passes because it is <= 1; meaning that row 35 & 36 will get deleted and row 37 will stay as the "last man standing."

You see, I can't just filter every row that fails a test within the entire workbook because some rows that fail a test may still be the "winner" because it passed the most tests compared to the others within its duplication set.

For Example: look at row 18, 19, and 20. Because row 18 failed the N/O test, it gets deleted. But, 19 and 20 pass so they both move on to the (N+S)/O test. Since row 19 failed the 2nd test and row 20 passed, that makes it the only row left, and therefore, it is declared the "winner" and stays as the "unique" duplicate. However, notice that row 20 failed the 3rd test with having a value greater than 2.25. This is ignored because row 20 beat row 18 and 19 before it could "compete" in the 3rd test.

So if I were to just delete all of the rows that failed a test, row 20 would get deleted too, even though it was supposed to stay as the "unique" duplicate because it was the "last man standing."

In addition, if the duplications WITHIN the duplication set fails a test simultaneously to whereas there is not a single "winner", then delete every row WITHIN that duplication set.

Example 1: Take a look at row 4, 5, and 6. All three pass the first test, so all three advance on to the second test; now all three fail the second test so all three get deleted and there is no "winner."

Example 2: Take a look at row 13, 14, and 15. Row 13 fails the first test, so its automatically eliminated. Row 14 and 15 pass the first test so they advance to the 2nd test; however, row 14 and 15 both fail the second test simultaneously and both get deleted, therefore, there is not a single "winner" for this duplication set.

Im thinking this has to be achieved through some type of VBA code, but any suggestions are greatly appreciated, thanks!



Moderator's Note: To all, Please read the entire thread before answering. This is a Merged thread and there is additional information of the OP's situation below.

Paul_Hossler
11-23-2016, 09:42 AM
Who is the winner here and why?

17675

SamT
11-23-2016, 10:15 AM
Two Rows pass all tests, and have the same h/c . Then what?

@ Paul
Row 39 = Last man standing. Yes-No?

Proposed Algorithm:

For each DuplicatesSet
For Each Test
If FailedRows.Count = Set.Count then
Delete Set
Exit For
Else
Delete FailedRows
End If
If Set.Count = 1 Then Exit For
Next Test
If Set.Count > 1 then Do What 'Two Rows with same h/c
Next Set


@ maupinsmason,
Please verify that the logic in the proposed algorithm matches your needs.

Paul_Hossler
11-23-2016, 03:14 PM
Play with this and let me know

The attachment has the code and the results




Option Explicit

Dim rData As Range

Sub DeleteSomeRows()
Dim rData1 As Range
Dim iRow As Long, iStart As Long, iEnd As Long
Dim aCounts() As Long, aChecks() As Long


'setup
Application.ScreenUpdating = False
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion
Set rData1 = rData.Cells(2, 1).Resize(rData.Rows.Count - 1, rData.Columns.Count)
With ActiveSheet.Sort
.SortFields.Clear
.SortFields.Add Key:=rData1.Columns(1), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SortFields.Add Key:=rData1.Columns(4), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
.SetRange rData
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With

'pass #1 - for a m/z, (all col 2 outside) OR (all col 3 outside) OR (all col 4 outside) or (any col 2 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
If .Cells(iRow + 1, 1).Value = .Cells(iRow, 1).Value Then
iEnd = iEnd + 1
Else
Call CheckData1(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(1).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0


'pass #2 - for a m/z with 2+ rows, (ANY col 3 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
If .Cells(iRow + 1, 1).Value = .Cells(iRow, 1).Value Then
iEnd = iEnd + 1
Else
Call CheckData2(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(1).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0



'pass #3 - for a m/z with 2+ rows, (ANY col 4 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
If .Cells(iRow + 1, 1).Value = .Cells(iRow, 1).Value Then
iEnd = iEnd + 1
Else
Call CheckData3(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(1).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0


'pass #4 - for a m/z with 2+ rows, (largest col 4)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
If .Cells(iRow + 1, 1).Value = .Cells(iRow, 1).Value Then
iEnd = iEnd + 1
Else
Call CheckData4(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(1).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0

Application.ScreenUpdating = True
End Sub



Private Sub CheckData1(S As Long, E As Long)
Dim i As Long
Dim bAllFail As Boolean

With rData
'first test - all col 2 outside
bAllFail = True
For i = S To E
If .Cells(i, 2).Value <= 2# Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, 1).Value = True
Next i
Exit Sub
End If

'second test - all col 3 outside
bAllFail = True
For i = S To E
If .Cells(i, 3).Value <= 1# Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, 1).Value = True
Next i
Exit Sub
End If


'third test - all col 4 outside
bAllFail = True
For i = S To E
If .Cells(i, 4).Value <= 2.25 Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, 1).Value = True
Next i
Exit Sub
End If

'fourth test - ANY col 2 outside
For i = S To E
If .Cells(i, 2).Value > 2# Then .Cells(i, 1).Value = True
Next i

End With
End Sub


Private Sub CheckData2(S As Long, E As Long)
Dim i As Long

If S = E Then Exit Sub

With rData
'any col 3 outside
For i = S To E
If .Cells(i, 3).Value > 1# Then .Cells(i, 1).Value = True
Next i
End With
End Sub

Private Sub CheckData3(S As Long, E As Long)
Dim i As Long

If S = E Then Exit Sub

With rData
'any col 4 outside
For i = S To E
If .Cells(i, 4).Value > 2.25 Then .Cells(i, 1).Value = True
Next i
End With
End Sub

Private Sub CheckData4(S As Long, E As Long)
Dim i As Long
Dim m As Double

If S = E Then Exit Sub

With rData
m = 0#
For i = S To E
If .Cells(i, 4).Value > m Then m = .Cells(i, 4).Value
Next i
For i = S To E
If .Cells(i, 4).Value < m Then .Cells(i, 1).Value = True
Next i

End With
End Sub

maupinsmason
11-23-2016, 03:53 PM
I want to know how to filter the duplicate sets in the "m/z" column [highlighted yellow so that I can easily differentiate between the different duplication sets] based on:

Keep the row within the duplication set if (TO BE FOLLOWED IN CHRONOLOGICAL ORDER):
1.) N/O value <= 2 (denoted in green)
2.) (N+S)/O value <= 1 (also denoted in green)
3.) keep the row with the highest H/C value that is <= 2.25

What I need to do with these tests (or rules) is use them to compare each duplicate row within the duplication set to find one "Last man standing."

For example: look at duplication set of row 36, 37, and 38. All three pass the N/O test, so all three of the rows advance to the 2nd test. Now, row 36 and 37 fail the 2nd test due to being bigger than 1, and row 38 passes because it is <= 1; meaning that row 36 & 37 will get deleted and row 38 will stay as the "last man standing."

You see, I can't just filter every row that fails a test within the entire workbook because some rows that fail a test may still be the "winner" because it passed the most tests compared to the others within its duplication set.

For Example: look at row 22, 23, and 24. Because 22, 23 and 24 all pass the N/O test, they advance to the (N+S)/O test. Since row 22 & 23 failed; and row 24 passed, that makes row 24 the only row left, and therefore, it is declared the “last man standing” and stays as the "unique duplicate.” However, notice that the winner row 24 failed the 3rd test with having a value greater than 2.25. This is ignored because row 24 beat row 22 and 23 before it could "compete" in the 3rd test.

So, if I were to just delete all of the rows that failed a test, row 24 would get deleted too, even though it was supposed to stay as the "unique duplicate” because it was the "last man standing."

In addition, if the duplications WITHIN the duplication set fails a test simultaneously to whereas there is not a single "winner", then delete every row WITHIN that duplication set.

Example 1: Take a look at row 18 and 19. Both of them pass the first test, so both of them advance on to the second test. Now both of them fail the second test so row 18 and 19 get deleted and there is no "winner."

Example 2: Take a look at row 4, 5, and 6. Row 4 fails the first test, so row 4 automatically gets eliminated. Row 5 and 6 pass the first test so they advance to the 2nd test; however, row 5 and 6 both fail the second test simultaneously and both get deleted, therefore, there is not a single "winner" for this duplication set.

Moreover, there will be some duplicates within the duplication sets that will pass all three tests, and wind up with the same H/C value.

So, in the event of this happening, I need to put them through 2 Tiebreakers (Also needs to be followed in chronological order):

Tiebreakers:

1.) If the value within the "P" column of the duplicates is a non-zero digit, then the duplicate must contain a H/C value within the range: 1.5 <= # <= 2.25 (if the duplicate has a zero in the “P” column, then void the tiebreaker #1 and move straight to tiebreaker #2).


For Example: Take a look at the duplication set of row 29 and 30. Both of them pass the 1st, 2nd and 3rd, while still having the same H/C value, but both of them have a zero in the “P” column, so they will not be tested with the Tiebreaker #1. Instead they will jump straight to Tiebreaker #2:

2.) Keep the duplicate that has the value closest to zero within the "stddev2" column.

For Example: Using the same duplication set of row 29 and 30, take a look at the “stddev2” column. Because row 29 has -0.8 and row 30 has a “stddev2” value of 1, row 29 will be declared the “winner.” This is because -0.8 is closer to zero than 1 is.

However, if there are still duplicates that passed tiebreaker #1 or jumped straight to tiebreaker #2, and they have the same "stddev2" value as well, then completely delete all of the duplicates within that duplication set, and move on to the next set.

I’m thinking this has to be achieved through some type of VBA code, but any suggestions are greatly appreciated, thanks!

Paul_Hossler
11-24-2016, 08:09 AM
I added the new TieBreaker logic to my macro in the previous post you started

I think the logic is what you asked for

Let me know

The attachment has the macro and the Results






Option Explicit
Const cMZ As Long = 2
Const cP As Long = 10
Const cStdDev2 As Long = 12
Const cNO As Long = 14
Const cNSO As Long = 15
Const cHC As Long = 16
Dim rData As Range
Sub DeleteSomeRows()
Dim rData1 As Range
Dim iRow As Long, iStart As Long, iEnd As Long
Dim aCounts() As Long, aChecks() As Long


'setup
Application.ScreenUpdating = False
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion
Set rData1 = rData.Cells(2, 1).Resize(rData.Rows.Count - 1, rData.Columns.Count)
Application.StatusBar = "Sorting"
With ActiveSheet.Sort
.SortFields.Clear
.SortFields.Add Key:=rData1.Columns(cMZ), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SortFields.Add Key:=rData1.Columns(cHC), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
.SetRange rData
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With

'pass #1 - for a m/z, (all col 2 outside) OR (all col 3 outside) OR (all col 4 outside) or (any col 2 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #1 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
Call CheckData1(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0


'pass #2 - for a m/z with 2+ rows, (ANY col 3 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #2 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
Call CheckData2(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0



'pass #3 - for a m/z with 2+ rows, (ANY col 4 outside)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #3 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
Call CheckData3(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0


'pass #4 - for a m/z with 2+ rows, (largest col 4)
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #4 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
Call CheckData4(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0

'tiebreaker #1 -
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Tiebreaker - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
Call TieBreaker1(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0

Application.StatusBar = False
Application.ScreenUpdating = True
End Sub



Private Sub CheckData1(S As Long, E As Long)
Dim i As Long
Dim bAllFail As Boolean

With rData
'first test - all col 2 outside
bAllFail = True
For i = S To E
If .Cells(i, cNO).Value <= 2# Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, cMZ).Value = True
Next i
Exit Sub
End If

'second test - all col 3 outside
bAllFail = True
For i = S To E
If .Cells(i, cNSO).Value <= 1# Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, cMZ).Value = True
Next i
Exit Sub
End If


'third test - all col 4 outside
bAllFail = True
For i = S To E
If .Cells(i, cHC).Value <= 2.25 Then bAllFail = False
Next i

If bAllFail Then
For i = S To E
.Cells(i, cMZ).Value = True
Next i
Exit Sub
End If

'fourth test - ANY col 2 outside
For i = S To E
If .Cells(i, cNO).Value > 2# Then .Cells(i, cMZ).Value = True
Next i

End With
End Sub


Private Sub CheckData2(S As Long, E As Long)
Dim i As Long

If S = E Then Exit Sub

With rData
'any col 3 outside
For i = S To E
If .Cells(i, cNSO).Value > 1# Then .Cells(i, cMZ).Value = True
Next i
End With
End Sub

Private Sub CheckData3(S As Long, E As Long)
Dim i As Long

If S = E Then Exit Sub

With rData
'any col 4 outside
For i = S To E
If .Cells(i, cHC).Value > 2.25 Then .Cells(i, cMZ).Value = True
Next i
End With
End Sub

Private Sub CheckData4(S As Long, E As Long)
Dim i As Long
Dim m As Double

If S = E Then Exit Sub

With rData
m = 0#
For i = S To E
If .Cells(i, cHC).Value > m Then m = .Cells(i, cHC).Value
Next i
For i = S To E
If .Cells(i, cHC).Value < m Then .Cells(i, cMZ).Value = True
Next i

End With
End Sub


Private Sub TieBreaker1(S As Long, E As Long)
Dim bAllZeros As Boolean, bOneFailed As Boolean
Dim i As Long
Dim m As Double

If S = E Then Exit Sub

bAllZeros = True
With rData
For i = S To E
If .Cells(i, cP).Value <> 0# Then bAllZeros = False
Next i

If bAllZeros Then
m = 10 ^ 6
For i = S To E
If Abs(.Cells(i, cStdDev2).Value) < m Then m = Abs(.Cells(i, cStdDev2).Value)
Next i

If m <> 10 ^ 6 Then
For i = S To E
If Abs(.Cells(i, cStdDev2).Value) <> m Then .Cells(i, cMZ).Value = True
Next i

Else
For i = S To E
.Cells(i, cMZ).Value = True
Next i
End If

Else
bOneFailed = False
For i = S To E
If .Cells(i, cP).Value <> 0# Then
If (.Cells(i, cHC).Value < 1.5) Or (.Cells(i, cHC).Value > 2.25) Then
.Cells(i, cMZ).Value = True
bOneFailed = True
End If
End If
Next i

If Not bOneFailed Then
For i = S To E
.Cells(i, cMZ).Value = True
Next i
End If

End If

End With
End Sub

Paul_Hossler
11-25-2016, 07:15 AM
Let me know how this works out for you

I did it with 5 simple passes instead of one huge complicated one because 1) That's my style; 2) it was easier to develop and test; but 3) it's much easier to correct or add anything

SamT
11-25-2016, 12:20 PM
The OP started two identical threads with replies in both.

This tread is those two threads merged into one.

Posts are out of Thread order because they are in chronological order regardless of the original thread they were in.



To ALL: PLease do not dupicate your questions in two threads. Thank you.

maupinsmason
11-26-2016, 06:45 PM
Holy Moly Paul... That is amazing! Thank you so much for helping me out so much!

For the most part it actually looks correct, except I found a few that should have passed and did not end up in the "Results" sheet you created.

Unfortunately, I am not fluent in coding language (I can get the general idea of what is going on in the code, but I can't edit the code because I simply don't know it that well), so what I did was I went through the sheet I originally attached and processed it manually in the way I want them to be processed. I attached a new and updated "Maupinsmason's Sample Workbook" where I created another sheet that contains what the results should look like if the rules were followed the way I want them too.

Please let me know if there is anything else I can help with.

Thank you so much,

Mason

Paul_Hossler
11-27-2016, 12:50 PM
A rule wasn't 100% correct 100% of the time

I also simplified the code a bit so you could make any changes you needed, as well as make it a little more self documenting



Option Explicit

Const cMZ As Long = 2
Const cP As Long = 10
Const cStdDev2 As Long = 12
Const cNO As Long = 14
Const cNSO As Long = 15
Const cHC As Long = 16

Dim rData As Range

Sub DeleteSomeRows()
Dim rData1 As Range
Dim iRow As Long, iStart As Long, iEnd As Long
Dim aCounts() As Long, aChecks() As Long


'testing to init data
' Worksheets("HA_MM_7_0915_NB_MF_proc_reduced").Columns("B:P").Copy
' Sheets("Results").Select
' Range("B1").Select
' ActiveSheet.Paste
'------------------------------------------------

'setup - sort in MZ ascending order
Application.ScreenUpdating = False
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion
Set rData1 = rData.Cells(2, 1).Resize(rData.Rows.Count - 1, rData.Columns.Count)
Application.StatusBar = "Sorting"
With ActiveSheet.Sort
.SortFields.Clear
.SortFields.Add Key:=rData1.Columns(cMZ), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SetRange rData
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With

'---------------------------------------------------------------------------
'pass #1 - for a MZ with 2+ rows, delete if
' a NO and NSO outside on row
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #1 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
If iStart <> iEnd Then Call CheckData1(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0
'---------------------------------------------------------------------------
'pass #2 - for a MZ with 2+ rows, delete if
' a NO not outside and NSO outside on row
' b NO outside and NSO not outside on row
' c All rows have either NO or NSO outside
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #2 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
If iStart <> iEnd Then Call CheckData2(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0

'---------------------------------------------------------------------------
'pass #3 - for a MZ with 2+ rows, delete if
' a HC outside
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #3 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
If iStart <> iEnd Then Call CheckData3(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0
'---------------------------------------------------------------------------
'pass #4 - for a MZ with 2+ rows, delete
' a not the largest HC
iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #4 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
If iStart <> iEnd Then Call CheckData4(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0
'---------------------------------------------------------------------------
'pass #5 - tiebreaker
'1.) If the value within the "P" column of the duplicates is a non-zero digit, then the duplicate must contain a H/C value
' within the range: 1.5 <= # <= 2.25
' (if the duplicate has a zero in the “P” column, then void the tiebreaker #1 and move straight to tiebreaker #2).
'
' 2.) Keep the duplicate that has the value closest to zero within the "stddev2" column.
'
' if there are still duplicates that passed tiebreaker #1 or jumped straight to tiebreaker #2, and
' they have the same "stddev2" value as well, then completely delete all of the duplicates

iStart = 2
iEnd = iStart
With rData
For iRow = 2 To .Rows.Count
Application.StatusBar = "Pass #5 - Row " & iRow
If .Cells(iRow + 1, cMZ).Value = .Cells(iRow, cMZ).Value Then
iEnd = iEnd + 1
Else
If iStart <> iEnd Then Call CheckData5(iStart, iEnd)
iStart = iRow + 1
iEnd = iStart
End If
Next iRow
End With
On Error Resume Next
rData.Columns(cMZ).SpecialCells(xlCellTypeConstants, xlLogical).EntireRow.Delete
On Error GoTo 0
Application.StatusBar = False
Application.ScreenUpdating = True

MsgBox "Done"
End Sub

'--------------------------------------------------------------------
Private Function RowIsMarked(i As Long) As Boolean
RowIsMarked = Application.WorksheetFunction.IsLogical(rData.Cells(i, cMZ).Value)
End Function
Private Sub MarkRow(i As Long)
rData.Cells(i, cMZ).Value = True
End Sub
Private Function NO_Outside(i As Long) As Boolean
NO_Outside = rData.Cells(i, cNO).Value > 2#
End Function
Private Function NSO_Outside(i As Long) As Boolean
NSO_Outside = rData.Cells(i, cNSO).Value > 1#
End Function
Private Function HC_Outside(i As Long) As Boolean
HC_Outside = rData.Cells(i, cHC).Value > 2.25
End Function
Private Function All_NO_Outside(S As Long, E As Long) As Boolean
Dim i As Long

All_NO_Outside = True

For i = S To E
If Not NO_Outside(i) Then
All_NO_Outside = False
Exit Function
End If
Next i
End Function
Private Function All_NSO_Outside(S As Long, E As Long) As Boolean
Dim i As Long

All_NSO_Outside = True

For i = S To E
If Not NSO_Outside(i) Then
All_NSO_Outside = False
Exit Function
End If
Next i
End Function

'------------------------------------------------------------------------
Private Sub CheckData1(S As Long, E As Long)
Dim i As Long

With rData
For i = S To E
If NO_Outside(i) And NSO_Outside(i) Then MarkRow (i)
Next i
End With
End Sub


'any NO outside
Private Sub CheckData2(S As Long, E As Long)
Dim i As Long
Dim bAllRowsOutside As Boolean

With rData
For i = S To E
If NSO_Outside(i) And Not NO_Outside(i) Then
MarkRow (i)
ElseIf Not NSO_Outside(i) And NO_Outside(i) Then
MarkRow (i)
End If
Next i


bAllRowsOutside = True
For i = S To E
If Not (NSO_Outside(i) And NO_Outside(i)) Then
bAllRowsOutside = False
Exit For
End If
Next i

If bAllRowsOutside Then
For i = S To E
MarkRow (i)
Next i
End If
End With
End Sub

Private Sub CheckData3(S As Long, E As Long)
Dim i As Long

With rData
For i = S To E
If HC_Outside(i) Then MarkRow (i)
Next i
End With
End Sub

Private Sub CheckData4(S As Long, E As Long)
Dim i As Long
Dim m As Double

With rData
m = 0#
For i = S To E
If .Cells(i, cHC).Value > m Then m = .Cells(i, cHC).Value
Next i
For i = S To E
If .Cells(i, cHC).Value < m Then MarkRow (i)
Next i

End With
End Sub


Private Sub CheckData5(S As Long, E As Long)
Dim i As Long, j As Long, n As Long, n1 As LoadPictureConstants
Dim m As Double

If S = E Then Exit Sub

n = E - S + 1


With rData
For i = S To E
If .Cells(i, cP).Value <> 0# Then
For j = S To E
If .Cells(j, cP).Value <> 0# Then
If (.Cells(j, cHC).Value < 1.5) Or (.Cells(j, cHC).Value > 2.25) Then
.Cells(j, cMZ).Value = True
n = n - 1
End If
End If
Next j
End If
Next i

If n = 1 Then Exit Sub


m = 10 ^ 6
For i = S To E
If Not Application.WorksheetFunction.IsLogical(.Cells(i, cMZ).Value) Then
If Abs(.Cells(i, cStdDev2).Value) < m Then m = Abs(.Cells(i, cStdDev2).Value)
End If
Next i

n1 = 0
For i = S To E
If Not Application.WorksheetFunction.IsLogical(.Cells(i, cMZ).Value) Then
If Abs(.Cells(i, cStdDev2).Value) = m Then n1 = n1 + 1
End If
Next i

If n1 = 1 Then
For i = S To E
If Not Application.WorksheetFunction.IsLogical(.Cells(i, cMZ).Value) Then
If Abs(.Cells(i, cStdDev2).Value) <> m Then
.Cells(i, cMZ).Value = True
End If
End If
Next i

Else
For i = S To E
.Cells(i, cMZ).Value = True
Next i
End If

End With
End Sub

maupinsmason
11-27-2016, 11:19 PM
WOW! This works flawlessly!! Thank you so much Paul! I owe you HUGE man!

I have one question though, if I had to add more columns such as column K in the new spreadsheet I attached, how would I change the code to accommodate the change?

I changed the position of the columns in the beginning of your code to:



Option Explicit

Const cMZ As Long = 2
Const cP As Long = 10
Const cStdDev2 As Long = 13
Const cNO As Long = 15
Const cNSO As Long = 16
Const cHC As Long = 17

Dim rData As Range

Sub DeleteSomeRows()
Dim rData1 As Range
Dim iRow As Long, iStart As Long, iEnd As Long
Dim aCounts() As Long, aChecks() As Long


'testing to init data
' Worksheets("HA_MM_7_0915_NB_MF_proc_reduced").Columns("B:P").Copy
' Sheets("Results").Select
' Range("B1").Select
' ActiveSheet.Paste
'------------------------------------------------



However, when I ran the code with the new spread sheet, it gave me this error:


17710


When I press "OK" the VBA window comes up with this:



17711

Any suggestions on how to fix this? I'm sure it's a simple fix, I just wanted to make sure though.

Thanks

Mason

Paul_Hossler
11-28-2016, 06:56 AM
Ooops - that usually happens when I'm relying on the Intellisense to complete something, and I'm off by one in the list

It should be 'Long', not 'LoadPictureConstants'

Funny thing is that it didn't affect my test runs so the change must have crept in somewhere later :think:, or it was a compatible data type since I'm using Excel 2016 and 'LoadPictureConstants' is part of the stdolb2.tlb OLE Reference which is probably why it worked

Don't worry about that, just change it to Long and it should work fine

Le me know if it doesn't