PDA

View Full Version : Search for row, check and count max match with another range



sans.s
05-11-2012, 12:35 AM
Hello,

I am encountering the following problem below and hope there is a vba solution that can help me. Currently I am processing all data manually with formulae and this is taking me countless hours to get to the desired results.

I would like to search for a row (according to values in a range) and count the maximum match that can be achieved comparing the found row with another range.

I am attaching an example worksheet. I tried to make the example as clear as possible but please do let me know if I am not being clear at any point.

Thank you for your time,
Sans

Bob Phillips
05-11-2012, 02:56 AM
I can't figure out your logic. When you have a row where 3, 8 and 9 are all found in W$2:AC42 you say you use 8. Why?

sans.s
05-11-2012, 03:18 AM
Hi xld,

It is possible to choose any one number out of the three. However, only 1 number can be chosen from each row in Data A and as long as that number has not been chosen before. The tricky part is that in the end, the result is the max possible match that can be achieved. Have a look at Note3 as I tried to make this more clear.

Thank you,
Sans

Bob Phillips
05-11-2012, 04:02 AM
I see what you are saying, but the results can be very different.

In your examples, you ended up selecting 15,4,8,3,9. The code I wrote selected 15,4,9,3, so in the third process it selected 9, 8 never figured again, so I got 4 numbers matching whereas you got 5.

Bob Phillips
05-11-2012, 04:02 AM
I suppose you could save every matching number then remove duplicates, but that way you may pick 2 from the same row.

sans.s
05-11-2012, 04:16 AM
Yes, that's the tricky part. To get the max possible match by choosing the appropriate number from each row.

Bob Phillips
05-11-2012, 04:51 AM
The only way I can see of doing that is to capture each match in a row, then for each match, simulate processing all of the rest of the row to see if that match is used again, if so discard it. This could take a looooong time, and you still have the problem that if a row has two numbers that are not covered later, which do you choose?

Bob Phillips
05-11-2012, 04:58 AM
Actually it is worse, because when you do that look-ahead to see if a match is used later, if where it is/could be used later also has multiple matches, then you have to look-ahead there as well to see if they could be used later, and if so, you have to look ahead there, and so on and so on.

I think you would have to capture all matches first in a 7x7 array, and then apply some logic there to outsort them. In your example, you would get

3 15
4 3 15
8 9 3
3
9 3 9
4 9

Maybe you could then process that, in a recursive loop, but even here the first row is a problem. 3 could be discarded because there is a row with 3 on its own but 15 could also be discarded, because it is on row 6 with a 9, and the 9 could be discarded from earlier use. So which do you discard? What is the logic that you apply manually?

sans.s
05-11-2012, 05:37 AM
About the first question in post #7, if a row has two numbers that are not covered later, it doesn't make a difference which we should choose.


Also this is what I do manually. If these are the numbers that matched

3 15
4 3 15
8 9 3
3
9 3
4 9

what I would then do manually is create all possible combinations by using 1 number from each row and then remove any combinations that have any number more than once. Above, one of these combinations left would be,
15-4-8-3-9

:)

Bob Phillips
05-11-2012, 05:59 AM
Okay, let me play and see what I can do. As I said, my first attempt got 15-4-9-3, so it needs work :(

sans.s
05-11-2012, 07:09 AM
Thank you :)

Bob Phillips
05-11-2012, 10:37 AM
Here is my best attempt to date.

Not sure it works in all scenarios, but I applied the logic I followed doing it manually. I am also not happy with the code, it was a bit Heath-Robinson, iterative coding, but let's see

Public Sub CheckData()
Dim sh As Worksheet
Dim rngData As Range
Dim rngOffsets As Range
Dim rngCheck As Range
Dim rngLookup As Range
Dim vecChecked(1 To 7) As Long
Dim mtxChecked(1 To 7, 1 To 7) As Long
Dim posChecked As Long
Dim idxChecked As Long
Dim i As Long, ii As Long

With Worksheets("Example")

Set rngData = .Range("B23:H42")
Set rngOffsets = .Range("J42:P42")
Set rngCheck = .Range("W42:AC42")

idxChecked = 0
For i = 1 To 7

Set rngLookup = rngData.Rows(rngData.Rows.Count - rngOffsets.Cells(1, i).Value + 1)
For ii = 1 To 7

posChecked = 0
On Error Resume Next
posChecked = Application.Match(rngLookup.Cells(1, ii).Value, rngCheck, 0)
On Error GoTo 0
If posChecked > 0 Then

mtxChecked(i, ii) = rngLookup.Cells(1, ii).Value
End If
Next ii
Next i

MsgBox ExtractCheckedItems(mtxChecked)
End With
End Sub

Private Function ExtractCheckedItems(ByRef mtxChecked As Variant) As String
Dim sh As Worksheet
Dim minChecked As Long
Dim posChecked As Long
Dim i As Long, ii As Long

Set sh = Worksheets.Add
With sh

.Range("A1:G7").Value = mtxChecked
For i = 1 To 7

If Application.CountIf(.Range("A1:G7").Rows(i), "<>0") > 0 Then

For ii = 1 To 7

If .Range("A1:G7").Cells(i, ii).Value <> 0 Then

.Cells(i, ii + 10).Value = Application.CountIf(.Range("A1:G7"), mtxChecked(i, ii))
End If
Next ii
End If
Next i

For i = 1 To 7

posChecked = Application.Min(.Range("K1:Q7").Rows(i))
If posChecked > 0 Then

ii = Application.Match(posChecked, .Range("K1:Q7").Rows(i), 0)

minChecked = Application.Index(.Range("A1:G7"), i, ii)
If IsError(Application.Match(minChecked, .Range("I1").Resize(i), 0)) Then

.Range("I1:I7").Cells(i, 1).Value = minChecked
End If
End If
Next i

ExtractCheckedItems = Join(Application.Transpose(.Range("I1:I7")), "-")
End With

Application.DisplayAlerts = False
sh.Delete
Application.DisplayAlerts = True
End Function

sans.s
05-11-2012, 11:36 AM
Thank you very very much xld :)

A couple of quick questions so that I know how to backtest it.

Does the macro check both ranges (J41:P41 & J42:P42) as in the attachment above and then displays the highest match of the two? Or does it check only the latest J42:P42?

Also I see that the macro displays the result which is great! Does it also output the highest match in AE42, i.e.5 in this case or am I doing something wrong?

Thank you very much

Bob Phillips
05-11-2012, 12:12 PM
Currently, it only checks J42:P42. It could easily be changed to check both, but are the results to be combined, or a separate set for each row?

Count, no, but easily implemented (trivial in comparison to working out what is included).

I am intrigued, what can this possibly be all about?

sans.s
05-12-2012, 02:43 AM
No problem :)

http://en.wikipedia.org/wiki/Mathematical_statistics

I am required to experiment as part of attending a short intensive course program, the wiki page above explains what its all about. :)

I am attaching another small sheet where I tried the macro. In this case the macro outputs

1-10-11-12-4

which is a match of 5 values but the result that can be achieved is 7. I've used "simple" values so is very clear to detect them.

The best method I've found to achieve the desired results so far is as I mentioned above. Basically to create all combinations by using only one value and then discard any combinations that include duplicate values. Then the results are compared with W42:AC42 for the best match. Of course I don't know if this is at all possible or how long it will take to create such an algorithm.

Thank you for your help :)

Bob Phillips
05-13-2012, 04:14 AM
I got a working version yesterday, but for a match on 7 items it took forever (i mean hours).

I rationalised the code today and whilst it is much faster, 7 item match still takes some 20 minutes on my netbook. Is that acceptable?

sans.s
05-13-2012, 05:43 AM
That's amazing xld!

The first priority is for the results to be correct. Figuring out the algo is exceptional in itself. I do have thousands of rows to process, so if its possible to get the processing time down it will be even better, if not, that is great too! :)

Thank you

Bob Phillips
05-13-2012, 08:21 AM
Reading what you have just said, I have realised I may have made an (invalid?) assumption.

I have firstly assumed that you will be checking across an array of 7 values. I think that is a valid assumption, correct?

Because of the way you setup the test-data, I have also assumed that you will only be pulling back 7 entries from the data table (based upon the array J42:P42). Is this a valid assumption.

Anyway, here is that code. It takes two passes, the first builds a matrix of 7x values, using the rows pointed to by J42:P42, of the values in those rows that are matched in WC42:AC42. It then compiles all combinations from this 7x7 table, and determines the combination with the maximum matches, and spells out the details.

My first effort was pure brute force. I did a recursive loop taking an item and going through all combinations of that and the value and the next level, and so on, then the next item and all combinations of that value and the next level, and so on, and so on. I then rationalised it to populate each level one at a time, filling all of the values. This reduced the time for the 7x7 matrix 8 fold (on my desktop, it takes just over a minute).

Public Sub CheckData()
Const FORMULA_UNIQUE As String = _
"=SUMPRODUCT((A1:<col>1<>0)/(COUNTIF(A1:<col>1,A1:<col>1)))"
Const numRows As Long = 7
Dim shChecked As Worksheet
Dim shSeq As Worksheet
Dim rngData As Range
Dim rngOffsets As Range
Dim rngToCheck As Range
Dim rngLookup As Range
Dim rngMatches As Range
Dim msgDetails As String
Dim nTime As Double
Dim vecIdx As Variant
Dim mtxChecked(1 To 7, 1 To 7) As Long
Dim maxUniques As Long
Dim maxRow As Long
Dim posChecked As Long
Dim idxChecked As Long
Dim i As Long, ii As Long

nTime = Timer

With Application

.Calculation = xlCalculationManual
.ScreenUpdating = False
End With

With Worksheets("Example")

Set rngData = .Range("B23:H42")
Set rngOffsets = .Range("J42:P42")
Set rngToCheck = .Range("W42:AC42")

ReDim vecMultiple(1 To numRows) As Long
vecMultiple(numRows) = 0
For i = 1 To numRows - 1

vecMultiple(i) = numRows ^ (numRows - i)
Next i

idxChecked = 0
For i = 1 To numRows

Set rngLookup = rngData.Rows(rngData.Rows.Count - rngOffsets.Cells(1, i).Value + 1)
For ii = 1 To numRows

posChecked = 0
On Error Resume Next
posChecked = Application.Match(rngLookup.Cells(1, ii).Value, rngToCheck, 0)
On Error GoTo 0
If posChecked > 0 Then

mtxChecked(i, ii) = rngLookup.Cells(1, ii).Value
End If
Next ii
Next i
End With

Set shChecked = Worksheets.Add
Set rngMatches = shChecked.Range("A1").Resize(numRows, numRows)
With rngMatches

.Value = mtxChecked
For i = 1 To 7

If Application.CountIf(.Rows(i), "<>0") > 0 Then

For ii = 1 To numRows

If .Cells(i, ii).Value <> 0 Then

.Cells(i, ii + 10).Value = Application.CountIf(shChecked.Range("A1").Resize(numRows, numRows), mtxChecked(i, ii))
End If
Next ii
End If
Next i
End With

Set shSeq = Worksheets.Add

ReDim vecIdx(1 To numRows)
ReDim vexcChecked(1 To numRows)
For i = 1 To numRows

vecIdx(1) = i
Call LoadData(1, vecIdx, numRows, vecMultiple, shSeq, rngMatches)
Next i

With Application

.ScreenUpdating = True
.Calculation = xlCalculationAutomatic
End With

shSeq.Range("A1").Offset(, numRows + 1).Resize(numRows ^ numRows).Formula = Replace(FORMULA_UNIQUE, "<col>", Chr(numRows + 64))
maxUniques = Application.Max(shSeq.Columns("A").Offset(, numRows + 1))
maxRow = Application.Match(maxUniques, shSeq.Columns("A").Offset(, numRows + 1), 0)
msgDetails = "The best match gives " & maxUniques & " matches" & vbNewLine & vbNewLine & _
"The best match series is:" & vbNewLine & _
Join(Application.Transpose(Application.Transpose(shSeq.Columns("A").Cells(maxRow).Resize(, numRows))), "-") & vbNewLine & vbNewLine & _
"The job took " & Timer - nTime & " seconds"
MsgBox runVersion & vbNewLine & msgDetails, vbOKOnly, "Best Sequence"
End Sub

Private Sub LoadData( _
ByRef level As Long, _
ByRef idx As Variant, _
ByVal numRows As Long, _
ByRef multiples As Variant, _
ByRef sh As Worksheet, _
ByRef rng As Range)
Dim size As Long
Dim nextrow As Long
Dim i As Long, ii As Long

runVersion = "v2 "
nextrow = 1
size = numRows ^ (numRows - idx(1))
For i = 1 To numRows ^ (idx(1) - 1)

For ii = 1 To numRows

sh.Cells(nextrow, idx(1)).Resize(size, 1).Value = rng(idx(1), ii).Value
nextrow = nextrow + size
Next ii
Next i
End Sub

sans.s
05-13-2012, 10:13 AM
xld, the algo is outstanding and your skills are simply phenomenal! I just can't believe it! ∞ thank you!!!

I tested it out for a while and the results are absolutely 100% correct. I suspect the way the algo processes the data, there is no room for error. This is simply amazing as its exactly what I needed! :thumb

About the array question, if it would be possible, what the algorithm does with J42:P42 - do the same process for each of the 15 latest ranges in J:P, so, determine a match for each range in J:P, and then the best match amongst the 15 checked ranges is the end result.

I am attaching an example workbook. I hope it is clear what I mean.

Again, thank you very very very much for your help!

Bob Phillips
05-13-2012, 10:34 AM
Okay, I will take a look tomorrow.

sans.s
05-13-2012, 10:40 AM
Also, assuming it is doable what I described in the above attachment, would it be difficult to incorporate into the code the ability to select the ranges (in DATA, the J:P ranges, and the RANGE to CHECK) similar to that of using a formula? This is because for each range in W:AC I will need to check the 15 latest ranges, the 20 latest ranges, the 25 latest ranges and so on...This way I can set up big excel sheets at a time and leave them processing overnight.

I am still speechless and so happy with the code above. Thank you. :)

Bob Phillips
05-14-2012, 07:56 AM
I think this does as you describe

Option Explicit

Global Const APP_TITLE As String = "Best Sequence"
Global Const numRows As Long = 7
Global runVersion As String

Public Sub CheckData()
Dim shBase As Worksheet
Dim shChecked As Worksheet
Dim shSeq As Worksheet
Dim rngData As Range
Dim rngOffsets As Range
Dim rngToCheck As Range
Dim rngMatches As Range
Dim rngResults As Range
Dim nTime As Double
Dim mtxChecked As Variant
Dim i As Long, ii As Long, iii As Long

nTime = Timer

Set shBase = Worksheets("Example")
Set rngResults = shBase.Range("W3")
rngResults.Resize(39, 9).ClearContents

If GetData(shBase, rngData, rngOffsets, rngToCheck) Then

Set shChecked = Worksheets.Add
Set shSeq = Worksheets.Add

For i = 1 To rngOffsets.Rows.Count

With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
End With

'build an array of matches for current check row
mtxChecked = GetMatches(shBase, rngData, rngOffsets.Rows(i), rngToCheck)

Set rngMatches = shChecked.Range("A1").Resize(numRows, numRows)
With rngMatches

.ClearContents
.Value = mtxChecked
End With

shSeq.Cells.ClearContents
For ii = 1 To numRows

Call LoadData(ii, numRows, shSeq, rngMatches)
Next ii

With Application

.ScreenUpdating = True
.Calculation = xlCalculationAutomatic
End With

Call OutputResults(shSeq, rngResults, i)
Next i
End If

Application.DisplayAlerts = False
On Error Resume Next
shSeq.Delete
shChecked.Delete
On Error GoTo 0
Application.DisplayAlerts = True

MsgBox "The job took " & Timer - nTime & " seconds", vbOKOnly, APP_TITLE
End Sub

Private Sub LoadData( _
ByRef idx As Long, _
ByVal numRows As Long, _
ByRef sh As Worksheet, _
ByRef rng As Range)
Dim size As Long
Dim nextrow As Long
Dim i As Long, ii As Long

runVersion = "v2 "

nextrow = 1
size = numRows ^ (numRows - idx)
For i = 1 To numRows ^ (idx - 1)

For ii = 1 To numRows

sh.Cells(nextrow, idx).Resize(size, 1).Value = rng(idx, ii).Value
nextrow = nextrow + size
Next ii
Next i
End Sub

Private Function GetData(ByRef sh As Worksheet, ByRef Data As Range, ByRef Offsets As Range, ByRef ToCheck As Range) As Boolean
Dim rng As Range

sh.Select
' Set Data = sh.Range("B23:H42")
' Set Offsets = sh.Range("J41:P42")
' Set ToCheck = sh.Range("W42:AC42")
' GetData = True
' Exit Function

Do

Set Data = Nothing
On Error Resume Next
Set Data = Application.InputBox("Select the 7 columns of DATA to be checked using the mouse", Type:=8)
On Error GoTo 0
If Data Is Nothing Then Exit Function

If Data.Columns.Count <> 7 Then

MsgBox "The DATA must be 7 columns wide", vbOKOnly, APP_TITLE
End If
Loop Until Data.Columns.Count = 7

Do

Set Offsets = Nothing
On Error Resume Next
Set Offsets = Application.InputBox("Select the 7 columns of J:P parameters using the mouse", Type:=8)
On Error GoTo 0
If Offsets Is Nothing Then Exit Function

If Offsets.Columns.Count <> 7 Then

MsgBox "The J:P must be 7 columns wide", vbOKOnly, APP_TITLE
End If
Loop Until Offsets.Columns.Count = 7

Do

Set ToCheck = Nothing
On Error Resume Next
Set ToCheck = Application.InputBox("Select the 1 row and 7 columns of 'Range To Check' using the mouse", Type:=8)
On Error GoTo 0
If ToCheck Is Nothing Then Exit Function

If ToCheck.Rows.Count <> 1 Or ToCheck.Columns.Count <> 7 Then

MsgBox "The 'Range To Check' must be 1 row high and 7 columns wide", vbOKOnly, APP_TITLE
End If
Loop Until ToCheck.Rows.Count = 1 And ToCheck.Columns.Count = 7

GetData = True
End Function

Private Function GetMatches(ByRef sh As Worksheet, ByRef Data As Range, ByRef Offsets As Range, ByRef ToCheck As Range) As Variant
Dim rngLookup As Range
Dim mtxLookup(1 To 7, 1 To 7) As Long
Dim posChecked As Long
Dim i As Long, ii As Long

With sh

For i = 1 To numRows

Set rngLookup = Data.Rows(Data.Rows.Count - Offsets.Cells(1, i).Value + 1)
For ii = 1 To numRows

posChecked = 0
On Error Resume Next
posChecked = Application.Match(rngLookup.Cells(1, ii).Value, ToCheck, 0)
On Error GoTo 0
If posChecked > 0 Then

mtxLookup(i, ii) = rngLookup.Cells(1, ii).Value
End If
Next ii
Next i
End With

GetMatches = mtxLookup
End Function

Private Function OutputResults(ByRef Seq As Worksheet, ByRef rng As Range, ByVal idx As Long) As Boolean
Const FORMULA_UNIQUE As String = _
"=SUMPRODUCT((A1:<col>1<>0)/(COUNTIF(A1:<col>1,A1:<col>1)))"
Dim msgDetails As String
Dim maxUniques As Long
Dim maxRow As Long
Dim i As Long

With Application
.Calculation = xlCalculationAutomatic
.ScreenUpdating = True
End With

Seq.Range("A1").Offset(, numRows + 1).Resize(numRows ^ numRows).Formula = Replace(FORMULA_UNIQUE, "<col>", Chr(numRows + 64))
maxUniques = Application.Max(Seq.Columns("A").Offset(, numRows + 1))
maxRow = Application.Match(maxUniques, Seq.Columns("A").Offset(, numRows + 1), 0)
msgDetails = "The best match gives " & maxUniques & " matches" & vbNewLine & vbNewLine & _
"The best match series is:" & vbNewLine & _
Join(Application.Transpose(Application.Transpose(Seq.Columns("A").Cells(maxRow).Resize(, numRows))), "-") & vbNewLine & vbNewLine

With Seq.Columns("A").Cells(maxRow)

For i = 1 To 7

If .Cells(1, i).Value <> 0 Then

rng.Cells(idx, i).Value = .Cells(1, i).Value
End If
Next
End With

rng.Cells(idx, 8).Value = maxUniques

Debug.Print runVersion & vbNewLine & msgDetails
End Function

sans.s
05-14-2012, 09:02 AM
That's super cool xld! Thank you! :)

I ran the code but at the end of the calculation only the timer windows appears but not the results.

I was wondering if it would also be possible to be able to set up the ranges for the code as in Sheet 3 in the attachment below. I've made a very quick example. If it is possible, this is probably the best method due to the thousands of ranges in Range to Check. If it is, it would be most amazing and the biggest time saver as I could set up a huge document and leave it process for as long as it requires.

Thank you so very much

sans.s
05-16-2012, 05:50 AM
Hi xld,
I have been processing the data for the past 2 days and have managed to get the results for 100 out of the approximately 5000 thousand rows. If you do manage to modify the algo so I can select multiple ranges in Range to Check please let me know as currently I will need to be processing a range at a time manually which will take me a very long time. It doesn't need to be like I suggested above, any other method will do. Thank you very very much for your help.