PDA

View Full Version : Sleeper: Pull different numbers from independent lists efficiently



Adaptage
04-29-2023, 11:39 AM
Hello everybody. lately I've been looking for a volatile UDF that pulls a number from each of several lists so that all pulled numbers are different. I managed to find some things that work, but they're either slow or they're a macro. The UDF's input would be the range containing all lists (if the lists were A:[1,3,4], B:[5,7], C:[2,4,7], D:[5], E1:[3,4,5] the UDF input's range would be A1:E3) and the output would be an array containing numbers from a random solution (for example, [1,7,2,5,3] or [4,7,2,5,3] using the previous lists). The UDF needs to take an optimised path, otherwise with many long lists it's going to be very slow. Here is a sample to better understand the problem:
30773

p45cal
04-30-2023, 09:13 AM
but they're either slow or they're a macro
Are you saying the UDF is slow?
With the (small) sample in your attachment it runs pretty quickly. So am I to assume that there are lots of these lists and/or these lists are significantly bigger?

One of the things that can slow a udf/macro is reading and/or writing from/to the sheet multiple times. There's only one place in your code that does this multiple reading, it's:

For i = 1 To cl
inp_col(i) = WorksheetFunction.Count(inp.Cells(1, 1).Resize(rw).Offset(0, i - 1))
Next i
which, by the way, could be written:
For i = 1 To cl
inp_col(i) = WorksheetFunction.Count(inp.Columns(i))
Next i

but since you already have the values in memory in inp_arr, you could use those:

For i = 1 To cl
For k = 1 To rw
If Not IsEmpty(inp_arr(k, i)) And IsNumeric(inp_arr(k, i)) Then inp_col(i) = inp_col(i) + 1
Next k
Next iWhile it loops and asks more questions, experience tells me this will still be faster; over to you to test!

I haven't yet looked at the rest of your code for other things that might speed it up, recursion is hard work for me! Tell me first if my suggested change makes significant difference to speed.

Adaptage
04-30-2023, 09:26 AM
am I to assume that there are lots of these lists and/or these lists are significantly bigger?
The dataset is significantly bigger than the one i gave, the sample is only for clarity purpouse. I know that the code can be improved to make it faster but i was wondering if there was a way to take a near-optimal route every time. A few things about the UDF i forgot to mention:
It is intentionally volatile, it needs to update whenever i refresh the worksheet, whether or not the input numbers change.
The same numbers can appear in multiple columns, but no more than once in a single column

Thank you for helping

p45cal
05-01-2023, 11:22 AM
Tell me first if my suggested change makes significant difference to speed.
I wouldn't mind knowing if my suggestion made any significant difference. Not easy for me to test your real life setup.

In the attached, a different approach.
It goes through all the cells in the target range and:
1. Creates a count of all distinct values
2. Creates a list (a dictionary) of unique values in each column of the target range.

Then it looks at the shortest list (randomized if there are more than one shortest lists), and takes one of the values from it, keeps it there and removes that value from all the other lists (columns).
It keeps doing that until there's no further change, at which point we should have columns with mutually unique values, so now we take a random value from each column.
There is only one read and one write to the sheet per formula.
I haven't fully tested or debugged and I haven't handled an un-gettable solution.
I've not made it volatile; it may not need it.

I have no idea if it's fast enough or not; difficult to test. It's certainly a convoluted solution. I've used dictionaries because in my experience they're surprisingly fast and it's easy to add/remove items without having to find them first.
I feel sure there's a more elegant and efficient way of doing this and I may come up with one - who knows?

georgiboy
05-02-2023, 01:54 AM
Used a different approach, may depend on your Excel version - it was made in Excel 365 v2304.

I created the below formula:

=UNIQUE(LET(rng,A5:E7,csq,SEQUENCE(,COLUMNS(rng)),BYCOL(rng,LAMBDA(x,INDEX( FILTER(x,x<>""),RANDBETWEEN(1,COUNTA(FILTER(x,x<>""))))))),TRUE)

And thought to myself, right i have these numbers but not always the correct amount so why not loop the formula until we have the correct amount of results. I used the EVALUATE function to evaluate the above formula and ask what the UBOUND of the array it creates is, if the UBOUND of the created array is equal to the amount of columns in the range then we have the result.

Below is the function created to get the results:

Function GetUniqueRandom(rng As Range) As Variant
Dim var As Variant, x As Long

For x = 1 To 1000
var = Evaluate("UNIQUE(LET(rng," & rng.Address & ",csq,SEQUENCE(,COLUMNS(rng)),BYCOL(rng,LAMBDA(x,INDEX(FILTER(x,x<>""""),RANDBETWEEN(1,COUNTA(FILTER(x,x<>""""))))))),TRUE)")
If (UBound(var)) = rng.Columns.Count Then Exit For
Next x

If x < 1000 Then
GetUniqueRandom = var
Else
GetUniqueRandom = "Can't get result"
End If
End Function

Simply used in a cell as:

=GetUniqueRandom(A5:E7)