PDA

View Full Version : How to randomly choose sample rows after using Autofilter??



rangudu_2008
11-05-2009, 11:01 AM
Hi,

Can anyone explain clearly (using layman terms) about how to go about randomly choosing specific number of rows from a set of autofiltered records?

For e.g., If there are 1000 rows of data in a worksheet and after applying autofilter to certain column(s) [one or more] say i get some 75 rows. Then how can x no. of rows be chosen where x can be input by the user or calculated based on a certain percentage? Say if out of these 75 filtered rows, 8 rows have to be chosen randomly - where 8 can be input by the user or also be calculated as 10% of 75.

Also, is it possible to choose these x records from the filtered set of rows without actually copying them to another sheet? How can that be done?

Ranga

mdmackillop
11-05-2009, 05:01 PM
This uses a helper column of random numbers


Sub Filters()
Columns("A:J").AutoFilter
Columns("A:J").AutoFilter field:=1, Criteria1:="=" & InputBox("A number from 1 to 10", "Initial filter")
Columns("A:J").AutoFilter field:=10, Criteria1:="<" & InputBox("Percentage", "Random Filter") / 100
End Sub

rangudu_2008
11-05-2009, 07:29 PM
Hi mackillop,

I've got a workbook where there are 48 columns. I apply autofilter to 4 different columns before actually choosing sample rows. There are 4 columns which i need to filter upon before randomly choosing rows to be copied to another sheet. They are namely, NAME (AQ), RULE_CODE (R), KYC (AO) and RSKWT (S) in that order.

For any particular USER and any specific RULE_CODE,

If KYC = 1 and 2 && if RskWt is > 0 but <= 6, then 6% of filtered no. of rows be chosen randomly

If KYC = 3 && if RskWt > 0 but <= 6, then 8% of filtered no. of rows be chosen randomly

Else If KYC = 1 or 2 or 3 && if RskWt is >= 7, then 10% of filtered no. of rows need to be chosen randomly

I may filter on 10 or 8 or 6 for any user / rule_code combination, but how can i randomly choose and copy certain rows from the filtered ones to another sheet? How can that be done?

I'd developed 3 modules, one called Allocate, another called Filter_N_Transfer and the other called Random Sampler.

I've coded the above set of If conditions in VBA (in the Allocate module) as follows:
=IF(AND(RC" & RskWt & "<=6,RC" & KYC & "=3),8,IF(AND(RC" & RskWt & ">6,OR(RC" & KYC & "=1,RC" & KYC & "=2,RC" & KYC & "=3)),10,6))"

Any suggestions?

Ranga

mdmackillop
11-06-2009, 02:01 AM
Try this. Column R has been change to Integer date values, otherwise all data is unique. You'll need to add your formual to insert the test percentage into Textbox1.

rangudu_2008
11-06-2009, 10:11 AM
Hi mdm,

The code partially does what i need to do, but can you explain me in detail about how it works?

Also, could the filtered rows be copied (and consolidated) to another sheet after the options are set and the sample button in the userform is clicked?

Ranga

mdmackillop
11-06-2009, 04:26 PM
As requested, the code simply fiters your data successively. The function returns unique available values to populate each listbox.

Regarding copying, record a macro copying and pasting the data and call it from the button code.

rangudu_2008
11-06-2009, 10:00 PM
Hi mdm,

Thanks for your coding, i'd been racking my brains to find a solution, which i would like to fine tune and make it work more adeptly.

I went through the code and now i understand how it returns unique available values to populate each listbox. Instead, isn't it easier to determine the filtered row numbers and store them in sort of an defined array / structure? If it can be done this way, then some sort of flags (values or alphabets or some of symbol or conditional formatting or something) can be set-up in a separate column against each row which can be randomly chosen when filtering is done successively. Probably, these flags can be cleared out every time and reset and i also should take care of the overriding conditions as defined in the document that i posted earlier.

I'm also exploring whether it can be done this way since i want the code to choose rows randomly and that needs to be done recursively for all users automatically. This way i can not only have a control of the data being copied to many sheets from the master sheet (and avoid duplication) but also can consolidate the samples chosen for various users to a single sheet easily once all rows to be taken as samples have been identified which can be copied to a separate sheet after removing the filter at last.

As i'm looking at an end-to-end automation of the process, i've been thinking and came up with this logic.

Can this logic be implemented thru code? Can you / anybody help me??

Ranga

mdmackillop
11-07-2009, 03:10 AM
The logic of my code is
1. User chooses a value to filter on
2. From the results available; Userchooses a value to filter on
3. Repeat step 2 as required
4. Subject to choices, a random set of the final results is obtained.

I suggest you add the required code to complete step 4, as a first stage, then we can look at it again.

If you can express what you are after in a series of simple, logical steps, we can see exactly what you are after, and probably implement it.

rangudu_2008
11-09-2009, 07:45 PM
Hi,

I already have a basic random number generator code (which i'd posted in one of the earlier posts to this thread), but the thing i wonder is how can any number of row(s) be copied from a filtered sheet of records directly? I mean, is it possible to atleast copy one row with filters applied to (without removing the filter from) the sheet?

After spending much time browsing, I finally came to know that by referencing the autofilter's visible range by using the AutoFilter.Range.SpecialCells(xlCellTypeVisible) property, i can pick out rows from a filtered range of data from a worksheet.

My basic random number generator code can generate random numbers on a sheet (in a separate column) and can copy and transfer the required number of rows (input by the user) based on random selection to a separate sheet. This code works fine without filters applied, but i need to make it work when filters exist in the sheet.

My code is as below:

' Extracts requested records from a database by random selection.
' The list must be properly organized, without blank rows or columns, but may or ' may not have one header row.
' 1. Open and activate the file with records from which random data has to be chosen

' 2. Click Extract Samples button on the Extract menu
Option Explicit
Sub ExtractRandom()
Dim NumToExtract As Long
Dim NumRows As Long
Dim msg, style, title, response
Dim DataSheet As Worksheet
Dim wsExtract As Worksheet
msg = "This macro will extract the requested number of records" _
& vbCrLf & "from a list by random selection. It assumes there are" _
& vbCrLf & "no blank rows or columns in your list, and the list" _
& vbCrLf & "starts in cell A1. If this is NOT the case, click CANCEL" _
& vbCrLf & "and fix things."
style = vbOKCancel
title = "Are you ready?"
response = MsgBox(msg, style, title)
If response = vbCancel Then Exit Sub
Set DataSheet = ActiveSheet
msg = "Does your list have a header row (Column headings)?"
style = vbYesNo
title = "Headers?"
response = MsgBox(msg, style, title)
NumRows = DataSheet.Range("A1").CurrentRegion.Rows.Count
NumToExtract = InputBox("How many records do you want to extract?")
'Mark records with current row number for resorting at end of macro
With DataSheet
.Columns("A:A").Insert Shift:=xlToRight
.Range("A1:A" & NumRows).FormulaR1C1 = "=ROW()-1"
.Range("A1:A" & NumRows).Value _
= .Range("A1:A" & NumRows).Value
End With
'Create random number for each record & sort by random number
If response = vbYes Then
With DataSheet
.Columns("A:A").Insert Shift:=xlToRight
.Range("A1").Value = "Rand"
.Range("A2:A" & NumRows).FormulaR1C1 = "=RAND()"
End With
Application.Calculation = xlCalculationManual
With DataSheet
.Range("A1").Sort Key1:=.Range("A2"), _
Order1:=xlAscending, Header:=xlYes, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
.Range("A2").Value = "1"
.Range("A2").AutoFill _
Destination:=.Range("A2:A" & NumRows), Type:=xlFillSeries
End With
Else
With DataSheet
.Columns("A:A").Insert Shift:=xlToRight
.Range("A1:A" & NumRows).FormulaR1C1 = "=RAND()"
End With
Application.Calculation = xlCalculationManual
With DataSheet
.Range("A1").Sort Key1:=.Range("A1"), _
Order1:=xlAscending, Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
.Range("A1").Value = "1"
.Range("A1").AutoFill _
Destination:=.Range("A1:A" & NumRows), Type:=xlFillSeries
End With
End If
'Advanced filter extracts records numbered to number requested
Set wsExtract = Sheets.Add
wsExtract.Range("B2").FormulaR1C1 = "=RC[-1]<=" & NumToExtract
' This line of code below extracts sample records from the source sheet to a new sheet
DataSheet.Range("A1").CurrentRegion.AdvancedFilter Action:=xlFilterCopy, _
CriteriaRange:=wsExtract.Range("B1:B2"), _
CopyToRange:=wsExtract.Range("A4"), Unique:=False
With wsExtract
.Rows("1:3").Delete Shift:=xlUp
.Columns("A:B").Delete
.Name = "Extracted_" & NumToExtract & "_RandomRecords"
End With
'Restore records to original order
DataSheet.Range("B1").Sort Key1:=DataSheet.Range("B1"), _
Order1:=xlAscending, Header:=xlNo, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
DataSheet.Columns("A:B").Delete
Application.Calculation = xlCalculationAutomatic
End Sub


I came across an article in one of the many other online forums on Excel for generating non-repeating random numbers.

It had a discussion about 2 basic methods for creating non-recurring random numbers.

Generate a random number. Check it against a list of numbers already generated. If it exists, get another one. If it does not, add it to the list.
Generate a list of all possible numbers, in sequence. Use the random number generator to mix them up.Both methods have their advantages and disadvantages and the method used depends upon the situation on hand. For example, If you are shuffling a deck of cards, you would use the second method. Otherwise, as the number of available cards diminishes, you spend most of your time generating invalid entries. Conversely, if you were generating positions of objects in a large universe, you would use the first method since you aren't going to generate all possible values, just a few.

It also discussed on a piece of code to demonstrate the above two methods:

Private Function CreateRandom(ByVal num As Long, ByVal min As Long, ByVal max As Long) As Variant
'Returns an array of numbers between where min <= x < max
'Returns NULL if there is a problem
Dim aValues() As Long
Dim i As Long, x As Long, span As Long, j As Long, bFound As Boolean
span = max - min + 1
If (span > num) And (num > 0) Then
ReDim aValues(num - 1)
For i = 0 To num - 1
Do
x = Int(Rnd() * span) + min
bFound = False
For j = 0 To i - 1
If x = aValues(j) Then bFound = True: Exit For
Next j
Loop While bFound
aValues(i) = x
Next i
CreateRandom = aValues
Else
CreateRandom = Null
End If
End Function

Private Function ShuffleArray(aData As Variant) As Boolean
'shuffles array adata. Returns true unless there is a problem
Dim max As Long
Dim i As Long, j As Long, v As Variant

If IsArray(aData) Then
max = UBound(aData)
For i = max To 1 Step -1
j = Int(Rnd() * (i + 1)) 'pick one to switch
If i <> j Then
v = aData(i)
aData(i) = aData(j)
aData(j) = v
End If
Next i
ShuffleArray = True
End If
End Function


I need to tweak my code (merge the above codes together) and make it work when there are filters applied to the workbook (filter maybe applied to one or more columns) and i want to set these filter automatically.

This is most trickiest part in my code, for which i'm trying to find a solution which is still elusive. :think: :mkay
Can this be done? Can anybody teach / help me how to go about coding for this?

I'll be extremely grateful for any help that i can get.:bow: :bow:

Ranga

mdmackillop
11-10-2009, 06:41 AM
If you can express what you are after in a series of simple, logical steps, we can see exactly what you are after, and probably implement it.

Can you assist with this request?
I don't see the problem with a simple =Rand() approach

rangudu_2008
11-10-2009, 09:42 AM
Hi MD,

Please read through..

There can be "x" no. of users for whom sample rows of processed data (generated for about 55 different rule codes) have to be chosen randomly (from 2 different types of worksheets containing data). The sampling is done every fortnight and is based on certain percentages and conditions. They are:
All users with less than 9 months vintage and;
All new rules for first three months.


Note: Where new rules are activated and where users with vintage of < 9 months are present during a particular period, for each category separate samples needs to be taken.



There are 4 columns in the workbook which are filtered manually NAME, RULE CODE, RSKWT and KYC. The sampling process is carried out based on certain percentages as follows:
For any particular USER and any specific RULE_CODE (as applicable from the above conditions),
If KYC = 1 and 2 && if RskWt is > 0 but <= 6, then 6% of filtered no. of rows be chosen randomly
If KYC = 3 && if RskWt > 0 but <= 6, then 8% of filtered no. of rows be chosen randomly
Else If KYC = 1 or 2 or 3 && if RskWt is >= 7, then 10% of filtered no. of rows need to be chosen randomly
In certain circumstances, the above conditions may need to be over-ridden as described below:
If few (or all) of the users who are sampled are beyond 9 month vintage, then any no. of records can be chosen as samples from his / her processed data as required.
While choosing samples randomly, care should be taken care not to choose samples for the same entity processed by different users. In other words, samples for the same entity processed by different users should not overlap in the sense that too many samples for the same entity for any user need not be chosen.
If a new user processes records using another users’ login in the system, then sampling has to be done for the new user as per the process defined initially – records processed by the new user (which are tracked separately) must be vlookup-ed and the NAME column should be changed appropriately before samples are chosen.
I need to take care of the sampling (to be done in a random way) as per the above steps. I've defined the background more clearly and the progress i've made so far to implement the above in one of my earlier posts. I've attached a word doc containing all the information which also contains the modules.

Just let me know if any more information is required.

Ranga

mdmackillop
11-10-2009, 10:13 AM
The interpretation and implementation of that goes beyond the free help I'm prepared to offer. If there are specific issues we can look at these.

rangudu_2008
11-10-2009, 08:06 PM
Hi MD,

I've developed the required automation in parts as different modules and need some help in fixing one of the modules which need to integrated to the main module later.

I need to pick-off data from specific rows and and consolidate it in a separate sheet in the same workbook and build up sort of a "results table" which contains rows extracted from the main sheet.

I need to code for a new function or module (as explained in the post made earlier with the codings y'day) which can pick and return x no. of random rows while the sheet is in filter mode (i.e. When filter(s) exist in the workbook which can be one or more)

I'm pretty new to coding with autofilter in VBA, and i need to implement this randomized methodology (i've coded a basic module, which needs tweaking), the combination which makes the task more difficult for me. I need to fix up my random sampling module and can then take care of the filtering part where rows can be chosen. I need some help in putting together this function / module.

Can you help me out with the same?

Ranga

rangudu_2008
11-11-2009, 10:01 PM
Actually, there are 2 kinds of data which i call "Alert Extract" and the other "Case Extract". The Alert Extract has about 48 columns and the Case Extract has about 87 columns. From both these extract, for sampling, only 31 columns are required.

Ranga

rangudu_2008
11-14-2009, 12:40 AM
Hi Malcolm,

Let me make it more clear once again on my requirement.

I'm coding this macro as 3 parts - the randomizing part, the autofilter part and sampling part. The randomizing part should shuffles up the rows, the autofilter part should extract rows (that are required based on the conditions) and the sampling part will collate the filtered rows chosen in a random way.

I've been really racking my brains to code the autofilter part, but i need to cross the randomizing hurdle first, which seems to be the most difficult part to get right. I've developed a basic module for randomizing (the code which i posted below)

I'm trying to accomplish my requirement by making the random sampling module and the autofiltering both working hand-in-hand in some way. The solution is still elusive. I suppose my logic and approach are right, if you have any suggestions, please let me know.

I'm need to fix up my randomizing logic which doesn't work when there is a filter in place. I'm a novice on using VBA for autofiltering. So, any help would be very much grateful.. :friends: :bow:

I came across some of the stuff on autofiltering in XtremeVBTalk and in VBAX, but couldn't find anything that would suit my requirement.

Ranga

mdmackillop
11-14-2009, 05:22 AM
Your inital request.


If there are 1000 rows of data in a worksheet and after applying autofilter to certain column(s) [one or more] say i get some 75 rows. Then how can x no. of rows be chosen where x can be input by the user or calculated based on a certain percentage? Say if out of these 75 filtered rows, 8 rows have to be chosen randomly - where 8 can be input by the user or also be calculated as 10% of 75.

Why shuffle the rows. It seems to make no sense.

I have assumed you will choose the autofilter criteria. Is that correct, or are you looking to to carry out autofilters on 3 random criteria? If so I can add that in.

By applying a helper column of random numbers, you can return a random sampling by taking that fraction where the random number is less than your sample percentage. I cannot see that generating random numbers and selecting from these is any more random.

rangudu_2008
11-14-2009, 06:04 AM
Hi Malcolm,

As mentioned before, the autofilter criteria is determined by 4 columns:
NAME (AQ)
RULE_CODE (R)
KYC (AO) and
RSKWT (S)

I need to pick sample rows randomly for each auto-filtered NAME (the exact number of samples is determined by the percentages mentioned below) and for specific rule codes (there are 57 different codes and samples need to be chosen only for certain codes, not for all) based on the following criteria:

If KYC = 1 and 2 && if RskWt is > 0 but <= 6, then 6% of filtered no. of rows be chosen randomly
If KYC = 3 && if RskWt > 0 but <= 6, then 8% of filtered no. of rows be chosen randomly
Else If KYC = 1 or 2 or 3 && if RskWt is >= 7, then 10% of filtered no. of rows need to be chosen randomly

The rule codes for which sample rows need to be chosen will be mentioned in some way initially before the macro starts execution. I need to design sort of an interface where the options for sampling can be preset in some way (by the end-user).

(Can these criteria can be set automatically thru code? I'm curious to know..)



I don't clearly / fully understand what you mean by:
By applying a helper column of random numbers, you can return a random sampling by taking that fraction where the random number is less than your sample percentage. I cannot see that generating random numbers and selecting from these is any more random.


Ranga

rangudu_2008
11-14-2009, 06:46 AM
In simple words, for any user, If there are 'x' rows filtered out of which i 'y' rows have to be chosen, then i'll choose this 'y' rows one by one by shuffling the 'x' rows filtered.

Aussiebear
11-14-2009, 02:25 PM
Hi Ranga, I'm inclined to agree with MD, that the level of assistance being asked for is beyond the normal scope (level of free assistance) of the forum. Please consider entering into a pay for assistance agreement if you wish this matter to go forward further.

mdmackillop
11-14-2009, 04:36 PM
Change my Post #4 code as follows


Private Sub ListBox4_Click()
Range("A:AX").AutoFilter field:=19, Criteria1:="=" & Format(ListBox4, "0.00")
TextBox1 = GetPC
End Sub

'Add this function
Function GetPC()
Dim RskWt As Long, KYC As Long
'If KYC = 1 and 2 && if RskWt is > 0 but <= 6, then 6% of filtered no. of rows be chosen randomly
'If KYC = 3 && if RskWt > 0 but <= 6, then 8% of filtered no. of rows be chosen randomly
'Else If KYC = 1 or 2 or 3 && if RskWt is >= 7, then 10% of filtered no. of rows need to be chosen randomly
KYC = ListBox3
RskWt = ListBox4

If RskWt <= 6 Then
If KYC < 3 Then
GetPC = 6
Else
GetPC = 8
End If
Else
GetPC = 10
End If
End Function