PDA

View Full Version : Choosing Random Samples



rangudu_2008
09-14-2008, 02:52 AM
Dear vbax Gurus,

I had coded a macro last month which randomly chooses certain no. of records from an excel sheet (named Records) and exports the same to another new sheet (created at runtime) in the same workbook. This code needs the input of no. of records to be chosen randomly.

I'm trying to modify the code in such a way that the no. of records to be chosen (from the sheet named Records) is automatically taken from a table placed in another sheet (named Summary) in the same workbook. This table contains formulas and is automatically updated based on the information available in Records worksheet.

To choose samples, records wud be pasted to the sheet and the table in the Summary sheet wud be updated appropriately before the Extract Samples button is clicked.

The no. of samples to be chosen for each user (whose names are in the table) is based on 3 columns in the Records worksheet.
They are namely,
a. Record Status (col E)
b. Rule (col F) and
c. Code (col G) - takes one of the three values 1, 2 or 3

The randomly chosen samples (for each user) should be a right mix based on Record Status (col E) and Rules (col F) for a particular Code (col G) as specified in the table in the Summary sheet.

There are actually 19 different rules being used. Samples have to chosen covering all the rules. Certain rules are more important for which more no. of samples have to be chosen than other rules.

To choose samples covering all the Rules, can a provision be made in another table in Summary Sheet so that the no. of samples to be chosen for that particular Rule [as it is also based on the value of Code (col G)] be specified alongside the Rule? How can this logic be amended in my code? Does anybody have an easier and better logic to implement this?

The list of Rules to be covered are as follows:
HRIWL
WSHCM
LRACC
LFDWC
LRGCT
NWCUS
PRGRP
HKTAL
HRCJF
HRMCJ
DMOSB
DSOMB
DOCOB
DBCBB
BROMS
RNDTR
XPVRN
ORSTR
CBSTR

Can anyone help me with the modifications that i'm trying to implement in my code?

I've been thinking of ways to implementing this.. :think:
So far i've not found a way out :help

Ranga

mikerickson
09-14-2008, 06:13 AM
Let me restate the problem to see if I understand what you want.
From the Summary sheet, I see that,
for Arnold you want:
0 rows from the Records sheet that have Code=1,
2 rows from Records where Code=2 and
2 rows where Code=3

for Harry:
2 rows where Code=1
19 where Code=2
16 Code=3.

You also want the Rules to be considered when choosing the sample rows (for one Name).
Do you want
1) each Rule to be represented at least once in the total sample
2) each Rule to be represented in the samples for each Code.
(meeting either of these will require increasing the number of samples)

You also mentioned that you want the Rules in the sample to have meet some proportionality.

Do you want the proportionality of the Rules in the sample (viz. total samples or per Code?) to match:

A) the proportion of that Rule found in the data for the Name, or

B) some predetermined ratios. If B), I don't see where those ratios are. The file Rules.xls just lists the rules without giving any target proportions.

rangudu_2008
09-14-2008, 11:03 AM
I've updated the Summary sheet which includes another table that i'm using already.

The proportionality of the Rules based on which samples have to be chosen is based on the RWT value specified for each Record ID.

Further information from my side that may be useful is attached.
The important rules for which samples have to be chosen are highlighted in bold in the document.

Ranga

mikerickson
09-14-2008, 12:18 PM
According to the chart shown, you have 17 rules that should each return 10% of the samples, plus 2 rules that each account for 8% of the samples.

That's 186%. I don't know how to return a sample where 186% of the sample matches different, exclusive, criteria.

It may be that the Rule structure that you are proposing is not the best way to achieve your ultimate goal. If it is, the Rules need to be validated so that the result they call for is possible.

rangudu_2008
09-14-2008, 06:58 PM
The rule structure that i proposed is prefixed and is useful for parts of various processes in a big system.

I do not fully understand what it is meant by choosing samples where 186% of the sample matches different, exclusive, criteria.

I look at it this way:
The rule conditions are that

If RWT > 0 but RWT <= 6 for a particular rule (or a combination of rules) then
Choose 8% of the total records as samples where code value (=3) and 6% of the total records as samples where code value (=1 or 2)

If RWT >= 7 for a particular rule (or a combination of rules) then
Choose 10% of the total records as samples where code value (=1 or 2 or 3)

The grouping of rules + code value form a complexity here. In other words, sampling rate differs when code value changes.
Based on this, sample records are chosen for 17 different rules which may form combinations resulting in increase of RWT value.

Out of the 17 rules, 1 rule has a RWT value of 5, 1 rule with RWT=6, 5 rules with RWT value of 7, 2 rules with RWT=10 and the remaining 8 rules with RWT=8.

I'm trying to formulate a logic for choosing samples, but i'm not fully finished yet.

Ranga

rangudu_2008
09-15-2008, 11:53 AM
I'm trying to formulate a logic to devise an efficient, effective, dynamic and robust random sampling algorithm.

The 17 rules are classified under 2 categories - Daily & Monthly.
Monthly rules (with higher RWT) are give high priority than daily rules while choosing samples.

Declarations: (Variables, 2D/3D arrays to store)
code values, names, total samples to be chosen, sample size of each user, etc.

The basic idea is to formulate a workflow for choosing samples randomly.
It can be implemented with 2 or 3 for/while/do...while loops and Select.. Case statements appropriately.

The outermost loop is used to cycle through the names of all the users one by one while choosing samples.

Algorithm:
1. Define conditions appropriately to ascertain the sample rate (6%, 8% or 10%) using RWT value of the rules.

Conditions: (Can be determined with If...Else)
a. If RWT > 0 but RWT <= 6 for a particular rule name (or a combination of rules) then
Choose 8% of the total records as samples where code value (=3) and 6% of the total records as samples where code value (=1 or 2)
b. If RWT >= 7 for a particular rule (or a combination of rules) then
Choose 10% of the total records as samples where code value (=1 or 2 or 3)

2. Once sample rate is obtained, take the code value for which the highest no. of samples have to be chosen (for a user). This no. will never be greater than the total no. of samples for that user for any code value.

3. Now, determine how many samples need to be taken for each code value and store it in an array or different variables.

4. Now using the rand function, randomly choose a RWT value randomly from the total no. of samples and then choose a record id. This may pertain to any one of the rules. This record id is stored in an array.

5. After obtaining the record id, retrieve that entire row and export it to the newly created sheet. This process is repeated till all the samples for the chosen code value are exported. Now store the count of remaining record ids that are not chosen (nothing but the no. of samples that remain from the total collection).

6. Take the next highest code value and exhaust it in the same way as described in steps 4 & 5. The last remaining code value will have the minimum no. of samples to be chosen for any user.

Repeat steps 1 to 6 until samples are obtained for all users.

Is there any better algorithm or ways to choose samples randomly?

Suggestions are welcome.

P.S.: One of the drawbacks that i find in my algorithm is that higher the no. of samples, the loop iteration increases.

Ranga

mikerickson
09-15-2008, 02:50 PM
The logic that I'm delveloping is
1)Create 1-D array of records (I forget the name of your unique key field at the moment) from which the random selection is to be made. (Filter the data base)
2)Randomly re-order the array (For i = 1 to NumSelection: Swap Array(i) with Array(randomIndex) :Next i)
3)Redim Preserve the array (1 to NumSelection).
4)Expand that to build the 2-D array of all fields for those records.

rangudu_2008
09-15-2008, 07:07 PM
The logic looks simple but we should also look into the efficiency and robustness of the algorithm. Kindly send back any suggestions to my proposed algorithm.

Ranga

mikerickson
09-15-2008, 08:16 PM
I thought that you to sample a percentage of each name (9 names) * each kind of Record (Batch,Existing,False,New) * each of the 19 rules * 3 codes.
That's 9*4*19*3 filters, each followed by a random selection.
The "outer loop" runs 2052 times.

Is that correct?

If so, an Advanced Filter in place, followed by a random selection (percentages in a table) of visible rows. dataRange.SpecialCells(xlCellTypeVisible).Rows is a collection which can be easily turned into an array of ranges. Random selections can be made from this array of ranges.

mikerickson
09-16-2008, 06:33 AM
With further testing, AdvancedFilter is way too slow.
The next approach will be to create 2052 arrays for each Name/Record/Rule/Code options. Loop through the range, putting row numbers in the appropriate array and then randomly sample(reduce) each of those arrays to get the overall sample.

rangudu_2008
09-16-2008, 07:57 AM
Instead of creating 2052 arrays for each Name/Record/Rule/Code options, can't we create two 2D arrays & use it as the two dimensions of another 2D array?
I know this way is far too advanced (curse my stupid brain!!!:motz2:)

Do u have any other ideas? Can u suggest me any other options?

Can u also send any kind of pictorial representation which would help me understand what u are trying to explain? It wud be of great help to me.
My head goes bonkers :banghead: when i visualise what u said.

Ranga

mikerickson
09-16-2008, 08:49 PM
I'm looking at
Dim ArrayofArrays(1 to 2052)
All row numbers with "Arnold, Batch Record, BROMS, 1" will be in ArrayOfArrays(1)
All row numbers with "Betty, Batch Record, BROMS, 1" will be in ArrayOfArrays(2)
...
All row numbers with "Arnold, Existing Record, BROMS, 1" will be in ArrayOfArrays(10)
...
All row numbers with "Smith, New Record, XPVRN, 3" will be in ArrayOfArrays(2052)

But, in general, Ubound(ArrayOfArrays(i)) <> Ubound(ArrayOfArrays(j).
ArrayofArrays(i) will be dimensioned as needed.
ReDim ArrayOfArrays(1)(1 to EVALUATE("SUMPRODUCT((A2:A1868="Arnold")(E2:E1868="Batch Record")(F2:F1868="BROMS")(G2:G1868="1"))")

This will produce a 2D "jagged array". (By strict definition, VBA doesn't support jagged arrays, but this ArrayOfArrays is one way for VBA to emulate them.)

No matter what method is used, each batch of rows from which a selection is to be made needs to be put together. There are 2052 batches and 2052 minimaly sized arrays seems the most efficiant (in terms of memory) way to gather them.

The Rule/Code > sample percentage is put in a 19X3 array of cells, holding VLOOKUP calculations from the RULE/RTW chart you presented in the Word document.

"Given an array, randomly select N elements (without duplication)" is a fairly simple problem.

The part of your project that I'm finding challangeing is to efficiantly create the arrays from which to select elements.

rangudu_2008
09-17-2008, 10:30 AM
The data i attached is only a sample and it is being generated from a monitoring system.

If the output from this system is high (say if there is a volume surge during a particular month) & with the sampling being done every fortnight, the total no. of records & no. of users differing how can the array size be determined?

I suppose the size of the array can be determined only at runtime.

Can't we split the storage across different arrays (atleast temporarily) & use objects of those arrays as part of another array created at runtime?

I'm currently thinking a way to find answer to this question, which can be easily solved in Java or C/C++, but i too ponder how it can done in VB :think: and sometimes i feel :bug: and :banghead: as i'm doing this work alone.

P.S.: At the max. the no. of users can be 20.

Ranga

rangudu_2008
09-17-2008, 05:26 PM
Can't we define any kind of data structures to define the variables that will be used for random sampling inside class modules? The objects of these data structures can be created & used dynamically at runtime. If this way is possible, then i think the array size can be easily manipulated.

I know that i'm proposing some sort of object-oriented programming like Java/C++ here. Any ideas/suggestions from ur side about implementing this methodology?

P.S.: What's your real name, Mr. Mikerickson?

mikerickson
09-17-2008, 06:22 PM
How many records per run are you anticipating at both normal and peak usage?

It sounds like the Rule/RTW chart may change, if so building a data import system about one anticipated sampling rate is probably a mistake.

rangudu_2008
09-17-2008, 07:02 PM
I'm infact using a data import/export system (via an Access database) with some other VBA macro code in it and i'm looking to deploy this random sampling part as a module into it.

Of late, i'm getting some merging problems in my data import/export system which i'm trying to fix right now. I only find time to work on it at weekends as i'm a bit busy & i'm running tight on my time schedules.
I'm very conscious as to fix it up before this month-end as it has been quite a stretch for me.

I've just posted a separate thread for fixing up my data import/export system and make it more efficient and robust. Refer my thread titled "Merging Data from 2 Sheets".

Normal usage size = 6000 records
Peak usage size should be about 10000 records or less than twice the normal usage.

Ranga

mikerickson
09-18-2008, 07:05 AM
Here's what I've come up with.
The routine FilterAndSample is the main routine.
The input and output ranges are defined at the top and will need to be integrated with your situation.
The workbook has a dynamic range "ChartOfSamplePct" that reflects the word document you posted.

I hope this helps.

P.S. My real name is Mike Erickson.

rangudu_2008
09-20-2008, 10:19 AM
Mr. Eric,
I've attached some more information regarding the rules. Rules are classified as Monthly or Daily. Monthly rules have higher priority than Daily rules.

I'm not able to clearly understand your code where u implement the sample distribution pattern for choosing samples dynamically.

Since there is a possibility of the system generating a single record for multiple rules (more than 2 rules for a record) and that the users have the limited option of entering only one rule in the Rule column in their Record Process Tracker file (which is called the Productivity file of the user) the rule with the max. RWT value will be stored by the user. (i.e) the Rule would be of Complex type.

Let us consider that Arnold has processed 152 records for the fortnight, with the split of 5 alerts for code1, 88 for code2, 59 for code3. Then as per the table i mentioned, 10% samples have to be chosen for each code which means the sample split for code 1,2 & 3 are 0,9,6 respectively.

Giving preference to cover all the rules, if a particular user has no samples to be selected for a particular code value (as in the above example where no samples be chosen for code value 1) but he has processed < 10 records of any rule for code value 1, then atleast one record has to be picked for code1.

If the user has processed more records which were generated for a particular rule for any code) then the samples need to be distributed among all the 3 codes for that particular rule with priority as:
Code 3 (5%) > Code 2 (3%) > Code 1 (2%)

In the code that u posted, there were only fixed no. of records chosen for a user everytime. This needs to made dynamic, because if the no. of samples to be chosen for a particular user is varied in the summary sheet before every run of the function, then the no. of samples chosen should also differ.

Ranga

mikerickson
09-20-2008, 02:36 PM
The percentage chosen is fixed by the Rule/Code combinations in the data.
If the Rule/Code determines that a specific percentage of the records matching those conditions.
Which records are chosen is random, but the number of records selected is determined by the preset Rule/Code percentages multiplied by the number of records in the data base matching those conditions.

rangudu_2008
09-21-2008, 05:29 AM
The whole process i'm trying to automate is done as a quality check process of the records processed in the monitoring system. The common monitoring system is being used by various processing units (each unit have a different head count) and rules are activated periodically to generate records in the system.

Just a review of my requirement - choosing samples randomly from the processed records is dependent on the following factors:

User Vintage - Priority is high for new users i.e., More no. of samples will be chosen for those users with less than 6 months vintage (that's why sampling percentages 6%, 8%, 10% have been fixed)
New Rules - More records are also chosen for these new rules (which were activated in the system to generate records) for the first 3 months.
RWT value of the Rules
Code value of the RecordsThus, the no. of samples chosen for any new user is higher than those samples which are chosen for an user who is more used to the system and is determined not only by the RWT value of the rule, but also by the vintage of the rule (for the first 3 months) which is either of Simple or of Complex type.

To summarize, sampling (to be done randomly) in the above scenario is thus prioritized for: (a) New users and also influenced by (b) new rules whose vintage is taken into consideration for first 3 months since it started generating records.

Thus, the sampling percentage for the rules will be determined mainly by factors #1 & #2 and then by #3 & #4 (above).

What i had in mind (before formulating the algorithm) to pick up all information from the Summary Table - Values of:

Sample limit for each code value;
Total samples to be chosen for each user; and
Total sample size for all the users.When i tried to incorporate the code i already had with the sampling chart u designed, i get a runtime error which states:
"Method 'Range' of object '_Global' failed"

This error mainly pertains to the line in the FilterAndSample() module:

AllDSRules = Application.Transpose(Range("SampleChart").Columns(1).Value)

What can be done to rectify the problem?

Ranga

rangudu_2008
09-21-2008, 05:52 AM
I'm getting the Range error in the code is when i tried to incorporate your idea of creating the dynamic range called ChartOfSamplePct

I also do get type mismatch error in the indexFromStatus function, pertaining to the line below:

total = (Application.Match(aName, AllNames, 0) - 1)

Ranga

mikerickson
09-21-2008, 03:32 PM
What kind of error are you getting. The workbook I attached had that dynamic range. It should expand as more data is typed in.

I don't understand the mismatch error, unless there are some blank cells in the data.
This might be a more resiliant version of that function
Private Function indexFromStatus(aName As Variant, aRecordStatus As Variant, aRule As Variant, aCode As Variant) As Long
Dim total As Long
On Error GoTo HaltFunction
total = (Application.Match(aName, AllNames, 0) - 1)
total = total + (UBound(AllNames) - (1 - LBound(AllNames))) * (Application.Match(aRecordStatus, AllRecordStatus, 0) - 1)
total = total + (UBound(AllNames) - (1 - LBound(AllNames))) * (UBound(AllRecordStatus) - (1 - LBound(AllRecordStatus))) * (Application.Match(aRule, AllRules, 0) - 1)
total = total + (UBound(AllNames) - (1 - LBound(AllNames))) * (UBound(AllRecordStatus) - (1 - LBound(AllRecordStatus))) * (UBound(AllRules) - (1 - LBound(AllRules))) * (Application.Match(aCode, AllCodes, 0) - 1)
total = total + 1
If total > UBound(ArrayOfArrays) Then ReDim Preserve ArrayOfArrays(0 To total)
If IsEmpty(ArrayOfArrays(total)) Then Call initializeArray(ArrayOfArrays(total))
indexFromStatus = total
On Error GoTo 0
Exit Function
HaltFunction:
indexFromStatus = 0
initializeArray (ArrayOfArrays(0))
On Error GoTo 0
End Function

rangudu_2008
09-24-2008, 05:38 AM
Eric,
I'm in need of a complete explanation about how ur coding works (in algorithm form compared with the one i formulated) for more understanding inorder to integrate it with my code.

Currently, i'm fixing up some part of my existing code where i've been getting problems to populate the source data from which i've to choose samples.

There are 9 sheets in this workbook + the Summary sheet. This Summary sheet gets filled automatically (through excel formulas) when data gets populated from the 5th sheet (QA ALL RECORDS) till the 9th sheet (QA CASES LOG). The first 3 sheets need to be filled manually by the user and the processing starts from 4th sheet onwards.

The RECORD PRODUCTION sheet contains records with status of all types. I'll filter out those records on whom cases have been created & those records which are linked to existing cases. This is established by executing queries stored in the Access database at the click of the corresponding buttons.

The sampling part in this workbook is currently being done manually for records & cases separately in RECORD SAMPLER & CASE SAMPLER sheets. I'm trying to automate the same and want to implement the random sampling module with the chart code that u formulated.

I want to remove both the sampler sheets and choose samples directly from the QA ALL RECORDS SHEET & CASES LOG SHEET. I want to do it this way bcoz initially i had coded for executing queries to import/export samples in which i came across various complexities while coding.

Various complexities which affect this process are as below:
The amount of data taken into the database via the coding i have done - first the entire dump from which about 10-12% data is filtered out for obtaining the data to be taken up for sampling.
Increasing size of the workbook from the start till the process is complete (the empty template is about 4.1 MB before i start the process)
Duplication of data being done till samples are chosen.I can address these factors if samples are chosen directly from the 3rd & 5th sheet for which i'm trying implementing ur code.

I want the limits for the data that is taken up for choosing samples from the Summary sheet and samples have to be exported to the corresponding log sheets for records and cases respectively.

I tried replacing the sampler sheets with ur chart, but i'm not able to configure it to work with my code accordingly.

I need ur answers for some questions in ur code.

Q1. How & in which line do u create that dynamic range?
I tried to integrate ur Sample Chart code into my existing code, i tried redefining the line in FilterAndSample module below:

AllRules = Application.Transpose(Range("ChartOfSamplePct").Columns(1).Value)

Q2. You have also defined some main variables with appropriate comment which i've given below:

NamesCol = 1: RecordsCol = 5: RulesCol = 6: CodesCol = 7

I tried redefining the limits for masterDataRange to suit my requirements by changing values for these variables.

In both the above cases, i come across some runtime errors.

Q3. What's the role of the functions "SamplePctFromRuleCode", "rrayStr"?

Q4. I do not understand the functioning of the code module below fully:

outPutGoesHere.Parent.Cells.ClearContents
If masterDataRange.Row > 1 Then
Rem duplicate header row if one exists
If LCase(CStr(masterDataRange.Offset(-1, 0).Columns(NamesCol).Cells(1, 1).Value)) Like "*name*" Then
outPutGoesHere.Resize(1, masterDataRange.Columns.Count).Value = masterDataRange.Offset(-1, 0).Rows(1).Value
Set outPutGoesHere = outPutGoesHere.Offset(1, 0)
End If
End If
outPutGoesHere.Resize(UBound(outPutArray, 1), UBound(outPutArray, 2)).Value = outPutArray
With outPutGoesHere
.Parent.Activate
Application.GoTo reference:=.Address(, , xlR1C1)
End With

The only thing i understand just after having a look at above part of the code is that the randomly sampled data is sent to the ResultSheet as output.

Q5. What are the uses of the table (screenshot attached) in the RuleList sheet? How should this table be filled appropriately so as to match the Sample Size fields for each code for the user names included in the Summary sheet?

The 2nd part of the last question is more important for me as the samples should be chosen based on the following conditions:

For a new user, irrespective of the rule type, samples should be taken for 6 months - This is based on the join date of the user (vintage).
For all new rules (either of Simple or of Complex type) that are activated in the system to generate records, samples should be taken for first 3 months irrespective of the vintage of the Analyst.I've attached the QA template where i'm trying to include the chart. I also require some additions to be made in the sampling chart (those marked in Red) to validate & choose samples based on the above conditions. Let me know if more information is required from my end.

Ranga

mikerickson
09-24-2008, 05:57 AM
1) the dynamic name is not defined in VB there is a Name in the file
Name:ChartOfSamplePct
RefersTo:=OFFSET(RuleList!$A$4,0,0,COUNTA(RuleList!$A:$A)-COUNTA(RuleList!$A$1:$A$3),4)

2)Those are the column numbers of the 4 key bits from the data base, Name, Record Type, Rules and Code. (Relative to masterDataRange rather than the sheet.)

3)Given a Rule and a Code, SamplePctFromRuleCode returns the percentage from the range ChartOfSamplePct. It answers the question "What percentage of the data having this Rule and this Code is to be returned in the sample?")

The function rrayStr is not part of the project. Mac doesn't support Join and that is my emulator. I used it to look inside arrays for debugging, the code shouldn't call that function and it can be removed.

4)The example you posted asked the user if there was a header row.
The If...End If bit of that code tests for the existance of a header row in the input data sheet and puts one in the results sheet if one exists. The test is "if the cell in the nameColumn of the row above masterDataRange contains the string "name", then there is a header row".
The next line puts the samples into the output sheet.
The With...End With takes the user to A1 output sheet.

I'll put together a flow-chart, but I've got a stage gig this weekend, so I won't have time until after.

rangudu_2008
09-28-2008, 01:32 AM
I'm in need of a flowchart that will help me understand your code and help me to incorporate it in my code easily according to my requirements based on the following conditions:

For a new user, irrespective of the rule type, samples should be taken for 6 months - This is based on the join date of the user (vintage).
For all new rules (either of Simple or of Complex type) that are activated in the system to generate records, samples should be taken for first 3 months irrespective of the vintage of the Analyst.To choose samples based on the above conditions, i think it may be required to make some appropriate modifications in the sampling chart & in the Summary sheet.

The list of user names should also include their vintage dates.
An addition column (which specifies the activation date of the rule) along with the RWT value of the Rule in the RuleList sheet.These dates have to be manipulated some way in your code so as to choose samples based on the above conditions. How can it be done?

Looking fwd to ur reply,
Ranga

mikerickson
10-02-2008, 06:17 AM
This is an explaination of how my code does its thing.

The sub FilterAndSample starts out by defining the ranges that are used in the routine.

It then fills the four arrays AllNames, AllRecordStatus, AllRules and AllCodes. AllNames and AllRecordStatus uses the function ArrayOfUniquesInRange to read from the dataRange and return an array of unique alphabetized elenments. AllCodes is explicitly defined while AllRules is read from the "ChartOfSamplePct".

indexFromStatus uses these arrays to give each combination of name/status/rule/code its own indexing number.

For Examle, if
AllNames = {John, Mary, Sam, Tom}
AllRecordStatus = {New, Old}
AllRules = {abc, pdq, xyz}
AllCodes = {1,2,3}

indexFromStatus(John, New, abc, 1) = 1
indexFromStatus(Mary, New, abc, 1) = 2
indexFromStatus(Sam, New, abc, 1) =3
indexFromStatus(Tom, New, abc, 1) = 4
indexFromStatus(John, Old, abc, 1) = 5
indexFromStatus(Mary, Old, abc, 1) = 6
...
indexFromStatus(John, New, pdq, 1) = 9
...
indexFromStatus(Tom, Old, xyz, 1) = 24
indexFromStatus(John, New, abc, 2) = 25
...
indexFromStatus(Tom, Old, xyz, 2) = 48
indexFromStatus(John, New, abc, 3) = 49
...
indexFromStatus(Tom, Old, xyz, 3) = 72

Looking at the name, record status, rule and code of a row in the masterDataRange, indexFromStatus tells us in which of the ArrayofArrays to that row should be put.

Having gone assigned each of the rows to one of the ArrayofArrays, the routine goes through ArrayOfArrays. All of the rows in an array have the same rule and code. The function SamplePctFromRuleCode (essensialy a VLOOKUP from ChartOfSamplePct) multiplied by the number of elements (rows) in that array tells us the number of rows to sample from that array.

The sub RandomlySampleArray, randomly re-orders an array, reduces it to the indicated size via Redim Preserve, Random re-ordering, followed by a reduction in size, is the same as randomly choosing a sample from an array.

The reduced ArrayOfArrays is gone through again, creating the 2D array outPutArray, which is written to the cells in the range outPutGoesHere.

rangudu_2008
10-05-2008, 12:07 AM
Eric,
Inorder to implement the 2 conditions i had mentioned previously into the code, the computation to be done are based on dates.

For the 1st condition, samples should be taken for all new users for the first 6 months (>=180 days)
For the 2nd condition, samples should be taken for newly activated rules for the first 3 months (>=90 days)I know that these date calculations can be implemented in Excel using the TODAY() function.

After checking the 1st condition, the names of the new users for whom the samples have to be chosen will be known.
After checking the 2nd condition, the newly activated rules will be known and the samples have to be chosen for all the users irrespective of their vintage.

The sampling rates of 6%, 8% & 10% have to be taken only for NEW users based on the code value according to the no. of samples specified in the Summary table. Samples may or may not be chosen for other users. Its based on the decision taken by the user at that time. How can these date calculations be implemented in Excel at runtime in your code?

Ranga

rangudu_2008
10-05-2008, 11:26 AM
Eric,
I've attached the Random Chart you developed after making few changes to it in par with the conditions which i'm trying to incorporate.

Currently, i'm trying to find answers to the questions listed below:

1. When a record id is generated in the system for multiple rules, the RWT value of that record id wud be the sum of RWT values of all the rules put together. However, the user can input only one rule in the rule column while processing such type of records. How will such type of records be processed by SamplePctFromRuleCode function which is based on the value of indexFromStatus variable? How is the RWT value for such records taken into calculation for sampling?

2. Regarding the implementation of the 2 conditions, i've the following questions:
i. How will user vintage be considered for computation at runtime for an user whose vintage breached the 6 month period (indicated by values in col N in the chart attached)?
The records processed by the users who have breached the 6 month period during the fortnight have to be taken up for sampling. They can be ignored during the next fortnight.
ii. How will rule activation vintage be considered for computation at runtime for a new rule which breached the 3 month period (indicated by values in col Q)?
Similarly, those records which were generated & processed during the fortnight for newly activated rules which breach the 90 day period must also be taken up for sampling. The records generated for these rules can be ignored during the next fortnight.

How can the conditions be implemented in your code so that the date calculations can be done & samples be chosen? Do u have any solutions?

Ranga

rangudu_2008
10-09-2008, 11:14 AM
Hi Eric,
Had been thinking about this sampling stuff at the back of my head since my last post. This random sampling will be done each Fortnight.

So, for implementing into my code i've decided to implement the logic described below:

For any new rule that is activated, samples should should be taken for the first 3 months - means sampling will be done for that rule for about 6 or 7 fortnights for all users irrespective of their vintage dates.
For any new user, samples should be taken for the first 6 months - means sampling should be done for 12 or 13 fortnights based on vintage date.These activation/vintage dates will either fall in the start/middle/end of a month. So, by determining the fortnight of the activation/vintage date, the no. of times samples have to be taken for a rule or an user will be known.

With this logic, how can the date calculations be applied into your code so that the conditions can be implemented for choosing samples randomly? :help :help :help

Ranga

mikerickson
10-09-2008, 02:26 PM
If John's sampling rate(s) before a particular date is different than John's sampling rate(s) after that date, then the date needs to be incorperated as an argument in both the IndexFromStatus function and the SamplePctFromRuleCode functions.

Also, adding a DatesCol variable would be a part of gracefully implementing that change.

rangudu_2008
10-10-2008, 11:38 AM
Eric,
How can the DatesCol variable be added to your code so as to perform those date calculations & verify those conditions? The end-user should get a msg aftr those conditions are checked & will proceed further with sampling in a different way if they are not met.

When the conditions fail - means when there are no new users & no new rules, the sampling % will be constant (for all users, for all code value & all rules - irrespective of RWT value) which will be input in a cell by the user (or prefixed) & samples should be chosen randomly according to that sampling rate.

I'm already fixing up a bug in my code inorder to populate the source data correctly from which i've to choose samples. :think:
Its been an issue which i'm facing for quite some time now & i wud be fixing it over this weekend (i'm already behind my schedule). I also have to incorporate your sampling code into it as a module & test it fully further.
I find time to work on this only during weekends since the past fortnight & i work alone.

Can u try incorporating these date calculations for implementing the conditions into your code? Sorry to ask u to do this, but i'm need of some help. :help :help

P.S. Kindly replace ur chart with the one which i modified.

Ranga

rangudu_2008
10-13-2008, 07:27 PM
Dear Eric,
Can u plz guide me how to incorporate the date calculations into the sampling chart code?

I'm reallly sorry to bug u about this over & again. Now that i'm done with bug fix over the last weekend, i want to incorporate ur chart code into mine. I've planned to finish this integration by this month-end, address open issues (if any) & then proceed with the testing part of the whole macro.

I'm really delighted with the progress i've made when i look back & i see a BIG ray of hope, now that almost 70% of the work is done.

Hoping to get a reply soon,
Ranga

rangudu_2008
10-17-2008, 07:09 PM
Hey Eric,
I had developed an earlier version of this macro during April this year, which i'm re-working now. I hadn't implemented random sampling then.

I gave a thought about coding for the random sampling process then, but decided against it since the sampling rates were the same for all users irrespective of the code value. The process had changed from 1st of July.

I had a discussion few days back & to my delight i came to knw that the data which i preparing to feed into your random sampling code module (which i debugged last weekend) can be obtained directly from the monitoring system itself. :rotlaugh: :rotlaugh: :yes

Now all that's left is to code for the date calculations & test the random sampling chart, using sample data which i've acquired for testing.

I'm a bit free during this weekend, can u plz guide me to incorporate the date calculations into the sampling chart code?

P.S.: I'll attach the format containing the columns (finalised) that'll be used for sampling in this macro for your reference shortly.

Ranga