PDA

View Full Version : How to export random samples from multiple sheets to a new workbook



jphleung
09-17-2008, 11:31 AM
Hello fellow users,

I am now working on a research thesis, which need to draw sample from data of various companies. It will take quite a long time to do it by hand, so I wonder if there is a faster way to do it, and hence I am trying VBA.

problem:
there are n sheets, where each of them is the data of a company. In those sheets, the entries in column Q (assume there is no header row) are the ones i want to draw sample from. The number of entries in column Q varies from sheet to sheet, they can be as low as 6 or as high as 24000. The size of the sample I want to draw is also different from sheet to sheet. I would like to have the exported entries to be outputed in another workbook (if that is too much work, it's ok to export them to a new sheet of the same workbook), in one single column.

e.g.
1 workbook: WB1.xls
4 sheets inside: GE, Agfa, Du Pont, P&G
where, GE has 50 entries in column Q, Agfa has 40, Du Pont has 55, and P&G has 5
10 entires to be randomly selected from the 50 entries in GE's column Q, 8 to be selected from Agfa, 20 to be selected from Du Pont, and 5 to be selected from P&G.
A total of 43 entries (10+8+20+5) are selected and pasted in column A of a new workbook (or sheet).


I notice that there is some other topics with similar problems, but the one i have requires retrieving samples from multiple sheets.

Jeff

agarwaldvk
09-17-2008, 06:35 PM
Jeff

This is something that you might try - this is on the assumption that 'n' entries in column Q means that there are 'n' records in that worksheet.

Based on this assumption, you could proceed like so :-

You have already decided on the maximum number of sample records that you want from each of those worksheets. Store these numbers in an array.

For each of these worksheets, generate a random number using the Excel RAND() function and round it to the nearest integer less than or equal to the maximum number of records that you have in that particular worksheet. This is the row number for one of the records that you can include in your sample. Copy and paste the data from this row in that worksheet in to another worksheet that will contain the final sample data. Repeat this process as many times as the maximum number of sample records that you want from this worksheet.

Now repeat the above process for each worksheet that has the sample data.

One of the serious limitations of this methodology is that you can generate the same number more than once. To avoid picking up the same record more than once, you will need to first store the rounded random number that you would have generated in another array, sort the array and remove the duplicate entry from that array. I know this could be tedious especially when you might have a large sample size and hence it is recommended that you select a slightly higher number for the 'maximum number of records from each worksheet than that you need for the sample' than that you actually need.

It might be worthwhile to do this despite the fact that this is reasonably code intensive especially if there are a large number of worksheets (company data) that you are trying to work with.


Hopefully, that should get you going.


Best regards


Deepak Agarwal