Consulting

Page 1 of 4 1 2 3 ... LastLast
Results 1 to 20 of 66

Thread: Help Required : Random Sampling of Data

  1. #1
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location

    Help Required : Random Sampling of Data

    Hi Experts,
    I got the attached random sampling code from one of the excel forum. I request your help in customizing the codes so that it be very useful to add it as an excel add-in and use irrespective of the data range. My knowledge to to the below changes is very limited. Any help on this is greatly appreciated.


    1. The code shall work for the entire selection range (currently only 5 columns).
    2. There shall be an option to select the column basis of which the sampling has to be work (currently based on column A). Preferably a user form which loads the headers from the first row of the selection and then the user choose the column header basis of which sampling has to be done.
    3. Options for the results to be copied in to a new worksheet/workbook/add the word "sample" in the column next to the the selection range.


    Thanks in advance

    Option Explicit
    Sub ClearData()
        [H6].ClearContents
        Sheet3.UsedRange.Offset(1).ClearContents
    End Sub
    Sub RandSample()
        Dim arOrig, x, Key, Key1, iSubset, arRes()
        Dim i      As Long, j As Long
        Dim dicRnd As Object, dicSorted As Object, dicUnq As Object, dicRes As Object
        
        iSubset = InputBox("Enter a sample size ", "Sample Size")
        
        If Not IsNumeric(iSubset) Then
            MsgBox "Oops! Please enter a valid sample size.", vbCritical, "Wrong Entry"
            Exit Sub
        End If
            
        arOrig = Selection.Value
        
        Set dicRnd = CreateObject("Scripting.Dictionary")
        Set dicUnq = CreateObject("Scripting.Dictionary")
        For i = 1 To UBound(arOrig)
            Do
                x = Rnd
            Loop Until Not dicRnd.Exists(x)
            dicUnq(arOrig(i, 1)) = dicUnq.Count
            dicRnd(x) = Join(Application.Index(arOrig, i, 0), "~")
        Next
        Set dicSorted = dictKeySortAscending(dicRnd)
        For Each Key In dicRnd.Keys
            dicSorted(Key) = dicRnd(Key)
        Next
        
        Set dicRes = CreateObject("Scripting.Dictionary")
        
        For Each Key In dicUnq.Keys
            i = 0
            Do While i < iSubset
                For Each Key1 In dicSorted.Keys
                    If CStr(Key) = Split(dicSorted(Key1), "~")(0) Then
                        dicRes(dicUnq(Key) & "," & i) = dicSorted(Key1)
                        dicSorted.Remove Key1
                        Exit For
                    End If
                Next
                i = i + 1
            Loop
        Next
        
        ReDim arRes(dicRes.Count, 5)
        i = 0
        For Each Key In dicRes.Keys
            x = Split(dicRes(Key), "~")
            For j = 0 To 4
                arRes(i, j) = x(j)
            Next
            i = i + 1
        Next
        
        Sheet3.Range("A2").Resize(dicRes.Count, 5) = arRes
        MsgBox "Record Count: " & dicRes.Count & vbNewLine & "Unique Names: " & dicUnq.Count & vbNewLine & "Record/Name: " & iSubset
        
    End Sub
    Public Function dictKeySortAscending(dictList As Object) As Object
        Dim curKey As Variant
        Dim sortArray As Object
        Dim i      As Integer
        Set sortArray = CreateObject("System.Collections.ArrayList")
        If dictList.Count > 1 Then
            With sortArray
                For Each curKey In dictList.Keys
                    .Add curKey
                Next curKey
                .Sort
                
                Set dictKeySortAscending = CreateObject("Scripting.Dictionary")
                
                For i = 0 To .Count - 1
                    dictKeySortAscending.Add .Item(i), 1
                Next
            End With
        Else
            dictKeySortAscending = dictList
        End If
        
        Set sortArray = Nothing
    End Function
    Attached Files Attached Files

  2. #2
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,635
    Do not use redundant code.
    The simpler the code the easier to adapt.
    But: never use code you do not fully understand.

    Create unique random numbers:
    Sub M_snb()
        [A1:A1000].Name = "snb"
        [snb] = "=rand()"
        [snb] = [index(rank(snb,snb),)]
    End Sub

  3. #3
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks snb for your time!
    Actually I don't have any programming background and I'm a finance person trying to learn VBA to help in my analytics.
    array codes are little difficult for me to understand as I'm just a beginner to VBA. Appreciate if you can further support me on the above request

  4. #4
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,635
    You can start testing the code in your own workbook.

  5. #5
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    I have prepared a sample tool basis of my little knowledge, but it requires the first two columns to be kept blank for sampling calculations
    I also need help on stratified random sampling section, if the option selected is Proportional to stratum size.
    Request your help and support
    Thanks in advance!
    ss.jpg
    Attached Files Attached Files

  6. #6
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,711
    Location
    1. ID 1031 has "National Insurance Company, UAE" in the Quarter column

    2. It seems that it's either "Simple" XOR "Stratified". Might be less confusing to use a 2 page MultiPage control, one page for each

    3. Better expand on the details of the Stratified sampling, and how each field is used
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  7. #7
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Quote Originally Posted by Paul_Hossler View Post
    1. ID 1031 has "National Insurance Company, UAE" in the Quarter column

    2. It seems that it's either "Simple" XOR "Stratified". Might be less confusing to use a 2 page MultiPage control, one page for each

    3. Better expand on the details of the Stratified sampling, and how each field is used
    Thanks Paul for your response!
    1. For a testing purpose I mentioned "National Insurance Company, UAE" in Quarter and then forgot to remove
    2. Moved into a
    MultiPage control as you suggested
    3. I have explained below the details of Stratified sampling and request your help on this

    Stratum Variable : This list box will have the names of the variables in the data range. First row of the range contains the variable names/column headers, then these names appear in this list box.

    #Strata :
    This text box will show the number unique values from the above selected Stratum Variable. And next to this there is an option to view those unique values.

    Stratified Random Sampling

    Equal from each stratum -
    On specifying the number of records, it highlights sample with same number of records from each stratum.
    For example: if “Quarter” is the selected Stratum Variable and 1 as #Each Records, it will highlight total 4 samples 1 each from each quarters (Q1 to Q4)

    Proportional to stratum size - On specifying the number of records (total samples), it shall highlight samples proportionate to total count of each stratum.
    For example: if “Quarter” is the selected Stratum Variable and 10 as #Total Records, it will highlight total 10 samples based on each quarter count(Q1 to Q4)

    Quarter Count Sample Size
    Q1 57 2 ROUNDUP(10/301*57,0)
    Q2 73 3
    Q3 68 3
    Q4 103 4
    Total 301 12

    In this case the total sample size cannot be less than the #Strata. In the above example if the total sample size is 1, there shall be total 4 samples – 1 each from each strata based on the roundup formula.

    1. Currently
    Simple Random Sampling and Stratified Random Sampling with Proportional to stratum size method are working based on the RANDBETWEEN formula in column B. Request your help if this can be done without the support of a column AND

    2. Stratified Random Sampling with Proportional to stratum size method is not coded/working

    Thanks in advance.
    Attached Files Attached Files
    Last edited by anish.ms; 07-21-2021 at 01:15 AM.

  8. #8
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Please consider the above formula to round instead of roundup
    attached a workbook with Sampling formula
    Attached Files Attached Files
    Last edited by anish.ms; 07-21-2021 at 07:39 AM.

  9. #9
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hoping your help Paul!

  10. #10
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hi snb,

    I have replaced my codes with the one you suggested for simple random sampling. And now I understand how simple and short is the code you suggested.

    Public LR      As Long                            'Last Row
    Public LC      As Long                            ' Last Column
    Public SS      As Long                            'Sample Size
    
    SS = Me.txtRandomCount.Text    
        With ActiveSheet
            .Cells(1, LC + 1) = "Sample"
            .Range(Cells(2, LC + 1), Cells(LR, LC + 1)).Name = "Sample"
            [Sample] = "=rand()"
            [Sample] = [index(rank(Sample,Sample),)]
            Set myrange = [Sample]
            For Each mycell In myrange
                If mycell.Value <= SS Then
                    mycell.Value = "Yes"
                Else
                    mycell.Value = Empty
                End If
            Next mycell
        End With
    For the stratified sampling - Could you please guide me, how to apply rand() and rank based on each unique values in the filed selected both for equal and proportionate methods?
    Attached Files Attached Files

  11. #11
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Can anybody help me in throwing some light - how to use the above simple random sampling codes for stratified sampling-
    Based on the values in a selected field - both equal and proportionate methods

  12. #12
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,711
    Location
    Look at version 3a and see if it's what you were thinkin

    I only did the Equal case

    I'll have to think on the proportional

    I'll probably (if I have time) look at the random case also
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  13. #13
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,711
    Location
    Look at this version

    Random, and the 2 strata seem to work
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  14. #14
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks a Ton Paul!
    Yes, the equal one is exactly the same what I was looking for.

    I tested the code for a data with 100,000 rows and 25 columns, your equal code got completed in 1 minute and 13 seconds.
    but my simple random sampling code made excel stuck and completed in 13 minutes and 20 seconds. Request if you can look at this whenever you get time.
    There was a mistake in my code in loading the column headers
    colStrataPicked = WorksheetFunction.Match(combx_Fields.Value, .Range(.Cells(1, 1), .Cells(1, colLast)), 0) ' it was rowLast by mistake
    Once again, many thanks for your time and help Paul

  15. #15
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks Paul! Its working perfectly

    May I request my last 2 modifications on this -

    (1) To add the count and % of each unique values in the list box

    Quarter Count %
    Q1 57 19%
    Q2 73 24%
    Q3 68 23%
    Q4 103 34%
    Total 301 100%

    (2) Second one, I know this is an over loaded request
    If the select filed (stratum variable) has number values. For example Revenue, then the number of strata and the list box shall have 10 slabs based on the max amount. And the equal and proportionate samples shall be based on the slab count and weightage respectively.

    range 1 range 2 count %
    0 500 79 26%
    501 1000 103 34%
    1001 1500 54 18%
    1501 2000 22 7%
    2001 2500 16 5%
    2501 3000 8 3%
    3001 3500 10 3%
    3501 4000 9 3%
    4001 4500 2 1%
    4501 5000 0 0%
    Last edited by anish.ms; 07-27-2021 at 03:20 AM.

  16. #16
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,711
    Location
    You'll need to adjust the formatting to get something that you like
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  17. #17
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks Paul!
    You have helped on the major part; I will try to fit it as per my requirement. And if I need any further help on any specific sub, I will open a new thread.
    Just a small comment - The simple random sampling takes more time than the stratified codes. I just copy pasted the same data up to 50,000 rows and tested for 20 samples. Following are the time taken for each method-
    Simple sampling took 03M:47S
    stratified (equal) sampling took 00M:59S
    stratified (proportionate) took 01M:05S

  18. #18
    VBAX Sage
    Joined
    Apr 2007
    Location
    United States
    Posts
    8,711
    Location
    Try this

    I changed the sort algorithm from a bubble sort, which is simple but slow, to quick sort, which is more complicated, but faster
    Attached Files Attached Files
    ---------------------------------------------------------------------------------------------------------------------

    Paul


    Remember: Tell us WHAT you want to do, not HOW you think you want to do it

    1. Use [CODE] ....[/CODE ] Tags for readability
    [CODE]PasteYourCodeHere[/CODE ] -- (or paste your code, select it, click [#] button)
    2. Upload an example
    Go Advanced / Attachments - Manage Attachments / Add Files / Select Files / Select the file(s) / Upload Files / Done
    3. Mark the thread as [Solved] when you have an answer
    Thread Tools (on the top right corner, above the first message)
    4. Read the Forum FAQ, especially the part about cross-posting in other forums
    http://www.vbaexpress.com/forum/faq...._new_faq_item3

  19. #19
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Thanks Paul!
    Its running very fast now

  20. #20
    VBAX Mentor
    Joined
    Nov 2020
    Location
    Cochin, Kerala
    Posts
    314
    Location
    Hi Paul, in the proportionate method if I choose the column revenue with a sample size of 40 and proceed, it gives me 200 samples?
    Will you be able to look into this?

    if I remove the below line, I'm getting 7 samples

        If n < numRecordsProportional Then
            For iStrata = 1 To numStrata
                If aryStrataSamples(iStrata) = 0 Then aryStrataSamples(iStrata) = 1
            Next iStrata
        End If
    How can I make it at least nearest to the sample size?
    Last edited by anish.ms; 08-01-2021 at 01:20 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •