# Thread: Help Required : Random Sampling of Data

1. Sorry - 200 was just a guess for howmany Facility1 lines there were

I made "High Values" a fourth option button since it's logic (as I understand) has nothing to do with the Equal or Proportional random sampling

Capture.JPG

Capture2.JPG

2. Thanks a lot Paul for your time and extended help!
I agree picking high values doesn't match with the word random in the page heading itself "Stratified Random Sampling".
So, I will try to understand your codes and try to make a fifth option for high values based on the proportionate random size. This infract will be duplicating the options, that's the reason I though to have checkbox option.
I think, instead of adding another option, I can keep the original 3 options and call the subs according to the checkbox checked or not.
Next step, I'm also thinking to pick samples only for the strata values selected from the list box.

3. In post #36 ...

I wanted the sample size to be considered like how it was working previously.

And if the check box is checked, it shall pick in the order of highest to lowest values instead of random pick.

If the checkbox is unchecked, then random pick, the way it was working previously
I think that is what I have in there now.

It picks the same number (a user input = 5) of highest values of another column (= SubAmt) for each Strata (=Facility Name)

I think it's cleaner and easier to maintain using different Option Buttons for each choice, instead of trying to mix multiple options

Are you thinking that the number selected ( = 5 above) from each Strata (=Facility Name) will vary proportionally?

You'd have to use a percentage figure I think

Example:

Strata = Facility Name
Other = SubAmt

Percentage input = 1%

Facility 1 has 10,000 records so select 100 highest SubAmt
Facility 2 has 1,000 records so select 10 highest SubAmt
Facility 3 has 5,000 records so select 50 highest SubAmt
Facility 4 has 20,000 records so select 200 highest SubAmt

4. Thanks Paul!
I will try it out

5. Hi Paul,
Hope all well at your end
Basis of understanding some part of your codes, I could do the coding for proportionate samples from a high value filed. I have also made some cosmetic changes in the selection criteria and related codes. However, there is no change in the main codes.

I'm back again as I have a problem here with the sample file attached. In case if the number of strata is more, the tool is getting stuck as I understand it loops from the first row till last row every time for each strata. Is there any way around to resolve this issue. For example, VENDOR_NAME in the attached file.

I had slowness in viewing the strata values (View Statistics) because of the same looping which was later resolved by replacing for looping with countifs.
Could you please look in to it whenever you have time

6. 1 skipped the ProgressBar updating and used Application.StatusBar for speed

Try picking VENDOR Equal and Proportional and see if it's faster

7. Thanks a lot Paul!
Both equal and proportionate are working faster for the random pick method.
I will try the high value pick option by using your above method (storing the entire data range in an array)

8. One doubt
How can I exclude if there is a heading for the data range without converting to a Table?

Attachment 28953
For example in the above case, the first row should be 3 not 2. And it has to be dynamic, there may or mayn't be a heading.
Is it possible to consider the first row which has the last used column, something like that?

9. You could probably make an assumption that if the first row is all Strings and some cell in the second row is NOT a String, then there are headers

Or

Use the Custom Sort approach and get input from the user

Capture.JPG

10. Thanks Paul!
I will check out that option

11. Hi Paul,

I was testing this tool in different work books and I would request your help in the following 2 areas-

(1) How can I sort the strata list based on the count in descending order and show in the list box
`UF_StrataList.lbSummary.List = aryList`
(2) How can I pick samples only for the selected strata item from the list box. In the attached example, if I select the column "DenialCode", 90% of the line items are "blank". Hence, I want to skip the strata "blank" and pick the samples only from few selected strata items.
If there is a selection, pick samples only from the selected strata or else pick from all

12. Dear Paul,
Please find attached the revised file as there is some issue with the file attached previously in #51. I have attached the sample data separately here du to the file size.

13. You can try this

I added 2 pieces of code

I also only used 1700 data rows for the second file

Capture.JPG

```Private Sub cbExit2_Click()
Dim i As Long

'------------------------------------------------------------------------- 9/18/2021
For i = 2 To Me.lbSummary.ListCount - 1
If Me.lbSummary.Selected(i) = True Then
sStrataSelected = Me.lbSummary.List(i)
Exit For
End If
Next i

If sStrataSelected = "" Then
MsgBox "Do All"
Else
MsgBox "Do " & sStrataSelected
End If

Me.lbSummary.Clear
Me.Hide
End Sub```

```    aryList(numStrata + 2, 0) = "Total"
aryList(numStrata + 2, 1) = iTotal
aryList(numStrata + 2, 2) = "100%"

'sort----------------------------------------------------------- 9/18/2021
ReDim arySort(1 To numStrata + 2)
For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
arySort(iStrata) = Format(aryList(iStrata, 1), "000000000000") & Chr(1) & aryList(iStrata, 0) & Chr(1) & aryList(iStrata, 2)
Next iStrata

Call sortQuick(arySort, 2, 32, xlDescending)

For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
vSplit = Split(arySort(iStrata), Chr(1))
aryList(iStrata, 0) = vSplit(1)
aryList(iStrata, 1) = Format(vSplit(0), "0")
aryList(iStrata, 2) = vSplit(2)
Next iStrata

With UF_StrataList.lbSummary
.ColumnCount = UBound(aryList, 2) + 1
.ColumnWidths = "250;50;50"
.List = aryList
End With```

14. Thanks a lot Paul!
I will try it out and let you know.

15. Hi Paul,
Sorting the list box items is having some issue and I'm unable to trace it out.
It looks like, works fine for lesser number of strata and in case larger number strata items, the initial items are sorted not all.
There are 374 strata items in the list below and this is how it looks like

Screenshot 2021-09-19 225610.jpg

It also looks like Total is coming first if all the items are sorted
Screenshot 2021-09-19 225910.jpg
Could you please check and help me?

16. I left some test code in

Change the marked line and see if it's better

```    'sort----------------------------------------------------------- 9/18/2021
ReDim arySort(1 To numStrata + 2)
For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
arySort(iStrata) = Format(aryList(iStrata, 1), "000000000000") & Chr(1) & aryList(iStrata, 0) & Chr(1) & aryList(iStrata, 2)
Next iStrata

Call sortQuick(arySort, 2, numStrata - 1, xlDescending) ' <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<```

17. Thanks Paul!
I have changed it to numStrata + 1
numStrata - 1 was resulting an issue in the second last line

18. You're right -- should have been a +1

19. Hi Paul!,

It is almost ready after 2 months to share with my team members. Thanks for your continued help so far.
I have made few changes in codes to facilitate the option of picking samples from selected strata in list box by adding the selected values to an another collection called "StrataList2".
I request your help in the following -

View statistics is taking long time if the number of strata values are more. For example, "NetAmt" in the sample file attached. Is there any alternate way other than the currently used countif to load the statistics faster?
`cntArray = Application.CountIfs(rRange, StrataList(iStrata))`

20. This seems faster in my tests

```Option Explicit

Sub ViewStastistics()
Dim wsTemp As Worksheet
Dim ptTemp As PivotTable
Dim sStrata As String
Dim aryFromData As Variant
Dim i As Long

Application.ScreenUpdating = False

'fill in blanks if any
On Error Resume Next
rData.Columns(colStrataPicked).SpecialCells(xlCellTypeBlanks).Value = "(blank)"
On Error GoTo 0

'new temp sheet

'create pivot table
ThisWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:=rData, Version:=7).CreatePivotTable _
TableDestination:=wsTemp.Cells(1, 1), TableName:="PivotTable2", DefaultVersion:=7

Set ptTemp = wsTemp.PivotTables(1)

'Count of goes first
With ptTemp
.AddDataField .PivotFields(sStrata), "Count of " & sStrata, xlCount
With .PivotFields(sStrata)
.Orientation = xlRowField
.Position = 1
End With

.RowAxisLayout xlTabularRow
.ColumnGrand = True
.RowGrand = False

.PivotFields(sStrata).AutoSort xlDescending, "Count of " & sStrata

aryFromData = .TableRange2.Value
End With

'delete PT worksheet
wsTemp.Delete

ReDim Preserve aryFromData(LBound(aryFromData, 1) To UBound(aryFromData, 1), LBound(aryFromData, 2) To UBound(aryFromData, 2) + 1)

'put in column heders
aryFromData(LBound(aryFromData, 1), 1) = "Strata Values"
aryFromData(LBound(aryFromData), 2) = "Count"
aryFromData(LBound(aryFromData), 3) = "Percentage"

'put in totals
aryFromData(UBound(aryFromData, 1), 1) = "Total"
aryFromData(UBound(aryFromData, 1), 2) = numRowsOfData
aryFromData(UBound(aryFromData, 1), 3) = "100%"

'put in percentage
For i = LBound(aryFromData, 1) + 1 To UBound(aryFromData, 1) - 1
aryFromData(i, 3) = Format(aryFromData(i, 2) / numRowsOfData, "0.00%")
Next i

With UF_StrataList.lbSummary
.ColumnCount = UBound(aryFromData, 2) + 1
.ColumnWidths = "250;50;50"
.List = aryFromData
End With

Erase aryFromData

Application.ScreenUpdating = True
End Sub```

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•