Sorry - 200 was just a guess for howmany Facility1 lines there were
I made "High Values" a fourth option button since it's logic (as I understand) has nothing to do with the Equal or Proportional random sampling
Attachment 28888
Attachment 28889
Printable View
Sorry - 200 was just a guess for howmany Facility1 lines there were
I made "High Values" a fourth option button since it's logic (as I understand) has nothing to do with the Equal or Proportional random sampling
Attachment 28888
Attachment 28889
Thanks a lot Paul for your time and extended help! :bow:
I agree picking high values doesn't match with the word random in the page heading itself "Stratified Random Sampling".
So, I will try to understand your codes and try to make a fifth option for high values based on the proportionate random size. This infract will be duplicating the options, that's the reason I though to have checkbox option.
I think, instead of adding another option, I can keep the original 3 options and call the subs according to the checkbox checked or not.
Next step, I'm also thinking to pick samples only for the strata values selected from the list box.
In post #36 ...
I think that is what I have in there now.Quote:
I wanted the sample size to be considered like how it was working previously.
And if the check box is checked, it shall pick in the order of highest to lowest values instead of random pick.
If the checkbox is unchecked, then random pick, the way it was working previously
It picks the same number (a user input = 5) of highest values of another column (= SubAmt) for each Strata (=Facility Name)
I think it's cleaner and easier to maintain using different Option Buttons for each choice, instead of trying to mix multiple options
Are you thinking that the number selected ( = 5 above) from each Strata (=Facility Name) will vary proportionally?
You'd have to use a percentage figure I think
Example:
Strata = Facility Name
Other = SubAmt
Percentage input = 1%
Facility 1 has 10,000 records so select 100 highest SubAmt
Facility 2 has 1,000 records so select 10 highest SubAmt
Facility 3 has 5,000 records so select 50 highest SubAmt
Facility 4 has 20,000 records so select 200 highest SubAmt
Thanks Paul!
I will try it out
Hi Paul,
Hope all well at your end
Basis of understanding some part of your codes, I could do the coding for proportionate samples from a high value filed. I have also made some cosmetic changes in the selection criteria and related codes. However, there is no change in the main codes.
I'm back again as I have a problem here with the sample file attached. In case if the number of strata is more, the tool is getting stuck as I understand it loops from the first row till last row every time for each strata. Is there any way around to resolve this issue. For example, VENDOR_NAME in the attached file.
I had slowness in viewing the strata values (View Statistics) because of the same looping which was later resolved by replacing for looping with countifs.
Could you please look in to it whenever you have time
Thanks in Advance!
1 skipped the ProgressBar updating and used Application.StatusBar for speed
Try picking VENDOR Equal and Proportional and see if it's faster
Thanks a lot Paul!
Both equal and proportionate are working faster for the random pick method.
I will try the high value pick option by using your above method (storing the entire data range in an array)
One doubt
How can I exclude if there is a heading for the data range without converting to a Table?
Attachment 28953
For example in the above case, the first row should be 3 not 2. And it has to be dynamic, there may or mayn't be a heading.
Is it possible to consider the first row which has the last used column, something like that?
You could probably make an assumption that if the first row is all Strings and some cell in the second row is NOT a String, then there are headers
Or
Use the Custom Sort approach and get input from the user
Attachment 28954
Thanks Paul!
I will check out that option
Hi Paul,
I was testing this tool in different work books and I would request your help in the following 2 areas-
(1) How can I sort the strata list based on the count in descending order and show in the list box
(2) How can I pick samples only for the selected strata item from the list box. In the attached example, if I select the column "DenialCode", 90% of the line items are "blank". Hence, I want to skip the strata "blank" and pick the samples only from few selected strata items.Code:UF_StrataList.lbSummary.List = aryList
If there is a selection, pick samples only from the selected strata or else pick from all
Thanks in advance!
Dear Paul,
Please find attached the revised file as there is some issue with the file attached previously in #51. I have attached the sample data separately here du to the file size.
Thanks in Advance!
You can try this
I added 2 pieces of code
I also only used 1700 data rows for the second file
Attachment 28983
Code:Private Sub cbExit2_Click()
Dim i As Long
'------------------------------------------------------------------------- 9/18/2021
For i = 2 To Me.lbSummary.ListCount - 1
If Me.lbSummary.Selected(i) = True Then
sStrataSelected = Me.lbSummary.List(i)
Exit For
End If
Next i
If sStrataSelected = "" Then
MsgBox "Do All"
Else
MsgBox "Do " & sStrataSelected
End If
Me.lbSummary.Clear
Me.Hide
Unload Me
End Sub
Code:aryList(numStrata + 2, 0) = "Total"
aryList(numStrata + 2, 1) = iTotal
aryList(numStrata + 2, 2) = "100%"
'sort----------------------------------------------------------- 9/18/2021
ReDim arySort(1 To numStrata + 2)
For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
arySort(iStrata) = Format(aryList(iStrata, 1), "000000000000") & Chr(1) & aryList(iStrata, 0) & Chr(1) & aryList(iStrata, 2)
Next iStrata
Call sortQuick(arySort, 2, 32, xlDescending)
For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
vSplit = Split(arySort(iStrata), Chr(1))
aryList(iStrata, 0) = vSplit(1)
aryList(iStrata, 1) = Format(vSplit(0), "0")
aryList(iStrata, 2) = vSplit(2)
Next iStrata
With UF_StrataList.lbSummary
.ColumnCount = UBound(aryList, 2) + 1
.ColumnWidths = "250;50;50"
.List = aryList
End With
Thanks a lot Paul!
I will try it out and let you know.
Hi Paul,
Sorting the list box items is having some issue and I'm unable to trace it out.
It looks like, works fine for lesser number of strata and in case larger number strata items, the initial items are sorted not all.
There are 374 strata items in the list below and this is how it looks like
Attachment 28986
It also looks like Total is coming first if all the items are sorted
Attachment 28987
Could you please check and help me?
I left some test code in
Change the marked line and see if it's better
Code:'sort----------------------------------------------------------- 9/18/2021
ReDim arySort(1 To numStrata + 2)
For iStrata = LBound(aryList, 1) To UBound(aryList, 1)
arySort(iStrata) = Format(aryList(iStrata, 1), "000000000000") & Chr(1) & aryList(iStrata, 0) & Chr(1) & aryList(iStrata, 2)
Next iStrata
Call sortQuick(arySort, 2, numStrata - 1, xlDescending) ' <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Thanks Paul!
I have changed it to numStrata + 1
numStrata - 1 was resulting an issue in the second last line
You're right -- should have been a +1
Hi Paul!,
It is almost ready after 2 months to share with my team members. Thanks for your continued help so far.
I have made few changes in codes to facilitate the option of picking samples from selected strata in list box by adding the selected values to an another collection called "StrataList2".
I request your help in the following -
View statistics is taking long time if the number of strata values are more. For example, "NetAmt" in the sample file attached. Is there any alternate way other than the currently used countif to load the statistics faster?
Thanks in advance!Code:cntArray = Application.CountIfs(rRange, StrataList(iStrata))
This seems faster in my tests
Code:Option Explicit
Sub ViewStastistics()
Dim wsTemp As Worksheet
Dim ptTemp As PivotTable
Dim sStrata As String
Dim aryFromData As Variant
Dim i As Long
Application.ScreenUpdating = False
'fill in blanks if any
On Error Resume Next
rData.Columns(colStrataPicked).SpecialCells(xlCellTypeBlanks).Value = "(blank)"
On Error GoTo 0
sStrata = aryStrataHeaders(colStrataPicked)
'new temp sheet
Set wsTemp = Worksheets.Add
'create pivot table
ThisWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:=rData, Version:=7).CreatePivotTable _
TableDestination:=wsTemp.Cells(1, 1), TableName:="PivotTable2", DefaultVersion:=7
Set ptTemp = wsTemp.PivotTables(1)
'Count of goes first
With ptTemp
.AddDataField .PivotFields(sStrata), "Count of " & sStrata, xlCount
With .PivotFields(sStrata)
.Orientation = xlRowField
.Position = 1
End With
.RowAxisLayout xlTabularRow
.ColumnGrand = True
.RowGrand = False
.PivotFields(sStrata).AutoSort xlDescending, "Count of " & sStrata
aryFromData = .TableRange2.Value
End With
'delete PT worksheet
Application.DisplayAlerts = False
wsTemp.Delete
Application.DisplayAlerts = True
ReDim Preserve aryFromData(LBound(aryFromData, 1) To UBound(aryFromData, 1), LBound(aryFromData, 2) To UBound(aryFromData, 2) + 1)
'put in column heders
aryFromData(LBound(aryFromData, 1), 1) = "Strata Values"
aryFromData(LBound(aryFromData), 2) = "Count"
aryFromData(LBound(aryFromData), 3) = "Percentage"
'put in totals
aryFromData(UBound(aryFromData, 1), 1) = "Total"
aryFromData(UBound(aryFromData, 1), 2) = numRowsOfData
aryFromData(UBound(aryFromData, 1), 3) = "100%"
'put in percentage
For i = LBound(aryFromData, 1) + 1 To UBound(aryFromData, 1) - 1
aryFromData(i, 3) = Format(aryFromData(i, 2) / numRowsOfData, "0.00%")
Next i
Load UF_StrataList
With UF_StrataList.lbSummary
.ColumnCount = UBound(aryFromData, 2) + 1
.ColumnWidths = "250;50;50"
.List = aryFromData
End With
Erase aryFromData
Application.ScreenUpdating = True
End Sub