PDA

View Full Version : Solved: UDF for mann-whitney statistical test?



dygarth
10-19-2009, 09:27 PM
Hello all,

I am finding myself in need of performing a statistical test that is not inherently available within Excel, and so I was hoping that I could create a UDF to implement the needed function. Unfortunately, I believe I am a bit beyond my VBA capabilities here, so I thought I'd turn to experts for help/advice.

The statistical function I desire is the nonparametric Mann-Whitney U test (also known as the unsigned Wilcoxon rank sum test). A good description of how it is calculated is provided at: europe.isixsigma.com / library / content / c080806a.asp.

Can anyone offer suggestions for how best to implement this solution? Thanks in advance for any advice anyone can provide.

==Dygarth

MWE
10-20-2009, 02:47 PM
Hello all,

I am finding myself in need of performing a statistical test that is not inherently available within Excel, and so I was hoping that I could create a UDF to implement the needed function. Unfortunately, I believe I am a bit beyond my VBA capabilities here, so I thought I'd turn to experts for help/advice.

The statistical function I desire is the nonparametric Mann-Whitney U test (also known as the unsigned Wilcoxon rank sum test). A good description of how it is calculated is provided at: europe.isixsigma.com / library / content / c080806a.asp.

Can anyone offer suggestions for how best to implement this solution? Thanks in advance for any advice anyone can provide.

==DygarthWelcome to VBA-X.
I do a lot of this kind of stuff. I can develop a procedure to compute the relevant statistic(s) and get back to you.

dygarth
10-21-2009, 07:08 AM
Hi MWE! Thanks so much for your reply and for your assistance. Is there anything I can do/offer to help with the effort?

MWE
10-21-2009, 07:25 AM
Hi MWE! Thanks so much for your reply and for your assistance. Is there anything I can do/offer to help with the effort?No problem, I enjoy this kind of thing. I have written dozens of stat-oriented procedures in VBA including lots of SixSigma and DFSS stuff and a complete Monte Carlo simulation package that works in any application. I have probably the only version of MSProject that has a complete MonteCarlo simulation capability built-in.

I have written the Mann-Whitney procedure but need to do a little testing to ensure that it is generating correct results. Do you want just the U statistic or do you want the "next step" which is acceptance or rejection of Ho for some confidence interval and DOFs?

dygarth
10-21-2009, 02:59 PM
Thanks; I admire your passion. I guess I envision a Mann-Whitney UDF that is somewhat analogous to Excel's TTEST, in that it accepts two arrays of values as inputs, perhaps an option for one-tailed or two-tailed, and then returns a p-value (ideally, adjusted for ties) of the resultant statistical test. In my mind, any subsequent acceptance or rejection of Ho should be performed by the user (me!) given their desired alpha level.

If I can provide any additional input, please don't hesitate to ask. Once again, I really appreciate your help!

anandbohra
10-21-2009, 10:34 PM
No problem, I enjoy this kind of thing. I have written dozens of stat-oriented procedures in VBA including lots of SixSigma and DFSS stuff and a complete Monte Carlo simulation package that works in any application. I have probably the only version of MSProject that has a complete MonteCarlo simulation capability built-in.

I have written the Mann-Whitney procedure but need to do a little testing to ensure that it is generating correct results. Do you want just the U statistic or do you want the "next step" which is acceptance or rejection of Ho for some confidence interval and DOFs?

:hi:we will be glad if you share your work with us & provide access to your efforts.: pray2:: pray2:: pray2:: pray2:: pray2:
Knowledge is the only thing which increase is you spread it.:thumb

MWE
10-22-2009, 08:13 AM
:hi:we will be glad if you share your work with us & provide access to your efforts.: pray2:: pray2:: pray2:: pray2:: pray2:
Knowledge is the only thing which increase is you spread it.:thumbWhat I write for forum needs and general purpose stuff is shared. I have 30 or so items in the KB as well. But some of my code is the result of significant work over many years, contains non-trivial engineering knowledge and I use it in my consulting business. I am retired, but do some consulting in systems engineering, SixSigma, Lean, DFSS, project management, etc. I know what I am doing, but some/most of my code is not as bullet-proof as I would like it to be for others to use. It is as good as most commercials applications, but not good enough for me to sleep well nights. It is all too simple for folks to grab some code and generate conclusions that are bogus.

So, if you have specific questions, ask me.

Bob Phillips
10-22-2009, 09:15 AM
I fail to see how 'develop a procedure to compute the relevant statistic(s)' and getting back to the OP would be any different from posting it here. If anyone uses it, it is in the knowledge that it was given free, with no warranties of any kind.

MWE
10-22-2009, 11:58 AM
I fail to see how 'develop a procedure to compute the relevant statistic(s)' and getting back to the OP would be any different from posting it here. If anyone uses it, it is in the knowledge that it was given free, with no warranties of any kind. I do not follow you. This thread is about a specific request for a Mann-Whitney procedure which I will post "here" as part of an xls file when I am done ... as I have posted lots of stuff in the past

I think you are confusing the specific request from dygarth for the M-W procedure with what I interpreted to be a much more general request from anandbohra to share my stat/math/engr libraries and applications which I use in my consulting business.

Bob Phillips
10-22-2009, 02:06 PM
I interpreted it very differently. I think he had the same thoughts as me, that you were going to handle it off-line.

If you post it here, great, I too would like to see it.

Zack Barresse
10-22-2009, 06:22 PM
You're the best MWE. :cool:

MWE
10-29-2009, 01:59 PM
dygarth: Attached is a spreadsheet with the Mann-Whitney function and a short test. The method I used is a combination of what was originally proposed and what MiniTab does. In particular:
the two arrays are merged and sorted
ranks are determined and adjusted for any ties
the Mann-Whitney statistic W is computed
the normal approx statistic Zw is computed based on Ha (the alternate hypothesis) and adjusted for any ties
a t dist is used to extract the p value for a given Zw, DOFs and one or two tail test (driven by the Ha)It all seems to makes sense and generates W and Zw values identical to what I have found in examples, but the p values are slightly off (they seem to be high). MiniTab talks about the use of a "continuity correction factor" in their computation of p values, but I do not know what that is or how/if it applies.

Take a look and let me know what you think. For general use, remove the lines in the M-W sub that write values to the activesheet; they are there only for the test proc.

Aussiebear
10-29-2009, 03:08 PM
Very well done MWE.

dygarth
10-29-2009, 07:04 PM
Hi MWE,
Thanks so much for posting this Mann-Whitney implementation! I will review it, test it with my own reference data, and let you know how it goes asap. Thanks again for your assistance with this statistic.
==Dygarth

dygarth
11-01-2009, 09:18 PM
Hello again MWE,

I have been reviewing and testing your Mann-Whitney implementation. First, thanks for making it so easy to understand. It has made the testing and debugging process much easier. Second, great job!

During my review, I believe that I have found two subtle bugs in the code:
(1) In the SortTies subroutine, the routine did not properly handle tie runs that extended to the end of the input array X;
(2) In the MannWhitney subroutine, the computation of Zw within the 'SomeTies = True' Case had a slight error: the last occurrence of "(Nz + 1)" should actually be "(Nz - 1)".

In the revised Excel file attached, I have made these two corrections as well as a few other tweaks. I have been testing this version of the routine against some datasets I had that were previously run in StatGraphics. As with your experience, the p-values were close but not exact. Still not sure why.

One other thing...I would actually like to use this routine as a UDF within Excel, but I can't seem to convert it so that it works properly. Ideally, I'd like to use it in a manner similar to TTEST, with a calling structure something like: =MWTEST(X,Y,nTails). Any suggestions?

Thanks again for all your help.
==Dygarth

dygarth
11-02-2009, 09:01 AM
I just ran across another subtle issue that causes the MannWhitney routine to throw a runtime error: The routine fails if there are *no* ties within the combined arrays. To resolve this issue, I converted SortTies from a sub to a function that returns a boolean True if any ties were found, or False if no ties were found. I then altered the MannWhitney routine to accommodate this change accordingly. See attached. Enjoy!
==Dygarth

MWE
11-02-2009, 10:38 PM
Well, 3 mistakes is not so bad. Or, rather, you correctly identified the 3 bugs I left in for you to fix :devil2:

So, we have both gained something from this.

Regarding your last question about converting the MW sub to a function and calling it like any other Excel function. The rules for calling VBA functions from an Excel cell are approx:
the function can only return a single value
the function can not alter any cells other than the calling cellBUT, most VBA subs that work with arrays have to be rewritten internally because arrays and indices for passed arguements do not work (well, they have never worked for me). You have to use an "items in objects" approach, for example to figure out the length of X and Y to redim Z:
Nx = 0
For Each Item In X
Nx = Nx + 1
Next Item
Ny = 0
For Each Item In Y
Ny = Ny + 1
Next Item
Nz = Nx + Ny
ReDim Z(1 To Nz)
ReDim Ztag(1 To Nz)
ReDim Rank(1 To Nz)
I tweaked the MW sub into a function with MWP(X,Y,Ha) that returns the p value using the items in object approach and commented out the lines that wrote back to Sheet1 so as not to violate rule #2 above. I added a call to MWP in the test sub and MWP works fine when called from another VBA proc. I then added a call from cell J45: =MWP(A2:A40,B2:B43,"X<>Y") and got the correct p value. See attached.

dygarth
11-03-2009, 01:46 PM
Perfect! Thanks so much! It works very nicely. I truly appreciate your help!
==Dygarth

MWE
11-03-2009, 02:05 PM
Perfect! Thanks so much! It works very nicely. I truly appreciate your help!
==DygarthGlad to help. Let me know if you ever figure out why the p-values extracted from the Excel's tdist are not quite right.

Whisky
09-08-2018, 09:46 AM
Hi

This sounds very interesting. I would like to test the excel implementation results against R.

How can I download the final code/workbook please?

Thanks

W

Paul_Hossler
09-08-2018, 02:52 PM
The last post was 9 years ago

Finding the workbook might be a problem

Even the referenced link:

europe.isixsigma.com/library/content/c080806a.asp

doesn't seem to be there

You might have to start with Google and reinvent it

Whisky
09-13-2018, 12:23 PM
Thanks Paul. A few searches on the net does return some excel implementation such as these:
https://www.macmillanihe.com/resources/CW%20resources%20(by%20Author)/B/Brysbaert/student/excel%20guidelines/run-Mann-Whitney-test-Excel.pdf



https://digitalcommons.wayne.edu/jmasm/vol13/iss1/32/



W