PDA

View Full Version : [SOLVED] VBA to run a Linear Regression Automatically



nirvehex
02-20-2016, 01:36 PM
Hi,
Hi, I feel like this is a simple question, but I'm not getting any results from Mr. Excel where this question is cross posted: http://www.mrexcel.com/forum/excel-questions/917020-visual-basic-applications-run-linear-regression-automatically.html so I figured I'd try here.

I tried the macro recorder tool, but it won't capture my actions within the data analysis pack for some reason.

I have the data analysis pack installed on my Excel 2010/2013. I'm trying to create VBA code that does the following:

1. Opens up the Linear Regression Tool
2. For the Y Range: C1:Last Row in column C
3. For X Range: A1:Last Row in column B
4. Check off the box for Labels
5. Check off the Confidence Level and ensure its set to 95
6. Select Output Range and set it to M1

Any ideas?

Thank you very much :)

p45cal
02-22-2016, 03:41 AM
something like this:
Sub Macro6()
lrA = Cells(Rows.Count, "A").End(xlUp).Row
lrC = Cells(Rows.Count, "C").End(xlUp).Row
lr = Application.Max(lrA, lrC) 'or Min? I expect the x and y have to have the smae number of values?
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("C1:C" & lr), ActiveSheet.Range("A1:A" & lr), False, True, 95, ActiveSheet.Range("M1"), False, False, False, False, ActiveSheet.Range("X3"), False
End Sub


Arguments (not authoritative):
inpyrng As Variant, Input Y Range
Optional inpxrng As Variant, Input X Range
Optional constant As Variant, Constant is Zero true/false
Optional labels As Variant, Labels true/false
Optional confid As Variant, Confidence Level blank/nothing or 95 for 95%
Optional soutrng As Variant, Output Range "" for a new worksheet letting excel name the sheet, "a new sheet" quote the name of the new sheet, nothing for a new workbook
Optional residuals As Variant, Residuals true/false
Optional sresiduals As Variant, Standardized Residuals true/false
Optional rplots As Variant, Residual Plots true/false
Optional lplots As Variant, Line Fit Plots true/false
Optional routrng As Variant, ?? nothing
Optional nplots As Variant, Normal Probability Plots true/false
Optional poutrng As Variant ?? nothing

I tried using named arguments like inpyrng:= but it wasn't having it, so the arguments have to be in order

If you set a reference to atpvbaen.xls you can use the likes of:

Regress ActiveSheet.Range("$C$1:$C$14"), ActiveSheet.Range("$A$1:$A$14"), True, True, 95, ActiveSheet.Range("$M$1"), True, True, True, True, , True

mancubus
02-22-2016, 03:45 AM
p45cal, typo i think?

lrC >>"C"

p45cal
02-22-2016, 03:54 AM
p45cal, typo i think?

lrC >>"C"
Yes, corrected, thanks.

nirvehex
02-22-2016, 12:44 PM
p45cal, thanks for the tips here! I'm getting a run-time error when the code looks for ATPVBAEN.XLAM. It can't find this file.

Here's what I have for the slightly modified code:



Sub Macro6()
Sheets("Jo - Price Predictor").Select
lrA = Cells(Rows.Count, "A").End(xlUp).Row
lrB = Cells(Rows.Count, "B").End(xlUp).Row
lrC = Cells(Rows.Count, "C").End(xlUp).Row
lrD = Cells(Rows.Count, "D").End(xlUp).Row
lrE = Cells(Rows.Count, "E").End(xlUp).Row
lrF = Cells(Rows.Count, "F").End(xlUp).Row
lr = Application.Max(lrA, lrF) 'or Min? I expect the x and y have to have the same number of values?
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("F1:F" & lr), ActiveSheet.Range("A1:A" & lrE), False, True, 95, ActiveSheet.Range("P1"), False, False, False, False, ActiveSheet.Range("X3"), False
End Sub


Any ideas? Also what is the "X3" for at the end of the code?

nirvehex
02-22-2016, 12:57 PM
Nevermind I got it! Apparently there is a separate Analysis Toolpak - VBA that you have to enable which is separate from the regular Analysis Toolpak. Very cool code! Thanks p45cal! You rock!

p45cal
02-22-2016, 03:31 PM
Two points:
The penultimate argument should not be there; that was just me experimenting.
I think you should be using lr throughout the arguments not lrE at all.

nirvehex
02-24-2016, 07:13 PM
Ok I'll take out the X3 part. Why not use the lrE where it says:


Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("F1:F" & lr), ActiveSheet.Range("A1:A" & lrE),


I want the range to run A1 over to column E and down the last row. So my X range would be A1:last row of E. I don't have that right there?

Also - is there anyway to have the macro press the ok on a prompt that comes up after I run this? What's happening is I'm overwriting a the ANOVA summary table that comes up when the macro pastes the ANOVA table in cell P1, which is exactly what I want it to do, however Excel always asks me if this is ok in a dialogue box.

Thanks for the continued advice! I know this isn't the first one you've helped me on!

p45cal
02-25-2016, 03:05 AM
"So my X range would be A1:last row of E"

??!! So where is the Y range?
I don't know that the regression can handle an X (or Y) range on more than one column or row!

Points on a chart each have 2 parts, an X coordinate and a Y coordinate. No points are missing a coordinate, no points have 3 coordinates (it's not a 3D chart is it?). So I'd hazard a guess that the number of X coordinates should be the same as the number of Y coordinates, which means equal size ranges of the 2 columns being used for the X and Y ranges. If they are different sizes; let's say the bottom of column A was row 10 so the range for say the X-coordinates is A1:A10, and the bottom of row C is row 5 so the range for the Y coordinates is C1:C5, that's great, we have pairs of coordinates for the first 5 points. But which value pairs up with A6, A7 etc.? Blank cells? I don't know how the regression tool handles blanks, perhaps it calls them zeroes. That's absolutely fine if you want blank cells to be counted as zeroes but be aware that it will affect the regression. I somehow doubt you want that.

So as a general rule of thumb, for a 2D chart, you always have pairs of coordinates so it's best if those pairs have values you're sure to want included in the regression calculation (rather than having random blank values) and that there should be an equal number of X and Y coordinates.
This is why in my code suggestion I had the line:
lrA = Cells(Rows.Count, "A").End(xlUp).Row
to find the bottom cell with anything in it in column A and the line:
lrC = Cells(Rows.Count, "C").End(xlUp).Row
to find the bottom cell with anything in it in column C.
To my mind (to anyone's actually), if these values were different then you wouldn't have a pair of coordinates for each point. So I had to use a row number which was the same for both column A and C, which gave rise to the line:
lr = Application.Max(lrA, lrC) 'or Min?
which takes the larger of the two values and puts it into lr, with the intention of using it for both X and Y ranges (Columns A and C) for the regression calculation.
Note the comment I made on the same line:
'or Min? I expect the x and y have to have the same number of values?

Since I plumped for max, then there would be some blank cells in one of the ranges - more fool me, I should have gone for min, that way there'd be no blanks at the bottom of either column that I didn't know how the regression tool handled. I did pose that question in the comment.

Now you've introduced columns B, D and F. I haven't the foggiest what part they play. If you're plotting column A against column F then you've correctly got
lr = Application.Max(lrA, lrF)
(although I would seriously consider using Min)
so in your line:
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("F1:F" & lr), ActiveSheet.Range("A1:A" & lrE), False, True, 95, ActiveSheet.Range("P1"), False, False, False, False, ActiveSheet.Range("X3"), False
there is every chance the X and Y ranges are not the same size, and this is what I think the line should look like:
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("F1:F" & lr), ActiveSheet.Range("A1:A" & lr), False, True, 95, ActiveSheet.Range("P1"), False, False, False, False, , False

This is hard work.

I'm going to test whether Application.displayAlerts=False will work for suppressing the overwriting question.

edit:
It doesn't.
You can always clear the area just before the regression line, something like:
Range("M1:U21").Clear

Kenneth Hobs
02-25-2016, 06:53 AM
I would have tested post #1 had there been an attached file. Sounds like one was shared privately which is much more involved that post #1. There are lots of good points by p45cal. It is an interesting thread.

GoalSeek and Solver routines can be handy.
' Solver, http://support.microsoft.com/kb/843304
' SolverOK, http://msdn.microsoft.com/en-us/library/office/aa272367%28v=office.10%29.aspx

=Slope, =Intercept, and =ForeCast usually fit my needs for simple linear regression.

p45cal
02-25-2016, 07:04 AM
Sounds like one was shared privately which is much more involved that post #1.Nope!

Kenneth Hobs
02-25-2016, 08:08 AM
I see now p45cal that your post#9 covered several of the potential pitfalls one can encounter without seeing the data in post #5. Given that, it is even harder to help without an example file.

I like to solve problems in simple steps which is why I usually ask for a simple file to help others. So, column A for x values and column B for y values with header labels in row 1 makes for the most simple approach.


This relates in a very small way but is more involved than just simple linear regression. I used Slope and Intercept in VBA to determine alpha and beta inputs for a Weibull distribution in this thread. http://www.mrexcel.com/forum/excel-questions/908202-excel-visual-basic-applications-function-sort-array-ascending-perform-calculation.html

=LinEst is also handy.

Paul_Hossler
02-25-2016, 08:05 PM
FWIW the way I'm used to doing regression ....

Using 2016, I could record a macro calling the data analysis pack, and then changed the recorded ranges



Option Explicit
Sub Macro2()
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("$F$1:$F$10"), _
ActiveSheet.Range("$A$1:$E$10"), _
False, True, 95, ActiveSheet.Range("$M$1"), _
False, False, False, False, , False
End Sub


Sub Macro2mod()
Dim rX As Range, rY As Range

Set rX = ActiveSheet.Cells(1, 1).End(xlDown)
Set rX = ActiveSheet.Cells(1, 1).Resize(rX.Row, 5)
Set rY = ActiveSheet.Cells(1, 6).Resize(rX.Rows.Count, 1)

Application.SendKeys "{enter}" ' skip overwrite msg but I don't like SendKeys
Application.Run "ATPVBAEN.XLAM!Regress", rY, rX, _
False, True, 95, ActiveSheet.Range("$M$1"), _
False, False, False, False, , False
End Sub





Since there was not workbook or other data, I just made some up, but I did end up with this

15477

p45cal
02-26-2016, 07:45 AM
Ah right! so you can have 1 Y-range (I tried and it won't let you have more than one column or one row), and multiple x-ranges and it returns with results for each one. Looks good.

What I'm having difficulty with is why it isn't 1 x-range and multiple y-ranges; usually, when doing research, I would have expected a single x-range, as in for example plotting human growth, you might have time on the x-axis, call it age, running left to right at the bottom of the chart. Then at certain intervals you'd measure height and plot it. Say you went for yearly intervals; you'd measure the child annually and plot their height against the vertical y-axis and their age against the horizontal x-axis.
And if you had several children and you wanted to compare them, you'd still have regular 1 year age intervals along the x-axis, and a variety of y values for each subject. What I very much doubt you'd do is have a static set of Y values, say .5 metre, 1 metre, 1.5 metres etc. and say right, let's record the time when they reach these targets, ('now look parents, keep a tab on your child's height, and when they reach 1 metre, for goodness sake don't forget to tell us!').

What… am I missing? (Note this isn't an arrogant question. I realise I am missing something; the people who wrote this regression tool must know their art)

(I realise it makes no difference which axes things are plotted on, it's just convention, but why, seemingly, fly in its face?)

The other thing I noticed is that when you use just a single column for the X range, you get intercept and gradient, fine, but extend that to two columns and you only get the intercept of the second column, the gradient for the first column goes awry and you get a correct gradient for the second column.

Clearly out of my depth on this one.

Paul_Hossler
02-26-2016, 08:16 AM
Going back (way, way, .... way back) to my stat course ...

Y is a dependent function of 5 different independent paramaters ( the X's), so LR comes up with the best set of 5 coefficients and an intercept that best 'fits' the line

So the 'general' trend function would be something like this to 'fit' 5 independent parameters (a, b, c, d, and e)

Y(a,b,c,d,e) = a*N18 + b*N19 + c*N20 + d*N21 + e*N22 + N17

15480

The very few times I've ever used it, I wrapped LINEST like Ken suggested in a UDF and returned an array with the coefficients and intercept (a 1 x 6 array for this example)

I found it easier to do further calculations that way

p45cal
02-26-2016, 12:10 PM
Right, thanks Paul. I'm warming to this.

nirvehex
02-27-2016, 01:04 PM
The send keys command was exactly what I needed, thank you! p45cal Thanks for your help as well. I guess either solutions works for the overwrite. Kenneth, thanks for that link about the Weibull distribution that was interesting!