PDA

View Full Version : [SOLVED] How to set up spreadsheet for regression analysis in Excel 2016



waimea
12-17-2018, 08:46 AM
I want to loop through a lot of data tables and do multivariate regression analysis on them.

I am asking how to setup my data for data analysis toolpack. I have a lot of spreadsheets with #321 rows (with header) and a dynamic number of columns.

In this thread a OFFSET loop is used:

https://www.mrexcel.com/forum/excel-questions/967452-vba-performing-multiple-regressions-like-800-them-simultaneously.html

And I want to know how my data should be set up for the VBA OFFSET code in the link to work?

Or if there is a better way to set up my data given my 321 rows and dynamic number of columns?



For n = 1 to 800
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range($C$5:$C$11").Offset(0,n), ActiveSheet.Range("$AFL$5:$AFM$11"), _
False, True, , Worksheets("1"), False, False, False, False, , False
Next

waimea
12-17-2018, 10:46 AM
I haven't tried yet but I am thinking that I can read a table into an array and then regress the array?

Perhaps even insert the independent variable into the table as the last column, then read the table into the array and then regress the array.

Is this possible?

Kenneth Hobs
12-17-2018, 06:05 PM
If you attached an example file, it would be easier to help.

See if this helps. http://www.vbaexpress.com/forum/showthread.php?55218-VBA-to-run-a-Linear-Regression-Automatically

waimea
12-17-2018, 11:40 PM
Hi Kenneth,

I have looked at your link and it is very useful.

I have several tables with dependent variables and I have one table with independent variables. The dependent and independent tables are not on the same sheet.

In sheet1 I have 23 X in a table called Data, in sheet2 I have 3 X in a table called Data2.

In sheet3 I have 8 Y in a table called Independent.

I would like to loop through an array of table names (ex. array("data","data2", "more table names"))

and regress all X and Y variables. I think that I can use a for each loop and offset to offset the Y variables?

waimea
12-18-2018, 03:11 AM
I have recorded a macro where I use the function Regress:


Sub Regression()
Application.Run "ATPVBAEN.XLAM!Regress", ActiveSheet.Range("$B$1:$B$321"), ActiveSheet.Range("$B$1:$Q$321"), False, True, , "", True, False, True, True, , False
End Sub


I have two questions:

1.) Is there anyway to use more then 16 x variables / columns in Data Analysis Regression?

2.) How can I loop the input ranges for X and Y?

Jan Karel Pieterse
12-18-2018, 03:37 AM
Make your life easier and just combine the tables into one before starting the regression.

waimea
12-18-2018, 03:58 AM
I have recorded copying the y1 variable into the spreadsheet with the x1, x2, x3 values.

I want to loop this so that it copies the y1, y2, y3, y4, y5,y6, y7,.. y16 variables into sheet2.





Sub ChangeIndependentVariable()
Range("Independent[[#Headers],[y1]]").Select
Range(Selection, Selection.End(xlDown)).Select
Selection.Copy
Sheets("Sheet2").Select

Range("E1").Select

Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Range("Data2[[#Headers],[Name]]").Select
End Sub






Sub ChangeIndependentVariableLoop()

Dim i As Integer

For i = 1 to 7

Range("Independent[[#Headers],[y& i]]").Select
Range(Selection, Selection.End(xlDown)).Select
Selection.Copy
Sheets("Sheet2").Select
Range.Offset(0,i).Select


Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False

Range("Data2[[#Headers],[Name]]").Select
Next i

End Sub




Something like this?

waimea
12-18-2018, 06:58 AM
I am looking at this post from the thread you linked earlier, http://www.vbaexpress.com/forum/showthread.php?55218-VBA-to-run-a-Linear-Regression-Automatically&p=339025&viewfull=1#post339025



Sub Macro2mod()
Dim rX As Range, rY As Range

Set rX = ActiveSheet.Cells(1, 1).End(xlDown)
Set rX = ActiveSheet.Cells(1, 1).Resize(rX.Row, 5)
Set rY = ActiveSheet.Cells(1, 6).Resize(rX.Rows.Count, 1)

Application.SendKeys "{enter}" ' skip overwrite msg but I don't like SendKeys
Application.Run "ATPVBAEN.XLAM!Regress", rY, rX, _
False, True, 95, ActiveSheet.Range("$M$1"), _
False, False, False, False, , False
End Sub



The use of SendKeys is really clever and I haven't seen that before.

How could I loop this code given my 321 rows and different number of columns? (From 1 to 16 columns)

waimea
12-18-2018, 07:44 AM
I found this link https://stackoverflow.com/questions/41325035/looping-a-regression-in-vba with the following code:


Dim x as WorkSheet : Set x = ActiveSheet '<-- I'd recommend to set it with the sheet's name (i.e. x = Worksheets("Xdataset"))

Dim col as Range

For Each col in x.Range("F3:CU22").Columns
Application.Run "ATPVBAEN.XLAM!Regress", col,
x.Range("$C$3:$E$22"), False, False, , col.Cells(0).Value, False, False, False, False, , False
Next


I don't understand how the loops works? What is x.Range("F3:CU22).Columns, what is x.Range("$C$3:$E$22") and what is col.Cells(0)?

Kenneth Hobs
12-18-2018, 09:02 AM
Looks like you will need 8 runs per X dataset listobject. If your listobjects are laid out as in the example file, I would put the output to the right of the usedrange on row 1 on each data sheet with a blank column between each.

I will work up an example using your file. You can change your regress options to suit. Recording a macro is the usual first step.

waimea
12-18-2018, 09:48 AM
Hi Kenneth,

this is very kind of you. I am really struggling with this and all help is very much appreciated.

Kenneth Hobs
12-18-2018, 12:49 PM
Regarding question 1 in post #5, no, 16 is the Regress limit. If you want to do more, you will have to pursue other means. e.g. SAS, R, etc. You can view those other ways by searching for "excel regress 16".

For question 2, after removing the data columns of more than 16 in that one listobject, you can try this code in a Module. It is more involved due to listobjects.


Sub Main()
Dim ws As Worksheet, rX As Range, rY As Range, rO As Range
Dim lO As ListObject, yLO As ListObject, xLO As ListObject
Dim y As Range

Application.DisplayAlerts = False

'Set ListOjbect("Independent")
For Each ws In Worksheets
For Each lO In ws.ListObjects
If lO.Name = "Independent" Then
Set yLO = lO
Exit For
End If
Next lO
If Not yLO Is Nothing Then Exit For
Next ws
If yLO Is Nothing Then GoTo EndSub

'Range for set of Y's
Set y = yLO.DataBodyRange.Columns(2).Resize(, yLO.DataBodyRange.Columns.Count - 1)

'Iterate dataset listobjects and regress
For Each ws In Worksheets
'Assume 1 ListObject per sheet
If ws.ListObjects.Count = 1 Then
If ws.ListObjects(1).Name <> "Independent" Then
Set xLO = ws.ListObjects(1)
Set rX = xLO.DataBodyRange.Columns(2).Resize(, xLO.DataBodyRange.Columns.Count - 1)

For Each rY In y.Columns
'Set range for regress output
Set rO = ws.Cells(1, LastNBCol(ws.Cells) + 2)
'Regress
Application.Run "ATPVBAEN.XLAM!Regress", rY, _
rX, False, False, , rO, _
False, False, False, False, , False
Next rY
End If
End If
Next ws

EndSub:
Application.DisplayAlerts = True
End Sub


Function LastNBCol(rng As Range) As Long
Dim LastColumn As Integer
If WorksheetFunction.CountA(Cells) > 0 Then
'Search for any entry, by searching backwards by Columns.
LastColumn = rng.Find(What:="*", after:=rng.Cells(rng.Rows.Count, rng.Columns.Count), _
SearchOrder:=xlByColumns, _
SearchDirection:=xlPrevious).Column
End If
LastNBCol = LastColumn
End Function

waimea
12-18-2018, 12:58 PM
Hi Kenneth,

you really went above and beyond what I was expecting, this code is great and I am really grateful for your time and your help. You are truly a VBA Guru!

I am looking at the code and I understand some of it. It going to be a long night before I understand all of it.

This code can handle up to 16 Y (independent variables) and 16 X (dependent variables)?

Kenneth Hobs
12-18-2018, 01:08 PM
The number of Y sets is only limited by the number of columns available for output. Or, it could easily be re-coded to poke the regress output into rows until you hit the million row limit....

waimea
12-18-2018, 01:13 PM
Thank you again, this is very kind of you Kenneth!

I'm not sure if I understand it completely, can I have multiple tables with additional X values with a max of 16 X in them?

Kenneth Hobs
12-18-2018, 01:27 PM
Yes. I coded it for one ListObject per sheet. It can be coded otherwise. So, rather that output to row 1, it could be coded to put output to the right on the same row as the ListObect's title row which may or may not be row 1 on the sheet.

waimea
12-18-2018, 01:33 PM
This gets better and better!

And all of this happens in your Function LastNBCol(rng As Range) As Long?

Kenneth Hobs
12-18-2018, 01:48 PM
That function's purpose is to get the last non-blank column on the sheet. I set the output top left cell to be 2 columns to the right of that. That gives a blank column between the outputs.

IF you want output row to the some row as the listobject's title row rather than the sheets row 1:

'Set rO = ws.Cells(1, LastNBCol(ws.Cells) + 2)
Set rO = ws.Cells(xLO.HeaderRowRange.Row, LastNBCol(ws.Cells) + 2)

waimea
12-18-2018, 11:50 PM
This is almost like Christmas morning and getting to open the Christmas presents early!

If I wanted to automate the regression output and automatically lookup the "P-value "against the "Significance F" I would use FIND and OFFSET?

FIND to find SUMMARY OUTPUT and the OFFSET to OFFSET against that cell and VLOOKUP against a table with P-values?

Would it be possible to autofit the output columns? Just to make sure I get what the code is doing, it takes each Y value and regresses them on the tables with X values one X column at the time? Or one Y value on all of the X values?

Kenneth Hobs
12-19-2018, 08:48 AM
I guess one could autofit after each regression or at the end of the Main() macro or separately as shown in the code here.

To answer regress each y set vs one x set or the many x sets (16 max), you can run this code. As you can see in the result on Sheet1, it regresses all x sets together in Main().

As for looking up the stats after the run, I would probable add a unique value in Main(). e.g. SUMMARY OUTPUT Y1, SUMMARY OUTPUT Y2, etc. I would that to make a formula lookup or a macro lookup, easier. Of course during the Main() run, we could add code to return stat parts as needed.


Sub AutoFitAll()
Dim ws As Worksheet
For Each ws In Worksheets
ws.UsedRange.Columns.AutoFit
Next ws
End Sub


Sub SingleLR()
Application.Run "ATPVBAEN.XLAM!Regress", Sheet2.Range("B2:B321"), _
Sheet1.Range("B2:B321"), False, False, , Sheet1.Range("S35"), _
False, False, False, False, , False
End Sub
Note that Sheet2=Worksheets("Sheet3"). Sheet2 is the codename (object) for the sheet whereas "Sheet3" is the tab name.

waimea
12-19-2018, 09:16 AM
Hi Kenneth,

thank you again for your reply and for the updated code. I am looking at Main() right now and I'll try to change it bit by bit.

Paul_Hossler
12-19-2018, 10:07 AM
Late to the game, but it might be easier if you used LINEST in the macro to gernerate the regression coefficents -- either just the coeffecients or more results (see help)

Example and Analysis Toolpack compare in first sheet

I also earlier created a wrapper function that returned similar data in an array -- second sheet

Either way, the macro can receive the outputs from the function and then put on WS the way you want to format for showing the results


I just put X1-X3 and Y1-Y8 data onto a single sheet to make it easier for me to see

waimea
12-19-2018, 10:18 AM
Hi Paul, thank you for your reply.

This is impressive and there is a lot of code to understand, my jaw dropped when I looked at the module.

The 1st sheet "Using WS functions" works but the 2nd sheet "Using VBA function" the coefficents in "O2:R9 " says "#VALUE!"

Paul_Hossler
12-19-2018, 10:51 AM
What version of Excel are you using?

This is an array-entered formula

23443

I'm not sure how many fitting parameters, you need, but LINEST in your macro would probably be easier to deal with that the Analysis addin

Maybe post a sample of what you'd like to end up with

waimea
12-19-2018, 11:09 AM
Hi Paul,

I am using Office 365 Pro Plus.

I realized now that another user also must enable the Data Analysis VBA Toolpak to gain access to the functions which is an unnecessary step!

I am not sure that I would end up with too be honest, it is a long time since I took stats in university.

I want to check which variables that are significant and in some way automate this procedure by comparing the P-value with Significance F.

I have 100+ worksheets with X variables and one worksheet with Y variables, the goal is to find out which variables that are significant and use their coefficients when calculating a weighted score.

What output would you choose from the macro?

waimea
12-20-2018, 06:22 AM
How can I use LINEST in the macro provided earlier in this thread by Kenneth?

Kenneth Hobs
12-20-2018, 12:32 PM
The advantage to the formula array method is that your data will update if data is changed. It will not update fully if you add a row.

1. Do a few manual formula arrays first to see how that is laid out. You need to know how many columns and rows to use for the array in the automated formula method.
2. Then build your formula string for a resized rO's formula. Resize based on findings from (1). This replaces the Regress line of code.

If (2) is too difficult, just do (1) manually and attach the file.

waimea
12-21-2018, 04:48 AM
Hi Kenneth,

at the moment I am using your code and I want to understand your code fully before I change from Regress to LINEST.

I am working on an user form for your code where I can select tables to regress on, select independent and dependent variables.

How would you structure such a user form?

Kenneth Hobs
12-21-2018, 09:06 AM
If you can't be sure that the add-in is set, I am not sure how that would help. It would work if you used Linest().

If you were to use Regress and needed to open the dialog for the user, we could probably do that. Of course they could too as it is in the ribbon.

In both cases, setting the inputs in a coded dialog (userform) would probably make use of Application.Inputbox(), or 3 calls to it in Main(). You would need 3 calls for x, y, and output. For the x and y, the prompt would be to select maybe the top cell(s) for x and just one top cell for y and output cells. The macro would then reset the select to the top to the bottom. That way, x1 or other single or x1toX16 could be set for Xs.

waimea
12-21-2018, 09:55 AM
Hi, thank you for your reply. That user form sounds really complicated.

You are correct on the linest() part but I have to crawl before I can walk.

I can't say that I really understand the difference between regress using data analysis toolpak and Linest() ?

Paul_Hossler
12-22-2018, 07:33 AM
Example of using LINEST




Option Explicit


Sub UseLinEst()
Dim rData As Range
Dim rX As Range, rY As Range
Dim iY As Long
Dim vLinEst As Variant

Set rData = Worksheets("Using LINEST").Cells(1, 1).CurrentRegion

With rData
Set rData = .Cells(2, 1).Resize(.Rows.Count - 1, .Columns.Count)
End With

Set rX = rData.Columns(2).Resize(, 3)

For iY = 6 To 12
Set rY = rData.Columns(iY)

vLinEst = Application.WorksheetFunction.LinEst(rY, rX, True, False)

Worksheets("Using LINEST").Cells(iY - 4, 15).Value = vLinEst(1)
Worksheets("Using LINEST").Cells(iY - 4, 16).Value = vLinEst(2)
Worksheets("Using LINEST").Cells(iY - 4, 17).Value = vLinEst(3)
Worksheets("Using LINEST").Cells(iY - 4, 18).Value = vLinEst(4)

Next iY
End Sub

Kenneth Hobs
12-22-2018, 10:21 AM
You might want to play around with =LinEst() using an array formula. Select a block of x columns plus 1 and 6 rows, add the formula, and press Shift+Ctrl+Enter. I added this manual example to Paul's last example file.

You need to understand what stats it gives vs. Regression method to see if it will meet your needs. See:
https://support.office.com/en-us/article/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d?NS=EXCEL&Version=90&SysLcid=1033&UiLcid=1033&AppVer=ZXL900&HelpId=xlmain11.chm60097&ui=en-US&rs=en-US&ad=US

I added some pseudo data to make more than 16 x sets. I am not sure how reliable =LinEst() is past 16 but it seems to work I suppose.

waimea
12-26-2018, 03:27 PM
Hi Kenneth,

thank you for your reply! I am going to read your link and then decide if I should use your macro and regress or LinEst.

If I wanted to add text to every output in main, example. summary output 1,2,3 etc. How would I go about?