PDA

View Full Version : Producing hundreds of Multiple Regression analysis scenarios simultaneously



IndigoPlatea
02-14-2017, 02:20 PM
Hi All,
I am currently working at a large technology consulting organisation and I have been tasked to make a multiple regression analysis model on Excel to predict “Wage Rates” (Column E – my dependant variable). Though I am aware how to run the analysis through the “Data Analysis” toolpak, it’s not feasible to run hundreds of regressions especially when variables are likely to be adjusted monthly.


Having said this, is it possible to produce a VBA or Macro to have the analysis done in the file? I can run a multiple regression analysis per Role (Column D), but that would take a lot time to run it per Role (there are a number of roles per Country) and per Country (there are dozens of countries). Also, analysing what the best trend line is per output in order to formulate the most appropriate equations. Furthermore, using the Summary output to produce an equation to predict the data in the rows. I have examples for two roles, Architect and Developer – both with mock data. In the real example, I would have many more variables GDP, technology lifecycle trend etc.


Again, would there be a function through VBA or Macros which would recognise the Country, Role, the independent variables etc. to produce many Multiple Regression Analysis results at once – based on those criteria? Even tips on how to reorganise my data for the anticipated VBA or Macro would be nice.

I should note - I am not a VBA user at all, though I have implemented simple codes.


Any help on how to proceed would be great.


Thanks

Paul_Hossler
02-14-2017, 07:39 PM
I'd use the TREND() worksheet function

https://support.office.com/en-us/article/TREND-function-e2f135f0-8827-4096-9873-9a7cf7b51ef1


Returns values along a linear trend. Fits a straight line (using the method of least squares) to the arrays known_y's and known_x's. Returns the y-values along that line for the array of new_x's that you specify.

18377

Also, I don't think (and I'm not sure) that you want to consider the period counter as an independent variable

The TREND function could be wrapped in VBA


Note that it is an array entered function - you don't add the { ...} excel does when you use control-shift-enter

18376

Paul_Hossler
02-14-2017, 09:06 PM
Here's a spreadsheet with the use of TREND showing how it generates the same results as your first approach, but also the results if you do not include the period

Paul_Hossler
02-15-2017, 08:55 AM
This macro will add the TREND() formulas in your block of data in the predicted column

Not 100% bullet proof




Option Explicit
Const colKnownY As String = "E:E" ' exactly 3 char
Const colKnownXs As String = "F:H" ' exactly 3 char

Sub AddTrendFormulas()
Dim rData As Range, rTrend As Range, rData1 As Range
Dim sFormula As String
Dim rowStart As Long, rowEnd As Long, i As Long
Set rData = ActiveSheet.Cells(1, 1).CurrentRegion
Set rData1 = rData.Cells(2, 1).Resize(rData.Rows.Count - 1, rData.Columns.Count)
'delete formulas in Y's
On Error Resume Next
Set rTrend = rData.Range(colKnownY).SpecialCells(xlCellTypeFormulas)
rTrend.ClearContents
Set rTrend = Intersect(rData, ActiveSheet.Range(colKnownY)).SpecialCells(xlCellTypeBlanks)
On Error GoTo 0

'check to see if areas are same size
For i = 1 To rTrend.Areas.Count - 1
If rTrend.Areas(i).Rows.Count <> rTrend.Areas(i + 1).Rows.Count Then
MsgBox "Not all blocks are the same size"
Exit Sub
End If
Next i

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

'sort by country, role, year
With ActiveSheet.Sort
.SortFields.Clear
.SortFields.Add Key:=rData1.Columns(1), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SortFields.Add Key:=rData1.Columns(4), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SortFields.Add Key:=rData1.Columns(3), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SetRange rData
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With

'build formula using first block as template
'=TREND(E2:E13,F2:H13,F14:H26)
sFormula = "=TREND("

With rTrend.Areas(1)
rowStart = 2
rowEnd = .Cells(1, 1).Row - 1

sFormula = sFormula & Left(colKnownY, 1) & rowStart & ":" & Left(colKnownY, 1) & rowEnd & ","
sFormula = sFormula & Left(colKnownXs, 1) & rowStart & ":" & Right(colKnownXs, 1) & rowEnd & ","

rowStart = .Cells(1, 1).Row
rowEnd = .Cells(.Rows.Count, 1).Row
sFormula = sFormula & Left(colKnownXs, 1) & rowStart & ":" & Right(colKnownXs, 1) & rowEnd & ")"

.FormulaArray = sFormula

.Copy
End With

'paste formulas from first area to other areas
For i = 2 To rTrend.Areas.Count
rTrend.Areas(i).Select
ActiveSheet.Paste
Next i
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
MsgBox "Done"

End Sub

Zack Barresse
02-15-2017, 03:04 PM
I've closed the thread due to cross-posting.

http://www.excelforum.com/excel-programming-vba-macros/1173922-producing-hundreds-of-multiple-regression-analysis-scenarios-simultaneously.html

Please read this post: http://www.excelguru.ca/content.php?184