View Full Version : Data Sets and Regression
Red Salerno
11-05-2013, 05:36 PM
Please see the attached file. As a small subset, I have several historical data sets. I want to run a linear regression between all the various combinations of sets. Is it possible to do this in Excel quickly OR should this be done in another program???
Red Salerno
11-06-2013, 09:25 AM
I forgot to mention that I wanted to do it in VBA
Paul_Hossler
11-06-2013, 05:13 PM
between all the various combinations of sets
What do you mean by this?
The Worksheet function LINEST() can be called from VBA and will return various stat values (see Help)
C vs D
C vs E
D vs E
D vs F
etc.
???
What were you looking to return since LINEST returns a lot of parameters (most of which I don't understand since it's been a LONG time since my last stat course)
Paul
Red Salerno
11-07-2013, 05:50 AM
I agree the LINEST function will return the regression numbers. I was more thinking along the lines of how to set up the data sets from the universe quickly. ie C vs D, C vs E, C vs F, etc, etc, etc. With the 12 pieces of data there are 65 different combinations. I would actually like to have more data and more therefore more data sets. Setting this up manually would be almost impossible if you increase the data inputs ....that's why I was thinking VBA. Just looking for some direction.
Perhaps I should be looking at Access OR MatLab ....programs I really don't know anything about.
I never took Stats, don't even know what a regression or LinEst is, but,,,
In VBA you can use Arrays from Ranges, then iteratively feed the arrays to a Function.
Array1 = Range("A1:D1")
Array2 = Range("A2:D2")
For i = 0 to UBound(Array1
For j = 0 to Ubound(Aray2)
Range("E1").Offset(i, j) = LinEst(Array1(i), Array2(j))
Next j
Next i
Paul_Hossler
11-07-2013, 07:01 AM
I agree the LINEST function will return the regression numbers. I was more thinking along the lines of how to set up the data sets from the universe quickly. ie C vs D, C vs E, C vs F, etc, etc, etc. With the 12 pieces of data there are 65 different combinations. I would actually like to have more data and more therefore more data sets. Setting this up manually would be almost impossible if you increase the data inputs ....that's why I was thinking VBA. Just looking for some direction.
What are you looking to get back? The correlation coeff, m and b, or what?
Looping C-D, C-E, C-M, D-E, D-F, ...., L-M is easy (I think) to return an array of LINEST parameters, but if you look in help for LINEST, what parameters are you looking for?
Paul
Red Salerno
11-07-2013, 04:39 PM
looking for the R2 value
Paul_Hossler
11-07-2013, 08:35 PM
If I understand right
Option Explicit
Sub MyLinEst()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range
Dim sLabel1 As String, sLabel2 As String
Dim vResults As Variant
Dim iOut As Long
Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion
iOut = rData.Cells(1, 1).Row
With rData
For iCol1 = 2 To .Columns.Count - 1
Set rX = .Columns(iCol1)
sLabel1 = rX.Cells(1, 1).Value
Set rX = rX.Cells(3, 1).Resize(rX.Rows.Count - 2, rX.Columns.Count)
For iCol2 = iCol1 + 1 To .Columns.Count
Set rY = .Columns(iCol2)
sLabel2 = rY.Cells(1, 1).Value
Set rY = rY.Cells(3, 1).Resize(rY.Rows.Count - 2, rY.Columns.Count)
'Known_y's - the set of y-values you already know in the relationship y = mx + b.
'Known_x's - an optional set of x-values that you may already know in the relationship y = mx + b.
'Const - a logical value specifying whether to force the constant b to equal 0.
'Stats - a logical value specifying whether to return additional regression statistics.
vResults = Application.WorksheetFunction.LinEst(rY, rX, True, True)
Worksheets("Sheet2").Cells(iOut, 16) = sLabel1
Worksheets("Sheet2").Cells(iOut, 17) = sLabel2
'r2 is in vResults, row 3, col 1
Worksheets("Sheet2").Cells(iOut, 18) = vResults(3, 1)
iOut = iOut + 1
Next iCol2
Next iCol1
End With
End Sub
Paul
Red Salerno
11-09-2013, 01:08 PM
That's almost what I was looking to accomplish....but in a slightly different format. I was thinking in a matrix output format. So on the matrix output the inputs would run along rows and columns. See the attached chart for an example on sheet 3.
I'm not sure that you two are playing on the same ball field.
In this attachment, sheet1 is showing Paul's results in tabulated format, as close to Red's last example as I could see.
On Sheet 2, I used a very simple LinEst function in Row 2, treating the Date columns as Known X in all formulas, and the underlying column as the known Y for each currencies' data set.
I sorted the table on the date Ascending so that the computed chart line would read X's from left to right in date ascending order. The Y's, of course, would be plotted on the vertical axis.
As you know, I myself am not sure what Red wants, not do I pretend to fully understand Statistical functions. Heh. I spent 2 hours on google trying to figure out what LinEst does.
http://web.alfredstate.edu/quagliato/linest/linest.htm
http://www.excelfunctions.net/Excel-Linest-Function.html
Paul_Hossler
11-10-2013, 05:55 PM
SamT --
In #7 -- Red is looking for the Correlation Coefficient. LINEST can return more parameters than just slope-intercept
In #4 -- Red is looking for the non-Date data columns against each of the other non-Date data columns. I do not think that it is correct to think of the Dates as 'X's and each of the other columns as 'Y's
Red -- feel free to jump in and correct me.
However, I think that the CORREL() worksheet function would simplify the macro (Didn't think of that until I saw SamT and the 'Slope' tag) (Must be getting old)
Option Explicit
Sub MyLinEst_2()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range
Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion
With rData
For iCol1 = 2 To .Columns.Count
Worksheets("Sheet3").Cells(.Row, iCol1) = .Cells(1, iCol1)
Worksheets("Sheet3").Cells(iCol1 + 1, 1) = .Cells(1, iCol1)
Next iCol1
For iCol1 = 2 To .Columns.Count
Set rX = .Columns(iCol1)
Set rX = rX.Cells(3, 1).Resize(rX.Rows.Count - 2, rX.Columns.Count)
For iCol2 = iCol1 To .Columns.Count
Set rY = .Columns(iCol2)
Set rY = rY.Cells(3, 1).Resize(rY.Rows.Count - 2, rY.Columns.Count)
Worksheets("Sheet3").Cells(iCol1 + 1, iCol2) = Application.WorksheetFunction.Correl(rY, rX)
Next iCol2
Next iCol1
End With
End Sub
Paul
glad this youngster (64 now) could help in even a tiny way. :D
Paul_Hossler
11-11-2013, 08:16 AM
glad this youngster (64 now) could help in even a tiny way. :D
You're almost up there with me :100:
Paul
Red Salerno
11-23-2013, 11:18 AM
10861
One more chink .....the code only works if the data sets are the same size. If as attached one on the data sets is shorter the code breaks. Anyone have thoughts on this?
Paul_Hossler
11-24-2013, 07:47 AM
Option Explicit
Sub MyLinEst_3()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range, rXtemp As Range, rYtemp As Range
Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion
With rData
'add row and col labels to output sheet
For iCol1 = 2 To .Columns.Count
Worksheets("Sheet4").Cells(.Row, iCol1) = .Cells(1, iCol1)
Worksheets("Sheet4").Cells(iCol1 + 1, 1) = .Cells(1, iCol1)
Next iCol1
'compare each Y col to the Y col to the right
'ASSUMPTION: No gaps in what ever data is there
For iCol1 = 2 To .Columns.Count
Set rX = .Cells(3, iCol1)
Set rX = Range(rX, rX.End(xlDown))
For iCol2 = iCol1 To .Columns.Count
Set rY = .Cells(3, iCol2)
Set rY = Range(rY, rY.End(xlDown))
If rX.Rows.Count > rY.Rows.Count Then
Set rXtemp = rX.Cells(1, 1).Resize(rY.Rows.Count, 1)
Set rYtemp = rY
ElseIf rX.Rows.Count < rY.Rows.Count Then
Set rXtemp = rX
Set rYtemp = rY.Cells(1, 1).Resize(rX.Rows.Count, 1)
Else
Set rXtemp = rX
Set rYtemp = rY
End If
Worksheets("Sheet4").Cells(iCol1 + 1, iCol2) = Application.WorksheetFunction.Correl(rYtemp, rXtemp)
Next iCol2
Next iCol1
End With
End Sub
Not rigorously tested, so check it out
Paul
Powered by vBulletin® Version 4.2.5 Copyright © 2024 vBulletin Solutions Inc. All rights reserved.