PDA

View Full Version : Data Sets and Regression



Red Salerno
11-05-2013, 05:36 PM
Please see the attached file. As a small subset, I have several historical data sets. I want to run a linear regression between all the various combinations of sets. Is it possible to do this in Excel quickly OR should this be done in another program???

Red Salerno
11-06-2013, 09:25 AM
I forgot to mention that I wanted to do it in VBA

Paul_Hossler
11-06-2013, 05:13 PM
between all the various combinations of sets

What do you mean by this?

The Worksheet function LINEST() can be called from VBA and will return various stat values (see Help)

C vs D
C vs E


D vs E
D vs F


etc.

???

What were you looking to return since LINEST returns a lot of parameters (most of which I don't understand since it's been a LONG time since my last stat course)

Paul

Red Salerno
11-07-2013, 05:50 AM
I agree the LINEST function will return the regression numbers. I was more thinking along the lines of how to set up the data sets from the universe quickly. ie C vs D, C vs E, C vs F, etc, etc, etc. With the 12 pieces of data there are 65 different combinations. I would actually like to have more data and more therefore more data sets. Setting this up manually would be almost impossible if you increase the data inputs ....that's why I was thinking VBA. Just looking for some direction.


Perhaps I should be looking at Access OR MatLab ....programs I really don't know anything about.

SamT
11-07-2013, 06:14 AM
I never took Stats, don't even know what a regression or LinEst is, but,,,

In VBA you can use Arrays from Ranges, then iteratively feed the arrays to a Function.


Array1 = Range("A1:D1")
Array2 = Range("A2:D2")

For i = 0 to UBound(Array1
For j = 0 to Ubound(Aray2)
Range("E1").Offset(i, j) = LinEst(Array1(i), Array2(j))
Next j
Next i

Paul_Hossler
11-07-2013, 07:01 AM
I agree the LINEST function will return the regression numbers. I was more thinking along the lines of how to set up the data sets from the universe quickly. ie C vs D, C vs E, C vs F, etc, etc, etc. With the 12 pieces of data there are 65 different combinations. I would actually like to have more data and more therefore more data sets. Setting this up manually would be almost impossible if you increase the data inputs ....that's why I was thinking VBA. Just looking for some direction.


What are you looking to get back? The correlation coeff, m and b, or what?

Looping C-D, C-E, C-M, D-E, D-F, ...., L-M is easy (I think) to return an array of LINEST parameters, but if you look in help for LINEST, what parameters are you looking for?

Paul

Red Salerno
11-07-2013, 04:39 PM
looking for the R2 value

Paul_Hossler
11-07-2013, 08:35 PM
If I understand right



Option Explicit
Sub MyLinEst()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range
Dim sLabel1 As String, sLabel2 As String
Dim vResults As Variant
Dim iOut As Long


Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion

iOut = rData.Cells(1, 1).Row

With rData
For iCol1 = 2 To .Columns.Count - 1
Set rX = .Columns(iCol1)
sLabel1 = rX.Cells(1, 1).Value
Set rX = rX.Cells(3, 1).Resize(rX.Rows.Count - 2, rX.Columns.Count)

For iCol2 = iCol1 + 1 To .Columns.Count
Set rY = .Columns(iCol2)
sLabel2 = rY.Cells(1, 1).Value
Set rY = rY.Cells(3, 1).Resize(rY.Rows.Count - 2, rY.Columns.Count)

'Known_y's - the set of y-values you already know in the relationship y = mx + b.
'Known_x's - an optional set of x-values that you may already know in the relationship y = mx + b.
'Const - a logical value specifying whether to force the constant b to equal 0.
'Stats - a logical value specifying whether to return additional regression statistics.
vResults = Application.WorksheetFunction.LinEst(rY, rX, True, True)

Worksheets("Sheet2").Cells(iOut, 16) = sLabel1
Worksheets("Sheet2").Cells(iOut, 17) = sLabel2
'r2 is in vResults, row 3, col 1
Worksheets("Sheet2").Cells(iOut, 18) = vResults(3, 1)
iOut = iOut + 1
Next iCol2
Next iCol1
End With


End Sub




Paul

Red Salerno
11-09-2013, 01:08 PM
That's almost what I was looking to accomplish....but in a slightly different format. I was thinking in a matrix output format. So on the matrix output the inputs would run along rows and columns. See the attached chart for an example on sheet 3.

SamT
11-10-2013, 02:43 PM
I'm not sure that you two are playing on the same ball field.

In this attachment, sheet1 is showing Paul's results in tabulated format, as close to Red's last example as I could see.

On Sheet 2, I used a very simple LinEst function in Row 2, treating the Date columns as Known X in all formulas, and the underlying column as the known Y for each currencies' data set.

I sorted the table on the date Ascending so that the computed chart line would read X's from left to right in date ascending order. The Y's, of course, would be plotted on the vertical axis.

As you know, I myself am not sure what Red wants, not do I pretend to fully understand Statistical functions. Heh. I spent 2 hours on google trying to figure out what LinEst does.

http://web.alfredstate.edu/quagliato/linest/linest.htm

http://www.excelfunctions.net/Excel-Linest-Function.html

Paul_Hossler
11-10-2013, 05:55 PM
SamT --

In #7 -- Red is looking for the Correlation Coefficient. LINEST can return more parameters than just slope-intercept

In #4 -- Red is looking for the non-Date data columns against each of the other non-Date data columns. I do not think that it is correct to think of the Dates as 'X's and each of the other columns as 'Y's

Red -- feel free to jump in and correct me.

However, I think that the CORREL() worksheet function would simplify the macro (Didn't think of that until I saw SamT and the 'Slope' tag) (Must be getting old)




Option Explicit
Sub MyLinEst_2()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range


Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion

With rData

For iCol1 = 2 To .Columns.Count
Worksheets("Sheet3").Cells(.Row, iCol1) = .Cells(1, iCol1)
Worksheets("Sheet3").Cells(iCol1 + 1, 1) = .Cells(1, iCol1)
Next iCol1


For iCol1 = 2 To .Columns.Count
Set rX = .Columns(iCol1)
Set rX = rX.Cells(3, 1).Resize(rX.Rows.Count - 2, rX.Columns.Count)

For iCol2 = iCol1 To .Columns.Count
Set rY = .Columns(iCol2)
Set rY = rY.Cells(3, 1).Resize(rY.Rows.Count - 2, rY.Columns.Count)
Worksheets("Sheet3").Cells(iCol1 + 1, iCol2) = Application.WorksheetFunction.Correl(rY, rX)
Next iCol2
Next iCol1
End With
End Sub



Paul

SamT
11-11-2013, 05:44 AM
glad this youngster (64 now) could help in even a tiny way. :D

Paul_Hossler
11-11-2013, 08:16 AM
glad this youngster (64 now) could help in even a tiny way. :D

You're almost up there with me :100:


Paul

Red Salerno
11-23-2013, 11:18 AM
10861

One more chink .....the code only works if the data sets are the same size. If as attached one on the data sets is shorter the code breaks. Anyone have thoughts on this?

Paul_Hossler
11-24-2013, 07:47 AM
Option Explicit
Sub MyLinEst_3()
Dim rData As Range
Dim iCol1 As Long, iCol2 As Long
Dim rX As Range, rY As Range, rXtemp As Range, rYtemp As Range

Set rData = Worksheets("Sheet2").Cells(3, 3).CurrentRegion

With rData

'add row and col labels to output sheet
For iCol1 = 2 To .Columns.Count
Worksheets("Sheet4").Cells(.Row, iCol1) = .Cells(1, iCol1)
Worksheets("Sheet4").Cells(iCol1 + 1, 1) = .Cells(1, iCol1)
Next iCol1


'compare each Y col to the Y col to the right
'ASSUMPTION: No gaps in what ever data is there
For iCol1 = 2 To .Columns.Count
Set rX = .Cells(3, iCol1)
Set rX = Range(rX, rX.End(xlDown))

For iCol2 = iCol1 To .Columns.Count
Set rY = .Cells(3, iCol2)
Set rY = Range(rY, rY.End(xlDown))

If rX.Rows.Count > rY.Rows.Count Then
Set rXtemp = rX.Cells(1, 1).Resize(rY.Rows.Count, 1)
Set rYtemp = rY
ElseIf rX.Rows.Count < rY.Rows.Count Then
Set rXtemp = rX
Set rYtemp = rY.Cells(1, 1).Resize(rX.Rows.Count, 1)
Else
Set rXtemp = rX
Set rYtemp = rY
End If

Worksheets("Sheet4").Cells(iCol1 + 1, iCol2) = Application.WorksheetFunction.Correl(rYtemp, rXtemp)
Next iCol2
Next iCol1
End With
End Sub



Not rigorously tested, so check it out

Paul