Consulting

Results 1 to 11 of 11

Thread: Extracting highest value correlations and series names from Correlation Matrix

  1. #1

    Extracting highest value correlations and series names from Correlation Matrix

    Hi,

    I have calculated a correlation matrix between a data set. What I need to do is for each given series in the results is identify the highest occurring correlation coefficient (other than the correlation between the series itself, which is always one) and then log the pair with such an occurrence to another location in another sheet.

    For example in the attached sheet, for the first series in column B (XOM), the highest correlation of that series is with row 3 - CVX.

    So I want to store XOM CVX say in cells B21 and C21 and correlation coefficient in D21. For the next series in column C (CVX), the highest positive correlation is with row 8 (DDA). I want to store this combination in B22 and C22 and the corresponsing correlation coefficient in D22.

    For the attached example, it is quite easy to do manually for a few series but my final correlation matrix features over 200 variables.

    Hope someone can help.

    Thanks,

    Nick

  2. #2
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    In haste (ie. it could be better); see attached and click the button. In the case of exact duplicates of correlation values it'll only take the first one.
    Code:[vba]Sub blah()
    Dim InputRng As Range
    Set InputRng = Application.InputBox(prompt:="Please select the correlation matrix EXCLUDING headers top and left", Type:=8)
    Dim OutputRng
    Set OutputRng = Application.InputBox(prompt:="Please select the top left cell for where the results will be..(It can be on another sheet)", Type:=8)
    DestRowOffset = 0
    For Each Colum In InputRng.Columns
    PairY = InputRng.Parent.Cells(InputRng.Row - 1, Colum.Column).Value 'Column header
    MaxVal = 0: PairX = ""
    For Each cll In Colum.Cells
    If Not IsEmpty(cll) And cll.Value > 0 And cll.Value <> 1 And MaxVal < Application.Max(MaxVal, cll.Value) Then
    MaxVal = Application.Max(MaxVal, cll.Value)
    PairX = InputRng.Parent.Cells(cll.Row, InputRng.Column - 1)
    End If
    Next cll
    If MaxVal <> 0 Then 'add the pair to the results
    OutputRng.Offset(DestRowOffset) = PairY
    OutputRng.Offset(DestRowOffset, 1) = PairX
    OutputRng.Offset(DestRowOffset, 2) = MaxVal
    DestRowOffset = DestRowOffset + 1
    End If
    Next Colum
    End Sub
    [/vba]
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  3. #3
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    I wonder if the OP even looked at the response.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  4. #4
    Hello P45Cal,

    Yes I did look at response but have had to put this project on hold due to
    other priorites rethinking in terms of what I want to achieve in terms of the final analysis. Anyway, I've just tested the code out on a sample of data and it works great, so thanks.

    In order to get the correlations ranked, I'm simply sorted the data based on the correlation column which is fine. But I noticed that there are no negative correlations and now I am thinking this information would be usefull as well. Would it be easy to include the negative ones as well postive in the output?

  5. #5
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    I think you know, from the tone of my question, why I asked it.

    You're not the only one.

    When someone asks a question here, and someone clearly spends significant time giving them a solution (giving being the operative word) then the very minimum the original poster can do is acknowledge that effort in a timely fashion. If not, there are several consequences:
    1. It p*sses-off the person who supplied the solution (can I use that turn of phrase here?) Why? Because it looks as if the solution provider has so totally wasted his time and effort. He doesn't know if it's been of any use to the OP; has the OP cross-posted elsewhere and already got a solution? Was the solution inadequate? Perhaps the OP has used his solution and is such an ingrate not to thank them?
    2. Ultimately the OP, and anyone else posting a question, will lose out as those willing to give solutions will be discouraged from doing so. I, for one, don't even bother to read new posts from those that have treated me that way in the past (if I can remember their handle that is!). Worse, solution providers will stop even looking at forums such as these.

    Please note that this isn't specifically a gripe at you, but at a significant number of others (who won't, of course, see this) who behave similarly. It's just that yours was the one that finally took it beyond my tolerance level.

    In answer to your question, yes I think it would be fairly easy to include negative correlations, but until I'm sure you've taken the project off on-hold, I'm not even going to begin thinking about it.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  6. #6
    Hello P45Cal,

    Well I just re-started focusing on this project, have been using the macro you created and now have a much better idea of my requirements, what's missing and what I need.

    Seeing the negative correlations would definately be useful but the other more important thing is I just realised that there's a whole bunch of correlations that I should be looking at that are not being included in the final output.

    Let me explain. In the orginal example, there are 9 combination of variables that generate postive correlations (excluding correlations of variables with themselves) yet the macro returns only 5 of these.

    But in my analysis, I should be including all combinations of variables that yield positive (and now negative correlations). So in fact, all I want to do is extract all pairwise correlations accept for the self correlations - i.e the ones with the value of one

    So for example, in the case of a correlation matrix of seven variables, excel will populate 28 correlation coefficents, in my output I want to see all of these accept the self correlations (the ones), i.e 21 correlation coefficents.

    I've had a look at the code but really don't know where to start to modify. Would be really helpful if you could help or indicate if you will be able to.

    Thanks,

    Nick

  7. #7
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    So confirm that we're no longer interested in best correlations (maxima) but all correlations (excluding the 1s). If so:[vba]Sub blah()
    Dim InputRng As Range
    Set InputRng = Application.InputBox(prompt:="Please select the correlation matrix EXCLUDING headers top and left", Type:=8)
    Dim OutputRng
    Set OutputRng = Application.InputBox(prompt:="Please select the top left cell for where the results will be..(It can be on another sheet)", Type:=8)
    DestRowOffset = 0
    For Each cll In InputRng.Cells
    If cll.Value <> 1 And cll.Value <> "" Then
    OutputRng.Offset(DestRowOffset) = InputRng.Parent.Cells(cll.Row, InputRng.Column - 1).Value 'row headetr
    OutputRng.Offset(DestRowOffset, 1) = InputRng.Parent.Cells(InputRng.Row - 1, cll.Column).Value
    OutputRng.Offset(DestRowOffset, 2) = cll.Value
    DestRowOffset = DestRowOffset + 1
    End If
    Next cll
    End Sub[/vba]which just runs through all cells in the (square) correlation matrix and reports the value (when not 1 and not empty) along with the row and column headers for that value.
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  8. #8
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    but perhaps you're looking for the likes of:
    
    Series     Correlates best with   Correlation
      XOM          CVX                 0.738976806
      CVX          XOM                 0.738976806
      XYZ          ZYX                 0.954513523
      ZYX          XYZ                 0.954513523
      NYZ          XYZ                 0.358511319
      BBC          CVX                 0.059801189
      DDA          XOM                 0.437733711
    ?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  9. #9
    Hi P45Cal,

    Many thanks for changing the code, it gives me exactly what I requested. But yes you are quite right that ultimatley what my goal is to focus on the highest positive (and lowest negative) correlations but I was thinking that if the macro returns all the correlations coefficients I can just sort them and easily identify the highest e.g. Coefficents with a value of say above 0.20 and the negative coefficents less than -0.20.

    But from what you're suggesting, it seems it would be easy to build this into the code? Although I think I need to make the threshold criteria for inclusion in the output (0.20) dynamic. By that I mean I might want to change the criteria based on each dataset that I analyse, e.g from 0.20 to 0.40.

    Anyway, I can work with the current code, if it doesn't take up too much of your time then a more sophisticated solution would be most welcome.

    Thanks,

    Nick

  10. #10
    Knowledge Base Approver VBAX Wizard p45cal's Avatar
    Joined
    Oct 2005
    Location
    Surrey UK
    Posts
    5,875
    re:"make the threshold criteria for inclusion in the output (0.20) dynamic.. ..criteria based on each dataset"

    In words, how would you make that decision? Would it be a binary decision (either 0.2 or 0.4 and nothing else) or could it be a sliding scale value?
    p45cal
    Everyone: If I've helped and you can't be bothered to acknowledge it, I can't be bothered to look at further posts from you.

  11. #11
    Hello again,

    No I didn't envisage a binary threshold as I think they would be highly limitating. Logically it should be be a function of the range of correlation coefficients each dataset returns. So for example, with a data that returns lots of correlations at above say 0.8 then the threhold should be quite high (e.g 0.4), with a data set that returns very few high correlations or none at all, the threshold should be lower so that I have enougth observations to analyse for that dataset.

    I was thinking that the easiest way would be to have a line in the code where I can manually change the threshold based on what the correlation matrix is showing for each data or if this is automatically built into the code then have an option of a manual overide so I can see all the correlations if I want to. But you probably have a better idea of the best way to do it. The key thing for me is something with a degree of flexibility so that I can change the criteria easily if I need to.

    Thanks,

    Nick

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •