# Thread: How to analyze big data with Excel

1. Well, it just seems to me that it's hard to take a 2 cities' metric scores and use them

Now if you had a score for each city pair, then it seems you could evaluate the pair which is now just a single paramater

Capture.JPG

So if comparing Karlshamn to Karlskrona for Metric1 was 82, then I think it would be valid to compare that pair against the others

The hard part would be to figure out how to come up with the 82

I took a shot at trying to derive a city pair score from each combination by subtracting the city score from each other in Workbook to be analyzed_2.xlsx and

2. Hi Paul,

thank you for your help with this.

I am going to read the entire thread again and see if I can find a good and meaningfull way to analyze this data.

I am not sure but did you find a way to "select" the possible combinations without reapeating items?

As in your picture above with 10 combinations instead of the whole shebang.

3. Yes

Analysis depends on what you have and what you're trying to determine

1. If you can determine a parameter for each city pair, i.e. how Karlshamn compares to Karlskrona (-5 in A:B) meaning that Karlshamn is -5 below Karlskrona, that's what I think you're after

2. If you can only determine a parameter for single cities, i.e. Karlshamn (E2 = 45) and another number for Karlskrona (E3 = 82), then you / we need to calculate a single 'value' for the Karlshamn - Karlskrona pair

2a. Subtract (H:H) and if negative score that pair as -1, positive +1. Total (N:N) and the Karlskrona - Olofström pair 'wins' ( =3)

2b. Subtract (H:H) and if negative score that pair as 0, positive +1. Total (T:T) and the Karlskrona - Olofström pair 'wins' ( =3)

3. If you decide you don't need to compare city pairs, but only each city by itself, that's a whole lot easier

Capture.JPG

4. Hi Paul,

sorry for awakening a old thread!

I am still working on this and I have to look at your conceptual genius ideas again!

How do you mean with city by itself?

5. Originally Posted by Paul_Hossler
As an CONCEPTUAL EXAMPLE .... … you could look at something like this

10 cities (GREEN), 3 metrics (YELLOW)with each metric having a weighting factor (ORANGE)

The BLUE are just random numbers 0-100 but could be your raw scoring of City-X against City-Y for Metric-M (Col A - Col K)

Each raw score is weighted (Col M - Col N) in Col P - Col W

Weighted scores are summed Col Y - Col Z

The weighted summary is sorted High to Low in Col AB - Col AC

I used 2 UDFs and array functions

F9 to recalc and see what happens

Attachment 24062
I think that this is a good way to go, with weights.

How can I go from my tabular data to a matrix as in your screenshot?

Edit:

I have a small formula that calculates the difference in percentage units between two city pairs.

Say City1 vs City2. City1 could be 30%, and 50% better. Does this help in anyway?

6. There's the raw data and the metric weights on WS Data

The report sheet and the difference between city pairs weighted

Minor final formatting to report sheet so far, since not sure as to how you want the end product. If it's easy enough, let me know

Capture.JPG

7. Hi Paul,

I am looking at your code and I don't understand all of it.

I am at a stage where I can decide what data that is outputted as raw data.

I have score1, score2, total score and categories that are all numeric. There are 14 categories and each category contains a different number of metrics.

I guess that earlier I only saved score1, score2 for city1 and perhaps I should store score1, score2 for city2 aswell?

If you were to select raw data, what kind of data is meaningfull to analyze? And what kind of file structure would you select?

8. You never mention 'Categories' so I don't know how they would play

My Assumptions --

1. For each metric

2. Compare each city against all others by subtracting scores

3. Weight the difference by the metric weight

4. Highest score wins

In my screen shot above, row 7 City K and R have score of 82 and 38 for metric 1 which has a weight of .05

The weighted difference is 2.2 which is the highest for metric 1

Again ....

I have a small formula that calculates the difference in percentage units between two city pairs.

Say City1 vs City2. City1 could be 30%, and 50% better. Does this help in anyway?

I still don't see how your scoring for city pairs is going to work

To me it makes much more sense to score each city in each metric, weight the score, and them pick the city with the highest weighted score in each metric

9. Hi Paul,

I see what you mean and I think that we should try it the way that you are suggesting and forget about the city pairs.

I want to use 4 metrics, score1 and score2 for city 1 and score1 and score2 for city2.

I then add the scores to get 2 total scores which I then weight with weights? This would give me 1 weight for score1 and 1 weight for score2? And perhaps a weight for total score? Or is that just adding useless information?

 Name Score1 Score2 Total score Score1 Score2 Total score City1 City1 37 22 59 9 36 45 City1 City2 16 78 44 29 63 92 City1 City3 48 25 29 79 79 158 City1 City4 80 25 39 31 13 44 City1 City5 61 86 81 41 61 102 City1 City6 73 73 42 48 38 86 City1 City7 73 86 0 9 13 22

This is how I think that my raw data will look like.

10. I can change the raw data layout to something more approtiate if you think it is better?

11. Still looks city-pair oriented (before my red lines)

Capture.JPG

Example

12. Hi Paul,

I like your example and the use of weights and then picking the city with the highest total score.

I think that I have to modify how I create my dataset.

'Cityname1 & Cityname2
wkarr(i, 1) = City1 & " " & City2

'Score1
wkarr(i, 2) = sht.Range("I65").Value2

'Score2
wkarr(i, 3) = sht.Range("J65").Value2

'Score1
wkarr(i, 4) = sht.Range("S65").Value2

'Score2
wkarr(i, 5) = sht.Range("T65").Value2

With Sheets("Rawdata")

R = Cells(Rows.Count, 1).End(xlUp).Row + 1
.Cells(R, 1).Resize(UBound(wkarr, 1), UBound(wkarr, 2)) = wkarr

End With
Should I be moving in the other direction?

'Cityname1 & Cityname2
wkarr(1, i) = City1 & " " & City2

'Score1
wkarr(2, i) = sht.Range("I65").Value2

13. The score1 of say 31 is calculated from a country average and the score2 is calculated on city1 vs city2.

So I am not sure on how to proceed. Without the city pairs there is no score2.

Perhaps there is another way to calculate the scores? Say percentage increase from 2017 to 2018?

Or something else?

14. The score1 of say 31 is calculated from a country average and the score2 is calculated on city1 vs city2.

So I am not sure on how to proceed. Without the city pairs there is no score2.
Why do you need a 'score2'?

I still think you're thinking city pairs and I'm not sure I'd do it that way

Very simple example of how I'm used to doing a weighted score matrix

1. You're looking at 5 cars (green)

2. Five factors are important (yellow)

3. Since some are more important than others, you weight them (blue)

a. Total = 100% , from 0 - 10, with 10 being the best

b. Some are subject (e.g. color) and some can be quantitative (e.g. gas mileage, 1-10 mpg = 2, 11-15 mpg = 3, ....., > 40mpg = 10)

4. Score each car on each metric 0 - 10 (orange)

5. Apply weighting factors (gold)

6. Total a composite score (gray)

7. Rank high to low by composite score (brown)

In this context, it doesn't seem important to me that (for ex) Ford minus Chevy Low Cost = 3 since Toyota has the highest weighted score

Capture.JPG

15. Hi Paul,

I like it very much!

I am trying to write a macro that loops through one combobox with few values to speed up the time it takes to complete the macro.

I will follow your instructions and see what I come up with.

Green is cities, yellow is metrics?

Say I have one category that has 16 possible points,

1-5 points = OK, 6-10 points = GOOD, 11-16 points = GREAT.

Is that how you would do it?

16. Say I have one category that has 16 possible points,

1-5 points = OK, 6-10 points = GOOD, 11-16 points = GREAT.

No sure I understand. Weighted scores would need to be numeric

Do you mean that the category has 16 different metrics, each scored (say) 1 - 10?

E.G.

Category = Demographics

1. Education level -- 1- 10 for each city
2. Household income -- 1- 10 for each city
....
16. Household size -- 1- 10 for each city

17. Hi Paul,

Say I have one category with one metric in it: Tax rate:

I then have these 5 intervalls:

— 30,00
31,00
32,00
33,00
34,00 —
I would then asssign different points based on the intervalls?

-30 = 10 points
31,00 = 8 points
32,00 = 6 points
33,00 = 4 points
34.00 - 2 points
Lower is better and lower value gets higher points.

How would I then weight the metric?

18. Why don't you post a realistic example file, containing data and desiterd results ?
Why so cryptic, hideous, secretive ?
Remember this is a public forum.

19. That's a example of a quantitative parameter

-30 = 10 points
31,00 = 8 points
32,00 = 6 points
33,00 = 4 points
34.00 - 2 points

So if the weighting factor for tax rate as .10 (out of a total of 1.00),

1) City-1 with a 33% rate would have a raw score of 4 and weighting would add 0.4 to the city's composite score
2) City-2 with a 31% rate would have a raw score of 8 and weighting would add 0.8 to the city's composite score

>>>>>> This has the built-in assumption that low tax rates are better than high tax rates <<<<<<

Population, city size (square KM), public parks (square KM), %employment etc. could be treated similarly

City Size, weighting .02

0 - 10,000 = 10 point
10,001 - 25,000 = 9 point
...
Over 10M = 1 point

1) City-1 with 20,000 population would have a raw score of 9 and weighting would add 0.18 to the city's composite score
2) City-2 with 50,000 population would have a raw score of 5 and weighting would add 0.1 to the city's composite score

>>>>>> This has the built-in assumption that small towns/villages are better than large cities <<<<<<

Subjective parameters (quality of life, education, etc.) are a little trickier

I recommend that the data and the scoring algorithm is first, and the reporting is after you're happy with the technique

I think it's easier and more understandable if all scores be 0 - 9, but 10 brackets are not necessary (i.e. 0, 2, 4, 6, and 8 as responses)

Can you provide a small but realistic sample of your data in the format that you collect it in?

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•