I have a lot of data but I am not sure how to analyze it with Excel.
Should I use regressions? Pivot tables? Something else?
What is the first step in analyzing big data??
Printable View
I have a lot of data but I am not sure how to analyze it with Excel.
Should I use regressions? Pivot tables? Something else?
What is the first step in analyzing big data??
Power Query and Power Pivot and pivot tables.
Hi xld,
thank you for your answer!
I have tabular data and my first data table has 1600 rows and 100+ columns.
In what way should I try to transform my data before analysis? Should it be tabular or non-tabular?
I am trying to compare cities against each other and find out what city-city combinations is the best.
That amount of data is trivial for these tools.
You should look to denormalise it so that you can pivot the data and get the sort of analysis you require. Power Query can import that data as it is and unpivot it so that it can then be pivoted.
Hi again xld,
thank you for your reply!
I have unpivoted my columns using PowerQuery and I now have 3 columns, city name, metric and value.
How should I proceed?
On the Home tab, click the arrow on the Close & Load button (Furthest left) and select Close & Load To. In the popup dialog, select connection only and Add to Datamodel. Exit back to Excel.
Now in Excel, go to Insert>Pivot Tables, select 'Use this workbook's Data Model' from the popup dialog, you should see your query there which you can now use in a pivot table.
Hi xld,
thank you for your instructions, I now have a data model with my data in it but when I click on metric in my field list I get a list of all metrics.
Is it possible to single out them one by one in the way my data started out like?
Put the metric in the rows pane not the value pane.
Hi xld,
I have followed your instructions but I am not sure on how the unpivoted data when pivoted in a pivottable is better?
After I have my datamodel should I create measures to analyze the data in PowerPivot? Or try to use regression on the data?
What you have is a set of raw data that you can now analyse. How you analyse it I cannot say, I do not know the data or what you are trying to achieve.
Hi xld,
thank you for your reply. I don't know how I should analyze my data.
I am comparing cities against each other on a lot of different metrics, more is better and there is a min and a max score.
Say I have cities A, B and C.
Then I am comparing City A vs City A. City A vs City B. City A vs City C.
City B vs City A, City B vs City B. City B vs City C.
City C vs City A. City C vs City B. City C vs City C.
I want to find the combination of highest score between all combinations of the different cities.
Any suggestions on how I can analyze data that is as in the above post??
Why compare a city against itself?
If you have 10 cities (C01 - C10) and 15 metrics (M01 - M15) to compare, then I'd figure you'll have 15 x (10 x 9) / 2 = 675 'compares' comparing each metric for each city pair
e.g.
C01 - C02 - M01, C01 - C02 - M02, … C01 - C02 - M15Sounds like you started with a 10 row, 16 column array of data using my made up numbers above ??
C01 - C03 - M01, C01 - C03 - M02, … C01 - C03 - M15
….
C09 - C10 - M01, C09 - C10 - M02, … C09 - C10 - M15
Hi Paul,
thank you for your reply! I am comparing a city against itself only because I loop trough "all items" in two comboboxes.
There is no value in comparing city A with city A.
I am not sure what I am starting with at the moment. I will have to look at my file again.
I will post a sample workbook. Do you have any suggestions on how I should analyze the "compares"?
As an CONCEPTUAL EXAMPLE .... :yes … you could look at something like this
10 cities (GREEN), 3 metrics (YELLOW)with each metric having a weighting factor (ORANGE)
The BLUE are just random numbers 0-100 but could be your raw scoring of City-X against City-Y for Metric-M (Col A - Col K)
Each raw score is weighted (Col M - Col N) in Col P - Col W
Weighted scores are summed Col Y - Col Z
The weighted summary is sorted High to Low in Col AB - Col AC
I used 2 UDFs and array functions
F9 to recalc and see what happens
Attachment 24062
Hi Paul,
thank you for your reply! You are a conceptual example genius!
I am looking at your workbook right now and I am trying to understand it!
How did you determine the weights?
Hi Paul,
thank you for your reply.
How would you go about finding the city/cities with the highest score?
The 'best' City Pair is in AB2 in my example above. The City01-City02 has a score of 70 for Metric-01
You could give City01 a 70 and City02 a 70 for Metric-01, and then weight Metrics, and, etc.
Attachment 24066
This makes City-04 the 'best' with a composite score of 530.95
It really all depends on how you weight the metrics and how you choose to determine the 'best'
Hi Paul,
thank you for your reply! I really like your idea with a comsposite score.
I will download your workbook later today and see if I can recreate it.
Thank you again!
Hi again Paul,
I have now had a good look at your two conceptual examples and both are breathtakingly good! I am looking at your UDF right now.
However, my data output from my macro is not in the same form. Is there an easy way to change my output macro to suit your conceptual examples?
I have condensed my data to only have 3 metrics.
1. Make my data the same as in your example
2. Make the code aware of how many cities there are
eg. my first batch of cities has 40 cities, the next batch has 59, the next 26 etc.. down to a mere 1 city.
All suggestions are welcome.
Sure
However, you would not want to include city pairs (CityX CityY) where (a) X = Y and where (b) X > Y
(A) X = Y would double count (e.g. City01 City02), and
(b) X > Y would have been counted earlier where X < Y (i.e. City02 City 04 counted, but City04 City 02 would be counted again)
Weighting factors?
Did you want to score city pairs or each city by itself?
Hi Paul,
thank you for your reply! You are too clever for me and this is above and beyond what I could ask for!
I want to score each city for itself. Since the combination of city pairs only is a comparison and I want to find the cities with the highest composite score.
I am not sure about Weighting factors, I was planning to use it as you did in your conceptual example with one weight for each metric.
Q: what does the originalest data look like? You said output of your macro, implying that it's processed or reformated somehow
Q: I assume the cities have real names and not -nn numbers ("New York City", Los Angeles", "London", ...)
Q: I assume the metrics have real names ("Size", "Quality of Life", Good Fishing", ...)
Your basic premise (as I understand it) is to (A) compare two cities and 'grade' them against each other using various metrics (e.g. "LA vs NYC on size is 66")
It is NOT to (B) just 'grade' each city on the metrics(e.g. "LA on size is 66" and "NYC on size is 75")
I've never done it the A-way before, just the B-way
Hi Paul,
thank you again for your reply! It is getting late here and I will make a sample workbook tomorrow.
The cities have real names, the metrics have real names and your premise is correct (I think).
Do you see any advantages or disadvantages of the A-way?
A way: I just think it'll be harder for you to score the city pair comparisonsQuote:
Do you see any advantages or disadvantages of the A-way?
Example: say that the Metric = "Quality of Life"
LA - NYC = 56
LA - London = 67
London - Sydney = 78
The raw data using the "Each City" approach in post #20 would have
LA = 123
London = 145
etc.
B way: I just think it'll be easier for you to score each city for each metric
Example: say that the Metric = "Quality of Life" and the score for individual cities are
LA = 56NYC = 67
London = 78
Sydney = 89
The sum by city and weighted would give you the composite score
Doesn't have to be 0 - 100, I've seen just five values used -2, -1, 0, 1, 2
Hi Paul,
thank you for your reply! I have read all of your posts again and you are spot on!
I don´t want to double count and I don´t want to use the A way.
I have condensed my metrics down to two metrics.
Metric A is the score of city vs city. Metric B is the score of city vs state. I am planning to weight my metrics 50% and 50%. So city01 vs city02 in metric A could be 55 and metric B could be 77. Metric A is a combination of other metrics and a score of 55 means that there are 55 comparisons where metric A is above the City 2 value and 77 is also a combination of metrics where metric B is above or equal to the state mean.
Ex. City01 has a value of size that is greater then City02 and also greater then the state mean. The state mean is calculated from all cities in that state.
So what I am trying to do now is to get my output (in the attachment above) to be as in your example.
But I am not sure how?
Been thinking on thisQuote:
So city01 vs city02 in metric A could be 55 and metric B could be 77. Metric A is a combination of other metrics and a score of 55 means that there are 55 comparisons where metric A is above the City 2 value and 77 is also a combination of metrics where metric B is above or equal to the state mean.
Metric 1 - So if
1 vs 2 = 55
1 vs 3 = 45
1 vs 4 = 65
2 vs 3 = 70
2 vs 4 = 60
3 vs 4 = 30
how would you generate a single rank-able score for
1 = ?
2 = ?
3 = ?
4 = ?
Ranking a PAIR of cities is easy: 2-3, 1-4, 2-4, 1-2, 1-3,3-4
But I don't know what that tells you
Metric 2 is much more straight foward
1 = 35 above state mean
2 = 15 above state mean
3 = 45 above state mean
4 = 20 above state mean
Rankings are City 3, 1,4, 2
Hi Paul,
thank you for your reply! I am looking at sports rankings where they count wins, draws or losses.
The best city would be the city with the most wins?
I am working on a sample workbook with my macro that creates the output, I will post it as soon as I get it more complete.
Here is a sample workbook which represents my actual workbook pretty good.
I am using a macro to loop through two comboboxes and store the value of metric A and metric B in a new worksheet called Score.
Is is the table in Score that I want to analyze and find the best combination of cities.
Any suggestions on how I can analyze this data?
The sample workbook uses the state with the least number of cities.
OK, see if this floats your boat. I didn't want to go too far in case it wasn't what you were looking for
For the first composite metric
I compared the score for each metric for each combanation of cities, and if CityX+MetricM minus CityY+MetricM was positive then add 1 to CityX/CityY score, if negative subtract 1
For the second composite metric
I compared the score for each metric for each combanation of cities, and if CityX+MetricM minus Average for MetricM was positive then add 1 to CityX score, if negative subtract 1
Attachment 24120
The worksheet "Dump" is not really needed - I just used it to display intermediate array results
The worksheet "Score" is not automatically created yet
Hi Paul,
thank you for your reply! It is getting late here and I am not sure that I follow what you are doing.
I will take a fresh look tomorrow at your workbook. I think that you float my boat! Thank you again for your response!
Hi again Paul,
I have now had a long look on your workbook and I don't understand all of your code in mod_PH1.
However, I think that I understand your output.
You are taking "CityX+MetricM minus CityY+MetricM " to avoid duplicates?
I don't understand what you mean with +1 or 0 and NOT +1 and -1?
Thank you again for your help with this!
EDIT: I am not sure but I think that it is the best way to start with "Score" as input?
Like I said, I'm struggling with meaningful analysis for city pairs. One city at a time is easy
What I tried was ...
1. Comparing Karlshamn against Karlskrona for 20 metrics, if Karlshamn > Karlskrona for 7 and Karlshamn < Karlskrona for 12 and even for 1, then I scored the Karlshamn - Karlskrona pair as -5 ( i.e. 7 - 12 + 0)
I could just count the 'greaters than' and then Karlshamn would = 0 and Karlskrona = 12
2. I started with the 'Data' sheet and 'Score' was the last. I'd format 'Score' better to make it more understandable ("easy enough for a VP to read" :) )
3. I think I'll take a shot at just a single city comparison and see what it looks like
Hi Paul,
thank you for your reply. I am very grateful for your help on this.
It isn't an easy ask and I am also not sure on how to do a meaningfull analysis.
What makes the city pair analysis hard? I am not sure that I follow on the why.
The City Pair just feels like comparing apples and oranges
I don't see knowing that (Karlskrona - Olofström) is 'higher' than (Karlshamn - Sölvesborg) is useful, but then again it's not my call
City1 - City2 Summary #Metrics Karlskrona - Olofström 6 Ronneby - Sölvesborg 4 Olofström - Ronneby -2 Karlskrona - Sölvesborg -2 Karlshamn - Olofström -2 Olofström - Sölvesborg -4 Karlskrona - Ronneby -5 Karlshamn - Karlskrona -5 Karlshamn - Ronneby -8 Karlshamn - Sölvesborg -10
It's just that I'm used to (in my tiny world) more of a single parameter evaluation
This is just a simple one city at a time scoring matrix to rank each city (not each pair)
Attachment 24136
Hi Paul,
thank you for your reply! I will start with your city by city analysis and see what I get.
Maybe you are right about apples and oranges! A while back I posted a question about regressions, could regressions be used on the city pairs somehow?