How to analyze big data with Excel

Printable View

Show 40 post(s) from this thread on one page

04-10-2019, 08:29 AM
waimea

How to analyze big data with Excel

I have a lot of data but I am not sure how to analyze it with Excel.

Should I use regressions? Pivot tables? Something else?

What is the first step in analyzing big data??
04-10-2019, 08:37 AM
Bob Phillips

Power Query and Power Pivot and pivot tables.
04-10-2019, 08:43 AM
waimea

Hi xld,

thank you for your answer!

I have tabular data and my first data table has 1600 rows and 100+ columns.

In what way should I try to transform my data before analysis? Should it be tabular or non-tabular?

I am trying to compare cities against each other and find out what city-city combinations is the best.
04-10-2019, 09:37 AM
Bob Phillips

That amount of data is trivial for these tools.

You should look to denormalise it so that you can pivot the data and get the sort of analysis you require. Power Query can import that data as it is and unpivot it so that it can then be pivoted.
04-10-2019, 09:47 AM
waimea

Hi again xld,

thank you for your reply!

I have unpivoted my columns using PowerQuery and I now have 3 columns, city name, metric and value.

How should I proceed?
04-10-2019, 10:20 AM
Bob Phillips

On the Home tab, click the arrow on the Close & Load button (Furthest left) and select Close & Load To. In the popup dialog, select connection only and Add to Datamodel. Exit back to Excel.

Now in Excel, go to Insert>Pivot Tables, select 'Use this workbook's Data Model' from the popup dialog, you should see your query there which you can now use in a pivot table.
04-10-2019, 10:35 AM
waimea

Hi xld,

thank you for your instructions, I now have a data model with my data in it but when I click on metric in my field list I get a list of all metrics.

Is it possible to single out them one by one in the way my data started out like?
04-10-2019, 10:43 AM
Bob Phillips

Put the metric in the rows pane not the value pane.
04-10-2019, 11:51 AM
waimea

Hi xld,

I have followed your instructions but I am not sure on how the unpivoted data when pivoted in a pivottable is better?

After I have my datamodel should I create measures to analyze the data in PowerPivot? Or try to use regression on the data?
04-10-2019, 01:28 PM
Bob Phillips

What you have is a set of raw data that you can now analyse. How you analyse it I cannot say, I do not know the data or what you are trying to achieve.
04-10-2019, 01:45 PM
waimea

Hi xld,

thank you for your reply. I don't know how I should analyze my data.

I am comparing cities against each other on a lot of different metrics, more is better and there is a min and a max score.

Say I have cities A, B and C.

Then I am comparing City A vs City A. City A vs City B. City A vs City C.

City B vs City A, City B vs City B. City B vs City C.

City C vs City A. City C vs City B. City C vs City C.

I want to find the combination of highest score between all combinations of the different cities.
04-13-2019, 12:51 AM
waimea

Any suggestions on how I can analyze data that is as in the above post??
04-13-2019, 07:30 AM
snb

Quote:

Originally Posted by waimea

Any suggestions on how I can analyze data that is as in the above post??

You can't since these are only strings.
Please post a sample Excel workbook.
04-13-2019, 04:05 PM
Paul_Hossler

Quote:

Originally Posted by waimea

Any suggestions on how I can analyze data that is as in the above post??

Why compare a city against itself?

If you have 10 cities (C01 - C10) and 15 metrics (M01 - M15) to compare, then I'd figure you'll have 15 x (10 x 9) / 2 = 675 'compares' comparing each metric for each city pair

e.g.

C01 - C02 - M01, C01 - C02 - M02, … C01 - C02 - M15
C01 - C03 - M01, C01 - C03 - M02, … C01 - C03 - M15

….

C09 - C10 - M01, C09 - C10 - M02, … C09 - C10 - M15

Sounds like you started with a 10 row, 16 column array of data using my made up numbers above ??
04-14-2019, 01:25 AM
waimea

Hi Paul,

thank you for your reply! I am comparing a city against itself only because I loop trough "all items" in two comboboxes.

There is no value in comparing city A with city A.

I am not sure what I am starting with at the moment. I will have to look at my file again.

I will post a sample workbook. Do you have any suggestions on how I should analyze the "compares"?
04-14-2019, 07:54 AM
Paul_Hossler

2 Attachment(s)

As an CONCEPTUAL EXAMPLE .... :yes … you could look at something like this

10 cities (GREEN), 3 metrics (YELLOW)with each metric having a weighting factor (ORANGE)

The BLUE are just random numbers 0-100 but could be your raw scoring of City-X against City-Y for Metric-M (Col A - Col K)

Each raw score is weighted (Col M - Col N) in Col P - Col W

Weighted scores are summed Col Y - Col Z

The weighted summary is sorted High to Low in Col AB - Col AC

I used 2 UDFs and array functions

F9 to recalc and see what happens

Attachment 24062
04-14-2019, 08:18 AM
waimea

Hi Paul,

thank you for your reply! You are a conceptual example genius!

I am looking at your workbook right now and I am trying to understand it!

How did you determine the weights?
04-14-2019, 12:09 PM
Paul_Hossler

Quote:

Originally Posted by waimea

Hi Paul,

thank you for your reply! You are a conceptual example genius!

I am looking at your workbook right now and I am trying to understand it!

How did you determine the weights?

I just put in anything for the weights - in real life, you'd have to figure out what each metric was 'worth' in your analysis

This example determined which City Pair was 'best', but it's probably possible to figure out which City was highest scored
04-14-2019, 11:59 PM
waimea

Hi Paul,

thank you for your reply.

How would you go about finding the city/cities with the highest score?
04-15-2019, 07:46 AM
Paul_Hossler

2 Attachment(s)

The 'best' City Pair is in AB2 in my example above. The City01-City02 has a score of 70 for Metric-01

You could give City01 a 70 and City02 a 70 for Metric-01, and then weight Metrics, and, etc.

Attachment 24066

This makes City-04 the 'best' with a composite score of 530.95

It really all depends on how you weight the metrics and how you choose to determine the 'best'
04-15-2019, 09:42 AM
waimea

Hi Paul,

thank you for your reply! I really like your idea with a comsposite score.

I will download your workbook later today and see if I can recreate it.

Thank you again!
04-18-2019, 09:04 AM
waimea

1 Attachment(s)

Hi again Paul,

I have now had a good look at your two conceptual examples and both are breathtakingly good! I am looking at your UDF right now.

However, my data output from my macro is not in the same form. Is there an easy way to change my output macro to suit your conceptual examples?

I have condensed my data to only have 3 metrics.

1. Make my data the same as in your example
2. Make the code aware of how many cities there are

eg. my first batch of cities has 40 cities, the next batch has 59, the next 26 etc.. down to a mere 1 city.

All suggestions are welcome.
04-18-2019, 12:31 PM
Paul_Hossler

Sure

However, you would not want to include city pairs (CityX CityY) where (a) X = Y and where (b) X > Y

(A) X = Y would double count (e.g. City01 City02), and

(b) X > Y would have been counted earlier where X < Y (i.e. City02 City 04 counted, but City04 City 02 would be counted again)

Weighting factors?

Did you want to score city pairs or each city by itself?
04-18-2019, 12:57 PM
waimea

Hi Paul,

thank you for your reply! You are too clever for me and this is above and beyond what I could ask for!

I want to score each city for itself. Since the combination of city pairs only is a comparison and I want to find the cities with the highest composite score.

I am not sure about Weighting factors, I was planning to use it as you did in your conceptual example with one weight for each metric.
04-18-2019, 01:54 PM
Paul_Hossler

Q: what does the originalest data look like? You said output of your macro, implying that it's processed or reformated somehow

Q: I assume the cities have real names and not -nn numbers ("New York City", Los Angeles", "London", ...)

Q: I assume the metrics have real names ("Size", "Quality of Life", Good Fishing", ...)

Your basic premise (as I understand it) is to (A) compare two cities and 'grade' them against each other using various metrics (e.g. "LA vs NYC on size is 66")

It is NOT to (B) just 'grade' each city on the metrics
(e.g. "LA on size is 66" and "NYC on size is 75")

I've never done it the A-way before, just the B-way
04-18-2019, 02:14 PM
waimea

Hi Paul,

thank you again for your reply! It is getting late here and I will make a sample workbook tomorrow.

The cities have real names, the metrics have real names and your premise is correct (I think).

Do you see any advantages or disadvantages of the A-way?
04-18-2019, 03:53 PM
Paul_Hossler

Quote:

Do you see any advantages or disadvantages of the A-way?

A way: I just think it'll be harder for you to score the city pair comparisons

Example: say that the Metric = "Quality of Life"

LA - NYC = 56

LA - London = 67

London - Sydney = 78

The raw data using the "Each City" approach in post #20 would have

LA = 123

London = 145

etc.

B way: I just think it'll be easier for you to score each city for each metric

Example: say that the Metric = "Quality of Life" and the score for individual cities are

LA = 56
NYC = 67
London = 78
Sydney = 89

The sum by city and weighted would give you the composite score

Doesn't have to be 0 - 100, I've seen just five values used -2, -1, 0, 1, 2
04-19-2019, 12:34 PM
waimea

Hi Paul,

thank you for your reply! I have read all of your posts again and you are spot on!

I don´t want to double count and I don´t want to use the A way.

I have condensed my metrics down to two metrics.

Metric A is the score of city vs city. Metric B is the score of city vs state. I am planning to weight my metrics 50% and 50%. So city01 vs city02 in metric A could be 55 and metric B could be 77. Metric A is a combination of other metrics and a score of 55 means that there are 55 comparisons where metric A is above the City 2 value and 77 is also a combination of metrics where metric B is above or equal to the state mean.

Ex. City01 has a value of size that is greater then City02 and also greater then the state mean. The state mean is calculated from all cities in that state.

So what I am trying to do now is to get my output (in the attachment above) to be as in your example.

But I am not sure how?
04-20-2019, 05:23 AM
Paul_Hossler

Quote:

So city01 vs city02 in metric A could be 55 and metric B could be 77. Metric A is a combination of other metrics and a score of 55 means that there are 55 comparisons where metric A is above the City 2 value and 77 is also a combination of metrics where metric B is above or equal to the state mean.

Been thinking on this

Metric 1 - So if

1 vs 2 = 55
1 vs 3 = 45
1 vs 4 = 65

2 vs 3 = 70
2 vs 4 = 60

3 vs 4 = 30

how would you generate a single rank-able score for

1 = ?
2 = ?
3 = ?
4 = ?

Ranking a PAIR of cities is easy: 2-3, 1-4, 2-4, 1-2, 1-3,3-4

But I don't know what that tells you

Metric 2 is much more straight foward

1 = 35 above state mean
2 = 15 above state mean
3 = 45 above state mean
4 = 20 above state mean

Rankings are City 3, 1,4, 2
04-20-2019, 06:43 AM
waimea

Hi Paul,

thank you for your reply! I am looking at sports rankings where they count wins, draws or losses.

The best city would be the city with the most wins?

I am working on a sample workbook with my macro that creates the output, I will post it as soon as I get it more complete.
04-20-2019, 08:21 AM
waimea

1 Attachment(s)

Here is a sample workbook which represents my actual workbook pretty good.

I am using a macro to loop through two comboboxes and store the value of metric A and metric B in a new worksheet called Score.

Is is the table in Score that I want to analyze and find the best combination of cities.
04-21-2019, 01:14 AM
waimea

Any suggestions on how I can analyze this data?

The sample workbook uses the state with the least number of cities.
04-22-2019, 06:17 AM
Paul_Hossler

Quote:

Originally Posted by waimea

Any suggestions on how I can analyze this data?

The sample workbook uses the state with the least number of cities.

Didn't forget -- got busy this weekend

Trying a few approaches
04-23-2019, 07:26 AM
Paul_Hossler

2 Attachment(s)

OK, see if this floats your boat. I didn't want to go too far in case it wasn't what you were looking for

For the first composite metric

I compared the score for each metric for each combanation of cities, and if CityX+MetricM minus CityY+MetricM was positive then add 1 to CityX/CityY score, if negative subtract 1

For the second composite metric

I compared the score for each metric for each combanation of cities, and if CityX+MetricM minus Average for MetricM was positive then add 1 to CityX score, if negative subtract 1

Attachment 24120

The worksheet "Dump" is not really needed - I just used it to display intermediate array results

The worksheet "Score" is not automatically created yet
04-23-2019, 02:02 PM
waimea

Hi Paul,

thank you for your reply! It is getting late here and I am not sure that I follow what you are doing.

I will take a fresh look tomorrow at your workbook. I think that you float my boat! Thank you again for your response!
04-24-2019, 03:20 AM
waimea

Hi again Paul,

I have now had a long look on your workbook and I don't understand all of your code in mod_PH1.

However, I think that I understand your output.

You are taking "CityX+MetricM minus CityY+MetricM " to avoid duplicates?

I don't understand what you mean with +1 or 0 and NOT +1 and -1?

Thank you again for your help with this!

EDIT: I am not sure but I think that it is the best way to start with "Score" as input?
04-24-2019, 06:39 AM
Paul_Hossler

Like I said, I'm struggling with meaningful analysis for city pairs. One city at a time is easy

What I tried was ...

1. Comparing Karlshamn against Karlskrona for 20 metrics, if Karlshamn > Karlskrona for 7 and Karlshamn < Karlskrona for 12 and even for 1, then I scored the Karlshamn - Karlskrona pair as -5 ( i.e. 7 - 12 + 0)

I could just count the 'greaters than' and then Karlshamn would = 0 and Karlskrona = 12

2. I started with the 'Data' sheet and 'Score' was the last. I'd format 'Score' better to make it more understandable ("easy enough for a VP to read" :) )

3. I think I'll take a shot at just a single city comparison and see what it looks like
04-24-2019, 07:44 AM
waimea

Hi Paul,

thank you for your reply. I am very grateful for your help on this.

It isn't an easy ask and I am also not sure on how to do a meaningfull analysis.

What makes the city pair analysis hard? I am not sure that I follow on the why.

2 Attachment(s)

The City Pair just feels like comparing apples and oranges

I don't see knowing that (Karlskrona - Olofström) is 'higher' than (Karlshamn - Sölvesborg) is useful, but then again it's not my call

City1 - City2 Summary #Metrics

Karlskrona - Olofström 6

Ronneby - Sölvesborg 4

Olofström - Ronneby -2

Karlskrona - Sölvesborg -2

Karlshamn - Olofström -2

Olofström - Sölvesborg -4

Karlskrona - Ronneby -5

Karlshamn - Karlskrona -5

Karlshamn - Ronneby -8

Karlshamn - Sölvesborg -10

It's just that I'm used to (in my tiny world) more of a single parameter evaluation

This is just a simple one city at a time scoring matrix to rank each city (not each pair)

Attachment 24136

04-24-2019, 08:42 AM
waimea

Hi Paul,

thank you for your reply! I will start with your city by city analysis and see what I get.

Maybe you are right about apples and oranges! A while back I posted a question about regressions, could regressions be used on the city pairs somehow?

Show 40 post(s) from this thread on one page

All times are GMT -7. The time now is 02:54 PM.

City1 - City2 Summary	#Metrics
Karlskrona - Olofström	6
Ronneby - Sölvesborg	4
Olofström - Ronneby	-2
Karlskrona - Sölvesborg	-2
Karlshamn - Olofström	-2
Olofström - Sölvesborg	-4
Karlskrona - Ronneby	-5
Karlshamn - Karlskrona	-5
Karlshamn - Ronneby	-8
Karlshamn - Sölvesborg	-10