PDA

View Full Version : SUMPRODUCT Step By Step - Part 3



Bob Phillips
05-12-2009, 03:49 PM
3 More On Conditions

3.1 Overview

In the previous chapters we have explored the basics of SUMPRODUCT, and learnt how to use it to count with any number of conditions, over single or multiple ranges.

It doesn’t take much effort to extend the counting function into summing, and that will be covered in this chapter, as well as an introduction into using the double unary, and to OR conditions.

In this chapter, all of the examples will be working off of the data in Figure 5 - Comms Data table.

http://img133.imageshack.us/img133/3339/fig31.png
Figure 5 - Comms Data table

3.2 Summing With Multiple Conditions

We will start with a simple multi-condition sum, to calculate the total number of sales in the first quarter of the year 2009. This is multi-condition as it requires testing the date to be between 1st Jan and 31st March.

As we saw in the previous chapter, it is straightforward to use SUMPRODUCT to count the number of entries between these two dates with
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))

To extend this into a sum formula, it is just necessary to add the range to be summed, the value range.
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))*(D2:D20)))

Breaking this formula down we have:

- ((B2:B20>=DATE(2009,1,1)) - this tests the date range to be greater than or equal to 1st Jan 2009

- ((B2:B20<DATE(2009,4,1)) - this tests the date range to be less than 1st Apr 2009

- Using the * operator on these two conditions creates an AND condition that gives all of the items within the required date range, the rows where the date is greater than or equal to 1st Jan 2009 AND less than 1st Apr 2009, or within the first quarter of 2009

- (D2:D20) – the range to be summed is presented as a simple range.

The * operator is used here not to force an AND condition, but to simply multiply the amounts by the results of the multiple conditional tests, which is then summed.

3.3 Using The Double Unary

We have previously used the * operator as a way of performing AND operations, condition 1 AND condition 2. It is not that * acts as an AND, it is because each conditional test returns an array of TRUE or FALSE values, and multiplying these TRUE/FALSE arrays gives a result array of 1s and 0s. Using the formula
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))

The first condition (B2:B20>=DATE(2009,1,1) returns an array
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}

The second condition (B2:B20<DATE(2009,4,1) returns an array
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}

The * operator multiplies these two arrays together,
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}*
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}

giving a resultant array of {0;1;1;1;1;1;1;1;1;0;1;1;1;1;0;0;1;1;1}. This array is the summed to give a result of 15.

If we extend this to including the values in column B, the formula
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))*(D2:D20))

gives a result of 1738, because the intermediate arrays are
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}*
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}*
{700;75;125;500;100;75;250;88;25;18;22;333;33;20;15;40;17;25;50}

giving a resultant array of {0;75;125;500;100;75;250;88;25;0;22;333;33;20;0;0;17;25;50}.

This array is then summed to give a result of 1738.

In the last formula, the values are not involved in a conditional test, just multiplied as is.

The important thing to realize here is that * operator is utilised here to coerce the TRUE/FALSE values to 1 or 0, which is required so that the basic math operations, multiply and add (what SUMPRODUCT does), have something to work on.

In addition to the * operator, many other operators can be used. We will address most very briefly later, but there is one that you will come across a great deal if you search the internet for solutions, and that is the double unary, --.

The double unary does not directly replace the * operator, it has to be used differently. The above formula with the double unary would not be
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))--(B2:B20<DATE(2009,4,1))--(D2:D20))

but would be
=SUMPRODUCT(--(B2:B20>=DATE(2009,1,1)),--(B2:B20<DATE(2009,4,1)),D2:D20)

In the previous formula, the * operator was used to multiply each element of two arrays to get another array of 1s and 0s. In this formula, the double unary coerces each array of TRUE/FALSE values to 1 or 0 directly. SUMPRODUCT manages the multiplication of each (numeric) array and sum the results, this is SUMPRODUCT in its most basic form as described in section 1.1 SUMPRODUCT Basics.

The double unary is my preferred way of using SUMPRODUCT for multiple conditional tests because it is closer to basic SUMPRODUCT usage. Using the * operator means that the multiplication, or product, part of the formula is handled by the * operator, and SUMPRODUCT just does the summing.
[We could also use =SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1)),D2:D20) which still uses the * operator to force the TRUE/FALSE arrays to a final array of 1s and 0s, and SUMPRODUCT will multiply that resultant array by the value array and sum the results]

There are some instances where the * operator doesn’t work, and some where the double unary doesn’t work, these will be covered in detail later.

3.4 OR Conditions

You may well be aware that OR conditions are managed in an IF test by using the OR function, such as =IF(OR(J1>20,K1="X"),"Match","No match"). Unfortunately this construct does not work in multiple conditional tests in SUMPRODUCT.

But consider an alternative IF test. This formula =IF((J1>20)+(K1="X"),"Match","No match") produces the same result as the previous example, so let’s examine it some more.

Each of the two conditions, (J1>20) and (K1="X"), return TRUE or FALSE., so we are in familiar territory already. This formula works because of the way that Excel adds TRUE/FALSE values. Table 1 - OR Table shows the results of each combination of TRUE and FALSE.

TRUE TRUE 2
TRUE FALSE 1
FALSE TRUE 1
FALSE FALSE 0
Table 1 - OR Table

Each result except FALSE/FALSE is non-zero, and Excel will treat any non-zero value as TRUE, zero as FALSE. In other words, by adding the conditions, if either condition is TRUE the result is TRUE.

3.4.1 Counting OR Conditions

We can use this approach within SUMPRODUCT as well, by adding the conditions using the + operator. As an example, to calculate the total number of sales of ADSL Routers or sales to Bell Labs we can use
=SUMPRODUCT((C2:C20="ADSL Routers")+(A2:A20="Bell Labs"))

which gives the result 8.

Breaking this down we can see that (C2:C20="ADSL Routers") returns the array
{FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE ;FALSE;TRUE;FALSE;TRUE;FALSE;FALSE}

and (A2:A20="Bell Labs") returns the array
{TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRU E;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}

Adding these arrays gives a resultant array {1;1;0;0;0;1;0;0;1;0;0;0;1;0;1;0;1;1;0} using the table rules shown above.

Summing this resultant array gives the final result of 8.

It should be clear that the + operator is acting in a similar way to the * operator, that is it is explicitly performing the math on the conditions, leaving SUMPRODUCT to simply sum the resultant array.

Unfortunately, there is a twist in this little technique. In our previous example we showed how to count the sales to Bell Labs OR for ADSL Routers. If we repeat this for Verizon instead of Bell Labs using
=SUMPRODUCT(((A2:A20="Verizon")+(C2:C20="ADSL Routers")))

Looking at the data we can see there are 7 items that match these criteria, but this formula returns 9. So what has gone wrong? If you look carefully at the data you can see that rows 3 and 16 are for Verizon and are ADSL Routers, so these items are being double-counted (look back at Table 1 - OR Table, you can see that TRUE+TRUE returns 2). Fortunately this is easily remedied, we can just test the OR conditions for > 0, because the results of meeting either or both of the conditions are both > 0 whereas neither conditions met is = 0. Testing for > 0 gives a TRUE/FALSE value, so we have to again coerce this to 1s and 0s. The net result of this is the following formula

=SUMPRODUCT(--((A2:A20="Verizon")+(C2:C20="ADSL Routers")>0))
which returns the correct result of 7.

3.4.2 Summing OR Conditions – OR and AND

Calculating the quantity with an OR condition is possible just as it is with the AND condition, by combining OR and AND operations. A little care has to be exercised to ensure that the correct operator precedence is followed. If we were to just add the quantity using a * operator
=SUMPRODUCT((A2:A20="Bell Labs")+(C2:C20="ADSL Routers")*(D2:D20))

returns the result of 210, which is clearly too few. If we look at the final resultant array, we can see that it returns {1;75;0;0;0;75;0;0;25;0;0;0;1;0;15;0;17;1;0} which sums to 210. We can see here that each first conditional tests that are TRUE returns a value of 1 into the resultant array, not the quantity. This is because the * operator is processed before the + operator, so this part of the formula (C2:C20="ADSL Routers")*(D2:D20) is evaluated first, and the other condition is ORed with this resultant array, rather than the first two conditions being ORed before that resultant array is multiplied by the value array.

This is easily rectified with an extra set of brackets
=SUMPRODUCT(((A2:A20="Bell Labs")+(C2:C20="ADSL Routers"))*(D2:D20))

which returns a final resultant array of {700;75;0;0;0;75;0;0;25;0;0;0;33;0;15;0;17;25;0}, which gives the correct result
of 965. This works because brackets are evaluated before the * operator.

It should be clear that this version of the formula
=SUMPRODUCT(((A2:A20="Bell Labs")+(C2:C20="ADSL Routers")),D2:D20)

also works.
[We could include the double unary before the ORed tests, =SUMPRODUCT(--((C2:C20="ADSL Routers")+(A2:A20="Bell Labs")),D2:D20), but this would be superfluous as the + operator explicitly coerces the conditions into 1/0 values]

3.4.3 NOT OR Conditions

Whereas AND conditions easily accommodate not equal values
=SUMPRODUCT(--(A2:A20="Verizon"),--(C2:C20<>"ADSL Routers"))

it is not so simple for OR conditions. This formula
=SUMPRODUCT(((A2:A20="Verizon")+(C2:C20<>"ADSL Routers")))

returns 18 not 2 as it would appear it should. The reason for this will be covered in the next step.

3.5 Summary

Now we have learnt how to use SUMPRODUCT to sum items that match multiple conditions as well as counting those multiple conditions, and we have also covered the use of the double unary as against the * operator.

Beyond that, we have extended our capabilities by using SUMPRODUCT to cater for OR tests as well as AND tests.

Next time we will look at including extra functions within the test, beyond simple equals, not equals, etc.

Juba
05-19-2009, 04:43 AM
Xld can u give us the dataset to try the example on it

Bob Phillips
05-19-2009, 05:14 AM
Sure.

Juba
05-19-2009, 05:19 AM
Thank you xld
i am waiting your lesson about SumProduct
it is very great step here in this forum to start this kind of lesson
i wish a good luck for u xld

mehdoush
07-05-2009, 07:59 AM
thnks so much for this tutorial , very interesting... carry on

Anomandaris
10-19-2009, 06:37 AM
this is great stuff xld!
thanks a lot

nst1107
11-21-2009, 10:27 AM
Excellent teaching, Bob. I'm beginning to see why you like SUMPRODUCT so much.

reza_doang
05-24-2010, 09:00 PM
XLD...

Up until know i don't really know about sumproduct, now after read your post i know better about sumproduct.
keep writing and share your knowledge.

thank

Opv
06-06-2012, 09:59 AM
Great information on Sumproduct. Thanks for the posts. I am befuddled by one aspect of Sumproduct, that being whether the {} are really needed. I've seen some pages that stress that they need to be included and others that state specifically that they do not. My Sumproduct formulas seem to work fine regardless of whether they are included, so I tend to omit them. But, if there is some technical reason they should be there I'd like to know why.

Opv
06-08-2012, 09:53 AM
Great information on Sumproduct. Thanks for the posts. I am befuddled by one aspect of Sumproduct, that being whether the {} are really needed. I've seen some pages that stress that they need to be included and others that state specifically that they do not. My Sumproduct formulas seem to work fine regardless of whether they are included, so I tend to omit them. But, if there is some technical reason they should be there I'd like to know why.

Disregard. It appears I scanned through the posts a bit too quickly and overlooked the answer to this question.

Bob Phillips
06-13-2012, 04:10 AM
Disregard. It appears I scanned through the posts a bit too quickly and overlooked the answer to this question.

What was the answer as you read it? Just want to make sure that you do understand as there are very few examples where you need the {...} (and if you do, there really is no need for SP :))

Opv
06-13-2012, 08:20 AM
What was the answer as you read it? Just want to make sure that you do understand as there are very few examples where you need the {...} (and if you do, there really is no need for SP :))
Thanks for keeping me humble. :) I didn't mean to imply that I fully "understand," per se. I only meant that I had apparently overlooked what I presumed to be the answer as to whether wrapping a SP equation in {} is required. My presumption that the {} are not required was based on the following statement from Part 2:


In addition, whilst SUMPRODUCT is not array entered, it is an array function in this conditional usage, which means that you cannot use whole columns in the ranges to be tested, you have to use an explicitly limited range. This is where COUNTIF and SUMIF are preferable.
If there are indeed examples of SP in which the {} are needed, then I inappropriately generalized the above statement as being applicable in all uses of SP rather than just in the example where the statement was made. What would be an example where the {} are needed/required?

Bob Phillips
06-13-2012, 01:28 PM
Sorry, I was not meaning it in that sense at all :( ; rather I was keen to see that if you were taking this much trouble to understand that you weren't getting some mis-information somewhere.

As I have said, and as you quoted, whilst SUMPRODUCT is not array entered, it is an array function in its conditional usage. This means that if does not need to be explicitly told to array process the constituent parts, which is one Ctrl-Shift-Enter does, as it implicitly 'knows' this.

However, if there is some function embedded within that does not implicitly process an array, it is that function that needs the explicit CSE, not SUMPRODUCT, although of course it is the whole formula that gets encapsulated thus.

So whilst this simple SP is not array entered


=SUMPRODUCT(--(A2:A200=MAX(A2:A200)))

this one, which achieves the same, does need to be array-entered


=SUMPRODUCT(--(IF(A2:A200=MAX(A2:A200),1))

purely because it has the non-array IF function within it.

But ... again as I said, if you have a SP formula that needs array-entering, you don't need SP, you can use an array-entered SUM alone


=SUM(--(IF(A2:A200=MAX(A2:A200),1)))

Opv
06-13-2012, 01:54 PM
Sorry, I was not meaning it in that sense at all :( ; rather I was keen to see that if you were taking this much trouble to understand that you weren't getting some mis-information somewhere.

Oh, OK. I have indeed been trying to wrap my brain around just how flexible and expandable Sumproduct is just for the sake of knowing. After reading your explanation, I can see why I got confused by what I read on some of the other web sites I've perused on the subject. I was lacking the proper context in which to interpret the specific examples they were addressing. Even those sites make more sense now.

Thanks!

rajiv_h
06-17-2020, 04:25 PM
It has been a long time since the article was published. The Figure 5 link no longer works. It's now linked to the Imageshack login page.

On my Huawei cellphone, there is a broken image link icon between the lines "In this chapter, all of the examples will be working off of the data in Figure 5 - Comms Data table." and "Figure 5 - Comms Data table". However, on both Edge Version 83.0.478.50 and Firefox 77.0.1, both 64-bit and under Windows 10, there is absolutely nothing between the lines. One cannot tell there is a link there.