View Full Version : Solved: Do not know how to approach this one?

10-04-2008, 11:07 AM
I have a database with 3200 rows of location data that I need to categorize into three categories within each territory based on sales. Seems simple, until.....

The problem is I need to categorize by sales and number of sites within the territory. Try to explain,

A territory could have between 50 and 150 accounts.
I need to find the top %25 of sites within each territory based on sales and insert an "A" beside them.

Clear as mud right:giggle

I dont even know where to start on this one. Maybe its way to complex but I have been bouncing this around for a few months.:dunno :dunno

This will be a challenge I think.:think:

Anyone up for a challenge? I really dont like to ask someone to do the work for me, but this is way over my capability!

Thanks again to all....

10-04-2008, 01:36 PM
This is the best I could come up with

IF(D10<=SMALL(IF($C$10:$C$587=C10,$D$10:$D$587,99^99),ROUND(COUNTIF($C$10:$C$587,C 10)/4,0)),"C","")))

this is an array formula

10-05-2008, 04:17 AM
Wow XLD.:bug:
I need to get a book just on array formulas and advanced functions. I use many of them but never as many nested as that!

It works great for the top 25% and bottom 25% but the middle 50% it seems that half of them are blank.

I had an idea on the way to the office this morning (I know it's Sunday!:whip ) that might be a better way to do this.

What if I have the report from the server sort the data by terr number descending and then by sales descending to make the data consistent enough to run code against it? :think: I included the updated sheet.

Could the code then start at the first terr number and count down how many locations there are until the terr number changes and then assign the value to the cells to the right and do it all over again for the next terr?

I will try to butcher some code up for that but it will be frightening.:rotlaugh:

Again, Thank you for your help and congrats to the 1,000 post milestone. You have undoubtedly made countless excel users maintain their sanity over the years.:friends: :beerchug:

10-05-2008, 05:10 AM
Wow XLD.:bug:
I need to get a book just on array formulas and advanced functions. I use many of them but never as many nested as that!

Again, Thank you for your help and congrats to the 1,000 post milestone. You have undoubtedly made countless excel users maintain their sanity over the years.:friends: :beerchug:

I will answer the 'real' question in a while, but if I may I will ruminate on the help that we provide, the whys and the wherefores.

There are a number of people like myself who spend a lot of time helping out on forums, plenty here, even more when you go across to other sites, and it is interesting to think why they do it.

When I started answering questions, it was nice boost for ones self-esteem to think that you know the answer, and a good feeling that you help someone out. But that can only go so far, especially if you are answering on some forums where it can feel like a race as to who can answer quickest. So why continue helping?

I often wonder about that, my wife certainly does, and is often heard voicing her incredulity. I think you hit the nail on the head, for me at least, when you said I help others maintain their sanity. By helping, I help maintian MY sanity too. I love Excel, as must be obvious (actually, I love VB most, and that is what I do mainly which may surprise some as I am the loudest advocate of not using VBA to do what Excel can do on VBAX), and I do a lot of Excel in my work, and I find that breaking out of the work thing and answering questions helps me get a brain de-fuzz; helps me get a needed break; helps me to assist others; and so on. So all in all, it is good for me. How about that - a true win-win situation.

I like VBAX above all other forums because it is not as competitive as many, you get a chance to develop solutions, and there are loads of good people here. I have made 'virtual' friends with lots of people here , Steve Lucas, Joseph (Malik), Simon Lloyd, amongst the admins, people like Doug (YellowLabPro), MarshyBid, Sir BabyDum, and so on amongst the posters (apologies to the many I have omitted). In a world where computers and t'Internet are breaking down social norms, we are able to establish some sort of socialising (much more real than the artificialities of Facebook and the likes, because we can be relatively assured that what we get is real). I have even got to meet some, Ken Puls and Zack Barresse, and they are just as nice people in real life as they seem on the board. Simon Lloyd even invited me to his wedding. How good is that?

So, whilst it is nice for people to voice their appreciation, it is not necessary, it is a totally selfish thing I (we?) do :wink:

10-05-2008, 05:19 AM
It works great for the top 25% and bottom 25% but the middle 50% it seems that half of them are blank.

Can you give me an example of the problem, I can't see what is wrong.

Could the code then start at the first terr number and count down how many locations there are until the terr number changes and then assign the value to the cells to the right and do it all over again for the next terr?

This should work, and is a better way IMO if you can get sorted data (or even sort it yourself). You will need to count forward to see however many there are in a category, and then find the 25%, 50% and 75% thresholds for that category, which is what my formula also does.

10-05-2008, 09:17 AM
I wanted to write a long story on how much I appreciate you guys, like the people you mentioned in #4, as well as countless others in different forums. I have lots of ideas but can never finish them so I come to these forums and you and the others never failed me. But I will keep it short, it is an excel/vba site after all, so all I will say is thanks a lot you (too few) ladies and gentlemen.



10-05-2008, 05:15 PM

Can you give me an example of the problem, I can't see what is wrong.

I attached an updated sheet in my last post with the formula you posted which shows the result. It seems like half the "B" accounts do not get picked up in the calculation somehow.:dunno I still can not follow that formula.

Time to sign off for the night and get back to work in the morning.
Thanks again.....


10-05-2008, 11:23 PM
Evening there Slang.

The Formula isn't really as complicated as it looks. You just have to take and break the formula down into pieces and manage it once at a time. If you're into code-programming rather than using Formulas like this, think of it as an If...ElseIf...Else statement.

You'll break this down into three primary parts of the Formula based on the final result: a, b, or c.

Part 01:

You'll start from the inner-most set of parenthesis and work your way out.

CountIf($C$10:$C$587, C10)/4
This Function is counting the number of cells in Range C10:C587 where the cell value equals C10, in this case a value of 1. This results in the total number of cells for Territory 1. In your example, the result is 11. It then divides this number by 4 so that you know how many cells will be in each "quarter".

Round(11, 0)
This rounds your resulting value to the nearest Integer. The CountIf divided by 4 actually returns 2.75. But can you have 2.75 cells worth of information? Nope, so you'll want to round up to 3.

If($C$10:$C$587=C10,$D$10:$D$587,0)This is where the Formula gets its kick and saves a very large amount of work on your part. Formula Arrays are wonderful things, but yes, they can get very complicated. So I'll explain them briefly here.

Formula Arrays:
A Formula Array is the formula equivalent of a For Each...Next statement. Excel is doing all of the work for you. One cell at a time, it will evaluate the If statement and the result will give you an array.

Below is a roughly equivalent snippet. This gives you an idea of the process Excel would go through to calculate the current Formula Array.

For Each objCell in Range("$C$10:$C$587").Cells
If objCell.Value = .Range("$C$10").Value Then
Return objCell.Offset(0, 1).Value
Return 0
Next objCell
Next objCell

In this example, it is evaluating each cell in your Range C10:C587. If the value is equal to your current Territory ID, it will return the dollar amount of the cell in Column D on the same Row. Otherwise, it returns a value of 0.

The actual array it returns will look something like:
{256009, 243861, 201332, 166546, ...}

Large(FormulaArray, 3)
The Large Function will return the k largest element in the given array. The second argument, or the '3' is the result of our Round Function above. We are literally telling Excel to tell us what "the 3rd largest element is in the array." The result is the lowest Sale Amount in order for this location to qualify as the Top 25% in Territory 1: "$201,332".

If(D10>=201332,"A", ...)
This is fairly self explanatory, but if the Sales for the current Territory is greater than or equal to our lowest Sale Value to qualify as the Top 25%, we will insert the value "A" into the current cell.

The elipses (...) in the previous section of formula is what Excel will do if the current cell is not in the Top 25%. In this example, it is another nested If Formula almost identical to the one just explained.

Part 02:

This is the next step that Excel will process and is only evaluated if the the first criterion fails. If the current Location is not in the Top 25%, we will calculate this section.

Probably 90% of this section is identical to Part 01. We have to modify one section, though, because we are trying to calculate whether the current location's sales are in the Middle 50%. You can do this two different ways:

Method 01:
If the current sale is greater than the Top 25% sale amount AND is less than the Bottom 25% sale amount, we identify it as Category B.

Method 02:
If the current sale is less than the Bottom 25% sale amount, we identify it as Category B.

At first Method 02 sounds questionable. "But if we take everything not in the Bottom 25%, won't we still get the values in the Top 25% as well." The answer is no. Remember, Part 01 already set the values for anything in the Top 25%, so all of these cells are essentially ignored. Method 02 becomes the simpler choice.

Now, if you refer back to Part 01, you'll remember that the Round Function resulted in how many items are in each quarter of the sales. The Top 25% has 3 items. The 26%-50% section has 3 items. The 51%-75% has 3 items, and the 76%-100% has 2 items (since we only have 11 total items).

To get the limiting value for the Top 75% of sales, we'll use the 9th Largest value in our list. (0-25%: 3, 25-50%: 3, 51-75%: 3) Mathematically, that is the result of your Round Function multiplied by 3. This is where XLDs formula is not quite accurate.


If you break just this section of code down, you are counting the total number of items in Territory 1: 11. Then divide this number by 2: 5.5. Now, round it to the nearest Integer: 6. This results in the Top 50% of the Sales.

Since you want the Top 75%, take your original Round Function which results in 3 and multiply it by 3. You'll be replacing:

Part 03:
IF(D18<=SMALL(IF($C$10:$C$587=C10,$D$10:$D$587,99^99),ROUND(COUNTIF($C$10:$C$587,C 10)/4,0)),"C","")))

This section is our "catch all" section. If you'll look back to your formula, we've broken it down into three parts. Basically, it reads:

By this point, the only values we have left without a Category value are those which are not in the Top 25% or the Middle 50%. Thus, anything in the Bottom 25%.

In XLD's formula, we run an additional check to ensure that the remaining values are all below the cut-off line for the Bottom 25% mark. Technically, this isn't required since we have all of the values for Category A and Category B marked.

Change this section of the formula to simply:

Final Formula for Cell C10:

Make sure you save the Formula as a Formula Array by pressing Shift+Enter after pasting the data in the cell. Now fill Cell C10 all the way down to the bottom of your list and voila, all of your values now have an appropriate value in your Category column.

I hope this helps give you a much better understanding of Formula Arrays and gets you on track to finishing the project. If you need any further explanation, just let us know.

Good Luck.

10-06-2008, 12:25 AM
It works great for the top 25% and bottom 25% but the middle 50% it seems that half of them are blank.


just realised that I mis-read the spec.

See if this version does it how you want

IF(D10<=SMALL(IF($C$10:$C$587=C10,$D$10:$D$587,99^99),ROUND(COUNTIF($C$10:$C$587,C 10)/4,0)),"C","B"))

10-06-2008, 02:22 AM
test :)

10-06-2008, 03:59 AM
:cloud9: Thanks a lot XLD for the formula and Demosthine for the explanation of the anatomy of it. I am continually amazed on what Excel can actually do when using its capabilities fully.

Is there are really good book that covers the more advanced functions with examples out there somewhere that you would suggest? Something like a reference text that one could look up functions quickly?:think:

My only question is will it be able to handle 4,000 rows of data with all those array formulas? :eek:
I will try it this morning.

Thanks again "Lords of the Excel Realm":bow:


10-06-2008, 07:42 AM

I don'[t think there is a good book out there personally, the existing books teach you the fundamentals but fail to show how to take it forward. A friend and I did think of writing such a book, but getting a publisher interested has provemn very difficult.

4,000 rows of this formula took nearly 28 secs to recalculate on my machine, which is a long time. If this is a one-off, a calculation that only happens when the data is first populated, and you can be sure it won't be triggered again, that might be acceptable, but if not, your idea of sorting and calculating using VBA would be a better option.

10-06-2008, 04:28 PM
Hey Slang.

I would definitely agree with XLD that all of the books I have found out there are focus primarily towards the novice to intermediate users. As I've gotten more advanced, I often find myself just going through the Object Browser built into the Visual Basic Environment or going to Microsoft's Developer Network. The MSDN articles are quite often so filled with technical jargon that it can take hours to find the information you want.

Like XLD, I've often thought of writing my own book, especially lately with some of the much more advanced work that Mark and I have been doing. I never went as far as looking for a publisher because I can do most of it myself. I tend to find that an electronic edition of the books has become much more of a convenience issue. While I'm at work, we don't have any access to the Internet and it gets old carrying five-pound books around every day. If I get to writing one, that's my method of deliver. An eBook.

That being said, one very slight issue came to mind today regarding XLD's example versus my example. Neither answer is necessarily the "correct" method, but they do provide two slightly different results.

In my method, as I explained, we are working from the top down. In the example where we had 11 items, the result was 0-25%, 26-50%, and 51-75% all had three items each. The 76-100% area always had whatever was left over. In this case, it means that the Bottom 25% had 2 items, not three.

In XLDs example, he takes the 0-25% values first and then the 76-100% values second. Each of these had 3 items. This resulted in the middle region having the "remaining" items.

Data Scott XLD
Top 25% 3 3
Middle 50% 6 5
Bottom 25% 2 3

As this may affect your rankings (and potentally commission rates in certain environments), I thought this may be important to mention.


10-07-2008, 11:27 AM
Yes, I noticed that when compared side by side and since the ranking are used in visit schedules I opted for XLD's where the extra is included in the middle group.

Thanks again.....:cloud9: