PDA

View Full Version : Sorting and Filtering with Array Formulas

mikerickson
04-26-2009, 09:59 PM
I'm not sure if this is the kind of post desired or if the style is what you are looking for, but here's a contribution to the use of Array formulas.

Array formulas can be used to dynamicaly filter a range of data. One of the key steps is to an array formula to sort some numbers, so I begin with that technique.

1) Sorting numbers with array formulas:

Problem, A1:A10 contains unsorted numbers.
The goal, to have column B display them sorted ascending.

A non-array solution would be to put =SMALL(\$A\$1:\$A\$10, ROW()) in B1 and drag downward.

Another approach would be to select B1:B10 and enter the array formula
{=SMALL(A1:A10,ROW(Z1:Z10))}
In this formulation ROW(Z1:Z10) is the array ({1;2;3;4;5;6;7;8;9;10}, while A1:A10 is a constant range.

However, inserting a new row 1 will change this to
{=SMALL(A2:A11,ROW(Z2:Z11))}
and the array of second arguments becomes {2;3;4;5;6;7;8;9;10;11}

To give the formula robustness, using INDIRECT keeps the indexing array from reacting to row insertions. That makes the formula (without the inserted row)
{=SMALL(A1:A10,ROW(INDIRECT("1:10")))}

Note that {=SMALL(A1:A10,ROW(INDIRECT("1:100")))} will give the same result as the previous formula. This permits this method to be used to sort dynamic ranges of an unknown size, if the maximum size can be estimated.

2) Filtering with array formulas

Problem, A1:A10 contains a list of animals
Cat
Dog
Fish
Worm
Hamster
Bird
Jim
Goat
Hawk
The goal, to list all the 4 lettered animals in column B.
Solution: Select B1:B10 and enter the array formula
{=INDEX(A:A, SMALL(IF(LEN(A1:A10)=4,ROW(A1:A10),99),ROW(INDIRECT("1:10"))), 1) & ""}

Explaination,
Consider the array {IF(LEN(A1:A10)=4,ROW(A1:A10),99)}
The ith element of that array will be either 99 or the row number a four lettered animal from col A.
{99; 99; 3; 4; 5; 99; 7; 99; 9; 10} in our example

Sorting that array with the technique from above gives the array formula
{SMALL(IF(LEN(A1:A10)=4,ROW(A1:A10),99),ROW(INDIRECT("1:10")))}
which returns {3; 4; 5; 7; 9; 10; 99; 99; 99; 99}

Plugging that into INDEX gives almost the result above.
{=INDEX(A:A,SMALL(IF(LEN(A1:A10)=4,ROW(A1:A10),99),ROW(INDIRECT("1:10"))), 1)}
The first 6 elements of this array are the cells in A1:A10 with a 4 lettered beast. The last 4 are the contents of A99. If A99 is blank, this INDEX formula will return a 0. To coerce this to a blank seeming cell concatenating "" gives the final formula.

{=INDEX(A:A, SMALL(IF(LEN(A1:A10)=4,ROW(A1:A10),99),ROW(INDIRECT("1:10"))), 1) & ""}

Any simple logical test can be used with this formulation.
With larger data sets, the 99 needs to be increased to insure that it is larger than the largest row number in the data set. Similarly, the INDIRECT("1:10") should have enough rows to match the size of the data set.
In both cases, overestimating the 99 or the "1:10" will not harm the formula.

Note also that this is dependent on row numbers, not indexes withing the data range. The first argument of the INDEX has to be the entire column which contains the data range. At a minimum, the array argument of the INDEX has to start in row 1.
If named ranges are used, the formula becomes
{=INDEX(dataColumn, SMALL(IF(LEN(dataRange)=4,ROW(dataRange),ROW(dataRange)+ROWS(dataRange)), ROW(INDIRECT("1:65536"))),1) & ""}

where the Name: dataColumn RefersTo: =INDEX(Sheet1!\$1:\$65536,0,COLUMN(dataRange))

The formulation ROW(dataRange)+ROWS(dataRange) keeps the "99" term below the last row of dataRange. The INDIRECT("1:65536") has a similar purpose (in pre2007 Excel).