Explanation
In this example, the goal is to calculate an average for any given group (“A”, “B”, or “C”) across all three months of data in the range C5:E16. For convenience only, data (C5:E16) and group (B5:B16) are named ranges . In the article below, we look at several approaches to this problem:
- Why the AVERAGEIFS function won’t work.
- A solution based on AVERAGE + FILTER
- A solution based on AVERAGE + IF function
- A solution based on SUMPRODUCT and Boolean algebra
In the latest version of Excel, the FILTER option (#2) is easy and intuitive. In Legacy Excel , you can use solution #3 or #4.
AVERAGEIFS won’t work
You might be tempted to solve this problem with the AVERAGEIFS function . After all, it seems to fit the bill. We simply need to calculate an average for a range of data based on one condition: we need to check if group (B5:B16) is equal to “A” or “B” or “C”. In fact, we can easily use AVERAGEIFS to calculate an average for a given group on one month of data. For example, to calculate an average for group “A” in January, we can use a formula like this:
=AVERAGEIFS(C5:C16,group,"A") // returns 42
However, if we try to expand average_range to include all three columns in data (C5:E16), we’ll get a #VALUE! error:
=AVERAGEIFS(data,group,"A") // returns #VALUE!
Why? The reason is that AVERAGEIFS expects average_range to be the same size as criteria_range . When we try to use the 1-column range group (B5:B16) with the 3-column range data (C5:E16), AVERAGEIFS returns an error. Incidentally, if we give the older AVERAGEIF function the entire data range and the same criteria, we don’t get an error, but we do get an incorrect result:
=AVERAGEIF(group,"A",data) // returns 42
This happens because AVERAGEIF makes certain assumptions about average_range , essentially resizing it to match the range argument, using the upper left cell in the range as an origin. It’s worth noting that this kind of “silent failure” is dangerous, in that the result seems reasonable but is in fact incorrect . You may not like formula errors, but at least they tell you something is wrong.
AVERAGE with FILTER
In the latest version of Excel, a good solution in this case is to use the AVERAGE function with the FILTER function . This is the approach used in the worksheet shown, where the formula in cell H5, copied down, is:
=AVERAGE(FILTER(data,group=G5))
And data (C5:E16) and group (B5:B16) are named ranges . Inside the AVERAGE function, the FILTER function is configured to filter the data in C5:E16 with a simple logical expression:
FILTER(data,group=G5)
Because cell G5 contains “A”, and group (B5:B16) contains 12 values, the expression returns an array with 12 TRUE and FALSE values like this:
{TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}
Notice the first four values in the array are TRUE, which corresponds to the first 4 rows in the data, which are in group A. This array is returned to the FILTER function as the include argument, and FILTER uses this array to select the first 4 rows of data (C5:E16).
The result from FILTER is delivered directly to the AVERAGE function as a single array:
=AVERAGE({58,41,48;37,46,32;38,48,38;35,59,46})
AVERAGE returns a final result of 43.8, the average of the 12 numbers in the array returned by FILTER. As the formula is copied down, it calculates an average for each group, using the value in column G for group.
AVERAGE with IF
The FILTER function is a newer function that does not exist in Legacy Excel . If you are using an older version of Excel, you can solve this problem with a simple array formula like this:
{=AVERAGE(IF(group=G5,data))}
In this formula, we use the IF function to filter values in each group instead of FILTER. When the value in group matches the value in G5 (“A”), IF returns the corresponding values in data . When a value doesn’t match, IF returns FALSE for corresponding values in data . After IF is evaluated, the array of results returned to AVERAGE looks like this:
=AVERAGE({58,41,48;37,46,32;38,48,38;35,59,46;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE})
This works because the AVERAGE function will automatically ignore the logical values TRUE and FALSE. This is an array formula and must be entered with control + shift + enter in older versions of Excel.
One thing to be aware of with this approach is that empty cells will be treated as zero, and become part of the calculated average. This happens because when the empty cells get passed through the IF function, they become zero (0). Although the AVERAGE function will ignore empty values, it will include zero (0) values in the calculated average. To avoid this problem, you can add a second IF function to test for empty values like this:
{=AVERAGE(IF(group=G5,IF(data<>"",data)))}
In this formula, only values that are part of group “A” and are not empty are passed into the AVERAGE function. All other values become FALSE and are ignored by the AVERAGE function.
Both formulas above are array formulas and must be entered with control + shift + enter in older versions of Excel. In the current version of Excel, which supports array formulas natively , the formulas will “just work”.
SUMPRODUCT function
As you might guess, you can also use the flexible SUMPRODUCT function to solve this problem in older versions of Excel. The formula looks like this:
=SUMPRODUCT(--(group=G5)*data)/SUMPRODUCT(--(group=G5)*(data<>""))
In this formula, the first SUMPRODUCT calculates a sum of all data in group “A” (from cell G5):
=SUMPRODUCT(--(group=G5)*data) // sum (526)
The second SUMPRODUCT calculates a count of all data in the same group:
SUMPRODUCT(--(group=G5)*(data<>"")) // count (12)
After both SUMPRODUCT formulas are evaluated, the final step is to divide the sum by the count:
=SUMPRODUCT(--(group=G5)*data)/SUMPRODUCT(--(group=G5)*(data<>""))
=526/12
=43.8
Although slightly more complicated, the SUMPRODUCT formula does not need to be entered in a special way with control + shift + enter, since SUMPRODUCT can handle array operations natively .
Explanation
In this example, the goal is to average the last 3 numeric values in a set of data. The best solution depends on the version of Excel you have available. In the current version of Excel, this can be nicely solved with a formula based on the AVERAGE function , the FILTER function , and the TAKE function . In older versions of Excel, you can use an alternative formula based on the LOOKUP function , the LARGE function , and the ROW function . Both approaches are explained below.
Note: the difference in complexity between the modern formula and the legacy formula below is a great example of how new functions in Excel are making hard problems much easier to solve.
Modern formula
In the current version of Excel, which supports dynamic array formulas , you can solve this problem with a formula like this:
=AVERAGE(TAKE(FILTER(data,ISNUMBER(data)),-3))
Working from the inside out, the FILTER function is configured to extract only numeric values from the named range data (B5:B16) like this:
FILTER(data,ISNUMBER(data))
The ISNUMBER function creates the filtering logic. ISNUMBER returns TRUE for numeric values and FALSE for anything else. Because we are giving ISNUMBER a range that contains 12 values, ISNUMBER returns an array with 12 TRUE or FALSE values like this:
{TRUE;TRUE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;TRUE}
This array is returned to the FILTER function as the include argument, and FILTER returns only the 8 numeric values in data in an array like this:
{95;94;97;96;98;99;100;101}
This array above is handed off to the TAKE function , which is configured to return only the last 3 values:
TAKE({95;94;97;96;98;99;100;101},-3)
TAKE then returns 99, 100, and 101 to the AVERAGE function :
=AVERAGE({99;100;101})
AVERAGE calculates an average of the 3 values and returns 100 as a final result.
Variable n
The generic form of this formula where n is a variable is shown below. To change the number of numeric values being averaged, just change n to a different number.
=AVERAGE(TAKE(FILTER(data,ISNUMBER(data)),-n))
One nice thing about the TAKE function is that it will automatically handle the case where the number of requested values is greater than the number of values in the array returned by FILTER. For example, if you ask TAKE for 5 values, and there are only 3 values available, TAKE will return 3 values without an error.
Last n columns
Although the example shown has data in rows, the formula can be adjusted to work with data in columns like this:
=AVERAGE(TAKE(FILTER(data,ISNUMBER(data)),1,-n))
This formula assumes that values appear in columns. The only difference is that the 2nd argument in TAKE ( rows ) is now 1, and a 3rd argument ( columns ) has been added and set to -3. In other words, we are asking TAKE for the last n columns instead of the last n rows.
Legacy Excel
In Legacy Excel we need to take a different approach because we don’t have the FILTER function or the TAKE function to use. One option is to use a formula like this:
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(data),ROW(data)),{1,2,3}),ROW(data),data))
Notice the AVERAGE function is the outermost function in the formula. AVERAGE will calculate an average of numbers presented in an array, so almost all the work in this formula is to generate an array of the last 3 numeric values in a range. Working from the inside out, the IF function is used to “filter” numeric values:
IF(ISNUMBER(data),ROW(data))
The ISNUMBER function returns TRUE for numeric values, and FALSE for other values (including blanks), and the ROW function returns row numbers, so the result of this operation is an array of row numbers that correspond to numeric entries:
{5;6;FALSE;FALSE;9;10;11;12;13;FALSE;FALSE;16}
This array goes into the LARGE function with the array constant {1,2,3} for k :
LARGE({5;6;FALSE;FALSE;9;10;11;12;13;FALSE;FALSE;16},{1,2,3})
LARGE automatically ignores the FALSE values and returns an array with the largest 3 numbers, which correspond to the last 3 rows with numeric values:
{16,13,12}
This array goes into the LOOKUP function as the lookup value. The lookup array is provided by the ROW function, and the result array is the named range data (B5:B16):
LOOKUP({16,13,12},ROW(data),data)
After ROW runs, we have:
LOOKUP({16,13,12}, {5;6;7;8;9;10;11;12;13;14;15;16}, data)
LOOKUP locates the 3 row numbers in the array of row numbers returned by ROW, and returns the 3 corresponding values from data directly to the AVERAGE function:
=AVERAGE({101,100,99})
AVERAGE calculates an average of the 3 values and returns 100 as a final result.
Note: I ran into this clever approach over on chandoo.org, in a reply by Sajan to a similar question.
Making n variable
To make n variable so that it can be easily changed, you can replace the hardcoded array constant {1,2,3} with a dynamic array created with the INDIRECT function like this:
=AVERAGE(LOOKUP(LARGE(IF(ISNUMBER(data),ROW(data)),ROW(INDIRECT("1:"&n))), ROW(data), data))
Note that if the number of numeric values in data drops below n , this formula will return the #NUM error since LARGE won’t be able to return 3 values as requested. To guard against this problem, you can use the MIN function like this:
ROW(INDIRECT("1:"&MIN(3,COUNT(data))))
Here, MIN is used to set the size of the requested array to n or the actual count of numeric values, whichever is smaller.