Explanation

The goal is to combine data from different worksheets with a formula. Note that we are not restructuring the data in any way, we are simply combining data in different worksheets that already have the same structure. At a high level, the formula we are using combines data from multiple sheets with the VSTACK function and removes empty rows with FILTER. The data in the workbook comes from 3 separate worksheets (Sheet1, Sheet2, and Sheet3) and is combined in another worksheet named Summary. The Diagram below provides an overview:

Data from Sheet1, Sheet2, and Sheet3 is combined on a Summary sheet - 1

Notice that the range in the formula below is fixed as B5:B16 and some of the rows in this range are empty on each worksheet. This means the solution should also provide a way to remove empty rows in the final result. In the worksheet example above, the formula used to solve this problem in cell B5 looks like this:

=LET(data,VSTACK(Sheet1:Sheet3!B5:E16),FILTER(data,CHOOSECOLS(data,1)<>""))

Let’s look at how this formula works step-by-step. The first step is to combine the data, which is done with the VSTACK function.

Note: in this example, we are working with a tiny amount of data in a small number of worksheets so that the problem is easy to understand. However, the same approach will work with larger sets of data in many more worksheets.

Combining data with VSTACK and a 3D reference

The first step in this formula is to combine the data in three separate ranges on three separate sheets. When you have data in “normal” ranges (i.e. data not in a named range or an Excel Table) at a predictable location on multiple worksheets, a nice approach is to use the VSTACK function with a 3D reference like this:

=VSTACK(Sheet1:Sheet3!B5:E16)

The reference “Sheet1:Sheet3!B5:E16” is called a “3D reference”. It points to the range B5:E16 on Sheet1 through Sheet3 which, in this case, is Sheet1, Sheet2, and Sheet3. VSTACK retrieves data from the range B5:E16 on each sheet and stacks each range on top of another vertically. This is the core of the formula — the part that does the actual data consolidation.

Note: If you are new to the idea of a 3D reference, this video provides an overview .

This works well, as you can see in the screen below. However, one problem when using a fixed range like this is that the output may contain empty rows, which also appear in the output:

Output from VSTACK may contain empty rows  - 2

To remove the extra rows, we can use the FILTER function. However, before we do that, it makes sense to store the output from VSTACK in a variable with the LET function. The reason we do this is because we are going to use the same data twice in FILTER. By using a variable, we only need to run the VSTACK operation one time.

LET function

The LET function allows you to name variables inside a formula. This makes more complex formulas easier to read and write. It also improves performance since certain operations only need to run one time. In this case, we use LET to name the output from VSTACK “data” like this:

=LET(data,VSTACK(Sheet1:Sheet3!B5:E16),

VSTACK runs as explained above and the result (the combined ranges) is stored in the variable “data”. The next step is to remove the empty rows. For that, we’ll use the FILTER function.

Removing empty rows with FILTER

As seen in the screen above, the result from VSTACK (now stored in the variable data ) contains empty rows. To remove these empty rows from the final result, we use the FILTER function and the CHOOSECOLS function together like this:

FILTER(data,CHOOSECOLS(data,1)<>"") // remove empty rows

Here, the array in FILTER is provided as data , which was created by LET in the previous step. The logic to remove empty rows is based on the CHOOSECOLS function:

CHOOSECOLS(data,1)<>"" // test column 1

CHOOSECOLS is designed to retrieve one or more specific columns from a larger set of data. In this case, we are asking CHOOSECOLS for the first column in data . Then we use a logical expression to test for “not empty” cells in the column. The result is an array of TRUE and FALSE values, one per row. This array is returned to FILTER as the include argument. When FILTER applies this array of TRUE and FALSE values to data, only the rows associated with TRUE values are returned. The rows associated with FALSE values are discarded. The final result on the summary sheet looks like this:

Final result on Summary sheet - 3

Note: we are only testing the first column (Date) for empty rows. If you need a more robust test for empty rows, see this example .

Without the LET function

It isn’t necessary to use the LET function for this formula, it’s just more efficient. Here is the same formula without LET:

=FILTER(VSTACK(Sheet1:Sheet3!B5:E16),CHOOSECOLS(VSTACK(Sheet1:Sheet3!B5:E16),1)<>"")

The overall structure is the same, but notice that we need to run the VSTACK operation twice inside FILTER. With larger amounts of data, this will slow the formula down.

Without a 3D reference

The 3D reference is useful in this problem because the ranges on each sheet are at the same location. As long as this remains true, you can easily expand the 3D reference to include additional sheets. However, there is no requirement that you use a 3D reference. You could refer to each range separately inside VSTACK like this:

=VSTACK(Sheet1!B5:E16,Sheet2!B5:E16,Sheet3!B5:E16)

This works fine, but the approach won’t scale well as you add a larger number of sheets.

Using the combined data

Once you have combined the data with the formula above, you can use it normally in other functions. For example, to sum the Total for all rows with a color of “Red”, you can use a formula like this:

=SUMIFS(E5:E16,C5:C16,"red")

You can also refer to the spill range B5# to refer to the combined data. For example, to return all rows that have a color of “Red” you can use FILTER like this:

=FILTER(B5#,C5:C16="red")

With Excel Tables

If you are combining data that exists in Excel Tables , this problem is much easier. For example, with the data in this example in three tables (Table1, Table2, and Table3) we can combine the data with a simple VSTACK formula like this:

=VSTACK(Table1,Table2,Table3)

There is usually no need to remove empty rows because Excel Tables automatically expand and contract to fit the data they contain. This means we don’t need to use any fancy tricks to figure out how many rows to retrieve, or how many empty rows to remove — we can simply combine data in the tables with VSTACK.

Data in external workbooks

It is possible to combine data in separate external workbooks with VSTACK like this:

VSTACK(Sheet1.xlsx!B5:E16,Sheet2.xlsx!B5:E16,Sheet3.xlsx!B5:E16)

Or, with a 3D reference to a single workbook, like this:

VSTACK([workbook.xlsx]Sheet1:Sheet3!B5:E16)

Note: I have only tested the external links on a single local drive, so results may vary in different environments.

Explanation

In this example, the goal is to combine ranges. With the introduction of the VSTACK function and the HSTACK function, this is quite a simple task. To combine ranges vertically, stacking one range on top of another, you can use the VSTACK function like this:

=VSTACK(range1,range2)

To combine ranges horizontally, you can use the HSTACK function like this:

=HSTACK(range1,range2)

In both formulas above, range1 (B5:B8) and range2 (D5:D9) are named ranges . The named ranges are for convenience only, you can use the raw cell references with the same result. For details on how these functions work see our documentation here: VSTACK function , HSTACK function .

Manual approach

Before the VSTACK and HSTACK functions where introduced, but after dynamic array formulas were available , it was possible to combine ranges with a more complex formula using the SEQUENCE function together with the LET function, the INDEX function, and the IF function. This is a much more manual approach, but it is an interesting example of how you can iterate through cells in a range keeping track of where you are as you go. The original formulas are below for reference. They are still useful for understanding how you can manipulate arrays in a formula.

Single column ranges

Manual formula to combine single column ranges - 4

The formula to combine single column ranges is based on INDEX function , the SEQUENCE function , the IF function , and the LET function . In the example above, the formula in cell F5 is:

=LET(a,range1,b,range2,s,SEQUENCE(ROWS(a)+ROWS(b)),IF(s>ROWS(a),INDEX(b,s-ROWS(a)),INDEX(a,s)))

Adding line breaks to make the formula more readable, we have:

=LET(
   a,range1,
   b,range2,
   s,SEQUENCE(ROWS(a)+ROWS(b)),
   IF(s>ROWS(a),
      INDEX(b,s-ROWS(a)),
      INDEX(a,s)))

where range1 (B5:B8) and range2 (D5:D9) are named ranges . The first two lines inside let assign range1 to the variable “a” and assign range2 to the variable “b”.

Note: Range1 and Range2 do not have to be provided as named ranges; you could instead use B5:B8 and D5:D9.

Next, the SEQUENCE function creates a numeric “row index” to cover all rows in both ranges:

=SEQUENCE(ROWS(a)+ROWS(b))
=SEQUENCE(9)
={1;2;3;4;5;6;7;8;9}

The resulting array is assigned to the variable “s”. In the next line, the IF function is used to iterate through the array. If the current value s is greater than the rows in a , the INDEX function returns the value of b at row s minus the row count of a :

INDEX(b,s-ROWS(a)) // value from b

Otherwise, the INDEX function returns the value of a at row s :

INDEX(a,s) // value from a

The resulting values spill into the range F5:F13.

Note: a reader mentioned this formula to me based on the stackoverflow answer here .

Multiple column ranges

The formula to combine ranges with multiple columns is more complex. In the worksheet below, the formula in B5 looks like this

=LET(
   a,range1,
   b,range2,
   r,SEQUENCE(ROWS(a)+ROWS(b)),
   c,SEQUENCE(1,COLUMNS(a)),
   IF(
      r<=ROWS(a),
      INDEX(a,r,c),
      INDEX(b,r-ROWS(a),c))
)

where range1 (E5:F9) and range2 (H5:I10) are named ranges . Note that line breaks have been added for readability.

Formula to combine ranges with multiple columns - 5

Like the formula above, this formula figures out how many rows are in both ranges, and uses the SEQUENCE function to create a “row index” with the SEQUENCE function here:

SEQUENCE(ROWS(a)+ROWS(b)) // returns {1;2;3;4;5;6;7;8;9;10;11}

In a similar way, SEQUENCE is also used to create a “column index”, named “c”:

SEQUENCE(1,COLUMNS(a)) // returns {1,2}

The IF function tests all values in the row index sequence with the row count for range 1. When a row index value is less than or equal to the count of the rows in a (5), the INDEX function is used to fetch a row from range a at the current index value (r) :

INDEX(a,r,c) // from range a

When a row index value is greater than 5, INDEX is used to fetch rows from b :

INDEX(b,r-ROWS(a),c))

Note c remains constant as {1,2} , the column index for range a . This is a shortcut to keep things simple. This formula does not try to figure out if the column counts for both ranges are the same or not. It simply assumes the column counts are the same and requests both columns.

Custom function with LAMBDA

The LAMBDA function can be used to create custom functions. The formula on this page is a good candidate, because it is relatively complex. When converted to a custom LAMBDA function, it is much easier to call:

=AppendRange(range1,range2)

See this article for more detail.