Split Comma

Explanation

In this example, the goal is to split comma-separated values (CSV) in column B into multiple columns, as seen in the worksheet shown. Each text string in column B contains 5 fields separated by commas, so we expect to get 5 columns of data as a result. The header row in column D is manually entered.

One disadvantage of the TEXTSPLIT + REDUCE + VSTACK formula featured in this article is that it is somewhat complex. See below for a TEXTSPLIT + TEXTJOIN alternative formula that is much simpler, but note that it will fail on large sets of data.

How TEXTSPLIT works
TEXTSPLIT to split one row
TEXTSPLIT with REDUCE to split all rows
TEXTSPLIT with a dynamic range
Choose specific columns
TEXTSPLIT with TEXTJOIN alternative
Formula for Legacy Excel
Extract nth field
Text-to-columns alternative

How TEXTSPLIT works

The core of the solution is the TEXTSPLIT function, which is designed to split delimited text strings into multiple columns or rows. For example, if we give TEXTSPLIT a string like “red,blue,green”, and provide a comma (",") as the column delimiter, it will return an array that contains three values:

=TEXTSPLIT("red,blue,green",",") // returns {"red","blue","green"}

Because we provided a comma (",") as the column delimiter , TEXTSPLIT splits the text string into a horizontal array of values. When this array is returned to the worksheet, it spills into multiple columns .

For more details on how TEXTSPLIT works, see our TEXTSPLIT function page .

TEXTSPLIT to split one row

Before we look at how TEXTSPLIT can be configured to process a range of values, let’s see how we can use TEXTSPLIT to split a single row of comma-separated values. In the worksheet shown, the text in cell B5 is “Jim,Brown,33,Seattle,WA”. To split this text into five separate values, we can use the following formula in cell D5:

=TEXTSPLIT(B5,",")

Although TEXTSPLIT can take up to six separate arguments , in this case, we only need to provide the first two arguments, text and col_delimiter . Notice we must provide the comma as text surrounded by double quotes (","). The result is a horizontal array with five values like this:

{"Jim","Brown","33","Seattle","WA"}

This array is returned to cell D5. Because the comma has been provided as the column delimiter , the five values spill into the range D5:H5:

Split a single row of comma-separated values with TEXTSPLIT - 1

When we copy the formula down to cell D15, we have results for all 11 rows split into comma-separated values into multiple columns:

Split a all rows of comma-separated values with copy paste - 2

At this point, you might think we can simply give TEXTSPLIT a range of values to process all rows at the same time. Like this:

=TEXTSPLIT(B5:B15,",")

However, due to a limitation in Excel, this won’t work. Excel will not allow a formula to return an array of arrays . Instead, Excel will truncate the results to the first value only:

To make a formula that will process all rows at the same time, we need to upgrade the formula significantly.

There is no requirement to process all rows at the same time. Setting up the formula to handle one row at a time is much simpler, but it does require copying the formula down to each row.

TEXTSPLIT with REDUCE to split all rows

One way to make Excel process all comma-separated values in column B at the same time is to use the REDUCE function in a formula like this:

=DROP(REDUCE("",B5:B15,LAMBDA(a,v,VSTACK(a,TEXTSPLIT(v,",")))),1)

When we enter this formula in cell D5, all results are returned in one go:

Splitting all rows with a formula using TEXTSPLIT and REDUCE - 4

This is a pretty advanced formula, so let’s work through it step-by-step. At a high level, we use the REDUCE function to loop through the text values in B5:B15 one by one. For each new value, we apply the TEXTSPLIT function to split each comma-separated value and use the VSTACK function to stack the results vertically. REDUCE works by applying a custom LAMBDA function to each value in a given array, and it accumulates results to a single value. The generic syntax looks like this:

REDUCE(initial_value,array,lambda(a,v,calculation))

The a is the accumulator (the running result), and v is each value in the array. These values are passed into the LAMBDA by REDUCE for each iteration. The actual work is done by the LAMBDA function, which is configured like this:

LAMBDA(a,v,VSTACK(a,TEXTSPLIT(v,",")))

As REDUCE loops through the range, the LAMBDA function splits each value v with TEXTSPLIT:

TEXTSPLIT(v,",") // split value by comma

And then adds the result to the accumulating results a with the VSTACK function:

VSTACK(a,TEXTSPLIT(v,",")) // stack results vertically

Because the REDUCE function has been configured to start with an empty text string (""):

REDUCE("",B5:B15,LAMBDA(...))

We will end up with an extra blank row at the top of our results. To remove this blank row, we use the DROP function :

DROP(REDUCE(...),1)

DROP removes the first row from the array returned by REDUCE. The final result is a single array that contains 11 rows, just like the source data column B. This array lands in cell D5 and spills into the range D5:H15.

The reason this formula works is that the REDUCE function returns a single array , instead of an array of arrays. The final array is built up one row at a time using the VSTACK function as REDUCE loops over the values.

TEXTSPLIT with a dynamic range

Because the REDUCE version of the formula is much more complicated than the original text split formula above, you might wonder — why bother? One answer is that this formula will work with any number of rows. In other words, you can feed a dynamic range into this formula, and the results will expand to include all values in the range. For example, we can adjust the formula to use a dynamic range by adding the TRIMRANGE function like this:

=DROP(REDUCE("",TRIMRANGE(B5:B1000),LAMBDA(a,v,VSTACK(a,TEXTSPLIT(v,",")))),1)

Or we can use the dot operator syntax like this:

=DROP(REDUCE("",B5:.B1000,LAMBDA(a,v,VSTACK(a,TEXTSPLIT(v,",")))),1)

In both cases, the range provided to reduce will expand to include new values added up to row 1000. You can see how this works below, where I have adjusted the formula to use the dot operator syntax and added a new comma-separated text value to cell B16:

Notice the range provided to REDUCE is now B5:.B1000 , which expands to include new values added up to row 1000. All empty trailing rows are automatically removed from the range before it is passed to REDUCE.

Choose specific columns

If you only want specific columns from the CSV you can easily adjust the formula to include the CHOOSECOLS function . For example, to get just the first name, state, and age, you can use a formula like this

=LET(
rng,B5:.B1000,
result,DROP(REDUCE("",rng,LAMBDA(a,v,VSTACK(a,TEXTSPLIT(v,",")))),1),
CHOOSECOLS(result,{1,5,3})
)

Here, we’ve adjusted the formula to use the LET function in order to define two variables: rng and result . The rng variable is the range of values to process, and result is the array of results from the REDUCE function. We then use the CHOOSECOLS function to select only columns 1, 5, and 3 from the result array. Notice we are using the dynamic range syntax explained in the previous section, so the results will expand if new values are added. Also, we have reordered the columns by passing the array {1,5,3} to CHOOSECOLS:

CHOOSECOLS(result,{1,5,3})

Here is the final result:

Choosing specific columns to return with CHOOSECOLS - 6

TEXTSPLIT with TEXTJOIN alternative

A different and simpler way to approach this problem is to use a formula based on TEXTSPLIT and TEXTJOIN:

=TEXTSPLIT(TEXTJOIN(";",,B5:B15),",",";")

Alternative formula based on TEXTJOIN and TEXTSPLIT - 7

This is a clever approach that relies on TEXTSPLIT’s ability to split text into rows and columns at the same time. Basically, we use the TEXTJOIN function first to concatenate all values in the range using a semicolon (";") as the delimiter. Then we feed the concatenated text into the TEXTSPLIT function, which is configured to use a comma (",") for the column delimiter and a semicolon (";") for the row delimiter. TEXTSPLIT then splits the concatenated text into rows and columns at the same time. This works great, and the formula runs quickly. However, there is an important limitation.

Excel cells have a hard limit of 32,767 characters, and if TEXTJOIN produces a result that exceeds this limit, it returns a #CALC! error, which causes the formula to fail. What this means is that if you have a lot of data, the formula may not work. At some point, as the range expands and/or the number of CSV fields increases, the character count of the concatenated text will exceed 32,767, and the formula will return an error. The TEXTSPLIT + REDUCE + VSTACK formula will continue to work on larger data sets, but it will slow down.

The bottom line? TEXTSPLIT + TEXTJOIN is a compact and elegant formula that leverages TEXTSPLIT’s ability to split text into 2d arrays. It works great on smaller data sets, but it will fail on large data sets.

Formula for Legacy Excel

Note: Before TEXTSPLIT, you could use a hacky workaround formula based on the FILTERXML function to split comma-separated values into multiple columns. This approach doesn’t make any sense to use today. However, I’m leaving it here for reference as a reminder of how complicated things used to be before the introduction of dynamic arrays , spills and TEXTSPLIT . This approach only works in Windows versions of Excel, since the Mac version of Excel does not have the FILTERXML function.

In older versions of Excel without TEXTSPLIT, you can use a more complicated formula based on the FILTERXML function with help from the SUBSTITUTE and TRANSPOSE functions. The formula looks like this:

=TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(B5,",","</y><y>")&"</y></x>","//y"))

Splitting comma-separated text with FILTERXML function - 8

To use FILTERXML, we need XML , so the first task is to add XML markup to the text. We are going to arbitrarily make each field in the text an element, enclosed with a parent element. We start with the SUBSTITUTE function here:

SUBSTITUTE(B5,",","</y><y>")

The result from SUBSTITUTE is a text string like this:

"Jim</y><y>Brown</y><y>33</y><y>Seattle</y><y>WA"

To ensure well-formed XML tags and to wrap all elements in a parent element, we prepend and append more XML tags like this:

"<x><y>"&SUBSTITUTE(B5,",","</y><y>")&"</y></x>"

This yields a text string like this (line breaks added for readability)

"<x>
<y>Jim</y>
<y>Brown</y>
<y>33</y>
<y>Seattle</y>
<y>WA</y>
</x>"

This text is delivered directly to the FILTERXML function as the xml argument, with an Xpath expression of “//y”:

FILTERXML("<x><y>Jim</y><y>Brown</y><y>33</y><y>Seattle</y><y>WA</y></x>","//y")

Xpath is a parsing language and “//y” selects all elements. The result from FILTERXML is a vertical array like this:

{"Jim";"Brown";33;"Seattle";"WA"}

Because we want a horizontal array in this instance, we wrap the TRANSPOSE function around FILTERXML:

=TRANSPOSE({"Jim";"Brown";33;"Seattle";"WA"})

The result is a horizontal array like this:

{"Jim","Brown",33,"Seattle","WA"}

In older versions of Excel, you can enter this formula as a multi-cell array formula in D5:H5. In Excel 365, the array will spill into the range D5:H5 automatically.

I learned the FILTERXML trick from Bill Jelen in a MrExcel video . FILTERXML is not available in Excel on the Mac, or in Excel Online. This is a nerdy workaround for difficult problems in older versions of Excel, but it doesn’t make sense in a modern version of Excel. The new TEXTSPLIT function is a much better method.

If you don’t want to use FILTERXML, or can’t because you are using Excel on a Mac, this example shows another way to split text with a delimiter. This approach requires a bit more setup.

Extract nth field

With either option above, you may want to extract just the nth field from a single text string. To do that, you can use the INDEX function . For example, to extract just the age with TEXTSPLIT, you can use a formula like this:

=INDEX(TEXTSPLIT(B5,","),3) // get age

Notice we have simply nested the original formula inside the INDEX function. With FILTERXML, the formula looks like this:

=INDEX(TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(B5,",","</y><y>")&"</y></x>","//y")),3)

Text-to-columns alternative

Formulas work great when you need a solution that is dynamic, because formulas will update automatically if data changes. However, if you only need a one-off manual process, you can also use Excel’s Text-to-Columns feature.

Explanation

In this example, the goal is to split the text strings in column B, which contain three dimensions in the form “L x W x H”, into 3 separate dimensions. One problem with dimensions entered as text is that they can’t be used for any kind of calculation. So, in addition, we want our final dimensions to be numeric . In a problem like this, we need to identify the delimiter, which is the character (or characters), that separate each thing we want to extract. In this case, the delimiter is the “x” character. Note that the “x” has a space (" “) on either side, something we’ll also need to handle. Also notice that the “x” appears in different locations, so we can’t extract dimensions by position.

There are two basic approaches to solving this problem. If you are using Excel 365 , the easiest solution is to use the TEXTSPLIT function as shown in the worksheet above. If you are using an older version of Excel without TEXTSPLIT, you can use more complicated formulas based on several functions, including LEFT , RIGHT , LEN , SUBSTITUTE , and FIND . Both approaches are explained below.

TEXTSPLIT function

The TEXTSPLIT function is a great way to solve this problem, because it is so simple to use. To split dimensions into three parts, using the “x” as a delimiter, the formula in D5, copied down, is:

=TEXTSPLIT(B5,"x")+0

The formula works in two steps. First, TEXTSPLIT splits the text in B5 using the “x”. The result is a horizontal array that contains three elements, one for each dimension:

={"10 "," 5 "," 7"}+0

Notice the numbers are still surrounded by space. Our goal is to get actual numeric values, so in the second step, we simply add zero. This is a simple way of getting Excel’s formula engine to coerce a text value to an actual number. The result is an array like this:

={10,5,7} // true numbers

Notice the double quotes (”") are gone, because the math operation of addition (+) changes the text values to actual numbers. The formula returns this result to cell D5, and the three dimensions spill into the range D5:F5.

Note: one nice thing about the “add zero” trick, is that it doesn’t matter if the number is surrounded by space characters or not. The numbers can be separated with " x " or “x” in the original text string with the same result. However, if you are splitting values that are not meant to be numbers, you will want to remove the +0, otherwise the formula will return a #VALUE! error.

Legacy Excel

In Legacy Excel , we need to use more complicated formulas to accomplish the same thing. To get the first dimension (L), we can use a formula like this in D5:

=LEFT(B5,FIND("x",B5)-1)+0

At a high level, this works by extracting text starting from the left side. The number of characters to extract is calculated by locating the first “x” in the text using the FIND function, then subtracting 1:

=LEFT(B5,FIND("x",B5)-1)+0
=LEFT(B5,4-1)+0
=LEFT(B5,3)+0
="10 "+0
=10

To get the second dimension, we can use a formula like this in cell E5:

=MID(B5,FIND("x",B5)+1,FIND("~",SUBSTITUTE(B5,"x","~",2))-FIND("x",B5)-1)+0

At a high level, this formula extracts the width (W) with the MID function , which returns a given number of characters starting at a given position in the next. The starting position is calculated with the FIND function like this:

FIND("x",B5)+1

FIND simply locates the first “x” and returns the location (4) as a number. Then we add one to start at the first character after “x”:

=FIND("x",B5)+1
=4+1
=5

The number of characters to extract, which is provided as num_chars to the MID function, is the most complicated part of the formula:

FIND("~",SUBSTITUTE(B5,"x","~",2))-FIND("x",B5)-1

Working from the inside out, we use SUBSTITUTE with FIND to locate the position of the 2nd “x”, as described here . We then subtract from that the location of the first “x” + 1.

=FIND("~",SUBSTITUTE(B5,"x","~",2))-FIND("x",B5)-1
=FIND("~","10 x 5 ~ 7")-FIND("x",B5)-1
=8-FIND("x",B5)-1
=8-4-1
=3

The main trick here is that we are using the seldom seen instance_num argument in the SUBSTITUTE function to replace only the second instance of the “x” with a tilde (~), so that we can target the second instance of “x” with the FIND function in the next step.

Now that we’ve calculated the start_num and num_chars , we can simplify the original MID formula to this:

=MID(B5,5,3)+0
=MID("10 x 5 x 7",5,3)+0
=" 5 "+0
=5

Note we are using the trick of adding zero again to force Excel to coerce the next to a number. Finally, to get the third dimension, we can use a formula like this in cell F5:

=RIGHT(B5,LEN(B5)-FIND("~",SUBSTITUTE(B5,"x","~",2)))+0

This formula works a lot like the formula to get the second dimension above. At a high level, we are using the RIGHT function to extract text from the right. The main challenge is to calculate how many characters to extract, num_chars , which is done again with FIND and SUBSTITUTE like this:

LEN(B5)-FIND("~",SUBSTITUTE(B5,"x","~",2))

As above, we use 2 for instance_num argument in the SUBSTITUTE function to replace only the second instance of the “x” with a tilde (~), so that we can target this instance of “x” with the FIND function in the next step:

=LEN(B5)-FIND("~",SUBSTITUTE(B5,"x","~",2))
=RIGHT(B5,10-8)+0
=RIGHT(B5,2)+0
=" 7"+0
=7

The LEN function returns the total characters in the text string (10) and FIND returns 8 as the location of the second “x”, so num_chars becomes 2 in the end. RIGHT returns the 2 characters from the right side of the text string (which includes a space) we add zero to the result to force Excel to change the next to a number.

Explanation#

How TEXTSPLIT works#

TEXTSPLIT to split one row#

TEXTSPLIT with REDUCE to split all rows#

TEXTSPLIT with a dynamic range#

Choose specific columns#

TEXTSPLIT with TEXTJOIN alternative#

Formula for Legacy Excel#

Extract nth field#

Text-to-columns alternative#

Explanation#

TEXTSPLIT function#

Legacy Excel#

Explanation

How TEXTSPLIT works

TEXTSPLIT to split one row

TEXTSPLIT with REDUCE to split all rows

TEXTSPLIT with a dynamic range

Choose specific columns

TEXTSPLIT with TEXTJOIN alternative

Formula for Legacy Excel

Extract nth field

Text-to-columns alternative

Explanation

TEXTSPLIT function

Legacy Excel