Explanation
In this example, the goal is to remove the protocol from a list of URLs. To remove the protocol from a URL, we need to remove the first part of the URL. Protocols typically look like this:
- http://
- https://
- sftp://
Notice that all protocols end with a double slash ("//"). In the current version of Excel, the easiest way to do this is with the TEXTAFTER function. In older versions of Excel, you can use a formula based on the MID and FIND functions. Both options are explained below.
TEXTAFTER function
The TEXTAFTER function returns the text that occurs after a given delimiter. TEXTAFTER supports many options , but for this problem, we only need to provide the first two arguments:
=TEXTAFTER(text,delimiter)
- Text : the text string to process.
- Delimiter : the place at which to split the text.
To remove all text up to and including the double slash, we can use the TEXTAFTER function like this:
=TEXTAFTER(B5,"//")
As the formula is copied down the table, it extracts the text that occurs after the double slash ("//"). The result is the original URL without the protocol.
Legacy Excel
In an older version of Excel without the TEXTAFTER function, you can remove the protocol from a URL with a formula based on the MID function and the FIND function like this:
=MID(B5,FIND("//",B5)+2,LEN(B5))
The core of this formula is the MID function , which extracts the text in a URL starting with the character after “//”, and ending with the character before the trailing slash ("/"):
=MID(text,start_num,num_chars)
The text is the URL in cell B5. The start_num is calculated using the FIND function like this:
FIND("//",B5)+2
FIND returns the position of the double slash ("//") in the URL as a number. With the text “https://www.domain.com” in cell B5, FIND returns 9. We don’t want to start extracting at character 9 however, we want to skip the double slash ("//") altogether, so we add 2 to the result from FIND which results in 11. This is the value used for start_num. At this point, we have:
=MID(B5,11,LEN(B5))
To provide a value for num_chars , we use the LEN function , which returns a count of all the characters in B5, which is 22. Using the LEN function like this is a shortcut, designed to simplify the formula. LEN will return 22, which is greater than the number of characters that remain. However, when num_chars exceeds the remaining string length, MID will simply extract all remaining characters. Using LEN to provide num_chars is an easy way to give MID a number that is always large enough, without the trouble of calculating exactly how many characters remain. Dropping in the value returned by the LEN function, we now have a formula that looks like this:
=MID(B5,11,22) // returns "www.domain.com"
The MID function begins extracting at character 11 and extracts all remaining text. The final result is “www.domain.com”.
Explanation
The goal is to remove the forward-slash ("/") from the URLs in column B when it is present as the last character. When a URL does not end with a forward slash ("/") the original URL should be returned without modification. Despite the fact that Excel offers many functions designed to work with text strings, there is no entirely straightforward way to solve this problem. The simplest method is to use a formula based on the LEFT function with help from LEN and RIGHT.
Sometime after I wrote this article, Excel introduced three new regex functions . One of these functions, REGEXREPLACE , can be used to solve this problem in a simple way. See below.
LEFT formula solution
The solution in the worksheet shown is based on the LEFT function . The formula in cell D5 is:
=LEFT(B5,LEN(B5)-(RIGHT(B5)="/"))
The tricky part of this formula is that it is conditional. If a URL ends in a forward slash ("/") it is removed. If a URL does not end with a forward slash, it is returned unchanged. This could be accomplished with the IF function inside the formula, as explained later in the article. However, the formula above uses a different approach, which takes advantage of the fact that Excel will convert TRUE to 1 and FALSE to zero when prompted by a math operation. This technique is sometimes referred to as Boolean logic and is a way to make parts of a formula conditional without using the IF function.
Video: Boolean algebra in Excel
The basic idea
At the core, this formula uses the LEFT function to return text starting from the left. If we just wanted to always remove the last character from a text string (regardless of the character) we could use a simple formula like this:
=LEFT(B5,LEN(B5)-1)
Here, the LEN function is used to get a total count of characters in cell B5, from which we subtract 1. The result is returned to LEFT as the num_chars argument, and LEFT then returns all but the last character. This works great, but we need to remove the last character only when it is a forward slash . This means that we need to add conditional logic that will return LEN(B5)-1 if there is a trailing slash and LEN(B5) if there is not a trailing slash.
Conditional logic
To work out the number of characters that should be returned conditionally, the formula uses this expression:
LEN(B5)-(RIGHT(B5)="/")
Here, total characters are calculated with the LEN function as before. From this result, the result of the following expression is subtracted:
RIGHT(B5)="/"
This expression uses the RIGHT function to extract the last character on the right, then tests the result to see if it is a forward slash ("/"). If the expression above returns TRUE, the math operation of subtraction will cause Excel to convert the TRUE to 1. If the expression returns FALSE, Excel will convert FALSE to zero. Therefore, when a URL ends in a forward slash “/”, we subtract 1 from the count returned by LEN, and RIGHT returns all but the last character. However, when a URL does not end in “/”, we subtract zero from the count returned by LEN, and RIGHT returns the entire string unchanged.
Note: Normally, we would also give RIGHT the number of characters to extract, which is called num_chars. However, num_chars will default to 1 if not provided, so we omit it here since we only want the last character.
With the IF function
The conditional logic trick above is clever, but the resulting formula is a bit cryptic. A more traditional formula below uses the IF function to accomplish the same thing:
=LEFT(B5,IF(RIGHT(B5)="/",LEN(B5)-1,LEN(B5)))
Although slightly longer, the conditional logic in this formula is easier to read, and the formula returns the same result in the end. Select the best option to use based on your personal preference.
With the REGEXREPLACE function
In 2024, Excel introduced support for regular expressions (regex) in formulas in the form of three new functions: REGEXTEST , REGEXEXTRACT , and REGEXREPLACE . We can use the REGEXREPLACE function to solve this problem cleanly like this:
=REGEXREPLACE(B5,"/$","")
Notice that we only have a single function call for this solution. The inputs to REGEXREPLACE are as follows:
- text - B5
- pattern - “/$”
- replacement - ""
At a high level, we are replacing the trailing slash / with an empty string "" . The trick is in how we match the last character, which is done with the pattern “/$” . In regex, the dollar sign $ is a special anchor character that matches the end of a text string. The result is a pattern that matches the slash / only when it is the last character. No conditional logic is needed. If a slash appears as the last character, it will be removed. If not, the original text will be returned unchanged. Nice!
REGEXREPLACE is only available in Excel 365 for now.