Get Top Level Domain (TLD)

Explanation

In this example, the goal is to extract the top-level domain (TLD) from a list of domains. A top-level domain is the last segment of text in a domain name, for example, “.com”, “.net”, or “.org”. In the current version of Excel, the TEXTAFTER function is a simple way to solve this problem. In older versions of Excel, you can use a more complicated formula based on several text functions, including RIGHT, FIND, LEN, and SUBSTITUTE. Both approaches are explained below.

TEXTAFTER function

The TEXTAFTER function returns the text that occurs after a given delimiter. The generic syntax for TEXTAFTER supports many options:

=TEXTAFTER(text,delimiter,[instance_num],[match_mode],[match_end],[if_not_found])

However, for this problem, we only need to provide the first three arguments:

=TEXTAFTER(text,delimiter,instance_num)

In the worksheet shown, the formula in cell D5 is:

=TEXTAFTER(B5,".",-1)

The TEXTAFTER function is configured with the following inputs:

text - the domain in cell B5
delimiter - a dot (".")
instance_num - given as -1 for the last instance

With the text “https://www.domain.com” in cell B5, TEXTAFTER splits the string at the last “.” and returns “com”, which is the top-level domain. As the formula is copied down, the other top-level domains are returned.
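For readers who want to check the logic outside Excel, the behavior of TEXTAFTER with instance_num set to -1 can be approximated in Python. This is a minimal sketch, not Excel's implementation. The function name text_after_last is hypothetical, and where TEXTAFTER returns #N/A for a missing delimiter, the sketch returns None.

```python
def text_after_last(text, delimiter):
    """Return the text after the last occurrence of delimiter, or None if absent."""
    # rpartition splits at the LAST occurrence: (before, delimiter, after)
    before, sep, after = text.rpartition(delimiter)
    return after if sep else None


print(text_after_last("https://www.domain.com", "."))  # com
```

Splitting at the last dot rather than the first is what makes the result the TLD instead of “domain.com”.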

For more on TEXTAFTER, see How to use the TEXTAFTER function.

Legacy Excel

Older versions of Excel do not provide the TEXTAFTER function. However, you can still extract the top-level domain (TLD) with a more complicated formula based on several text functions, including RIGHT, FIND, LEN, and SUBSTITUTE:

=RIGHT(B5,LEN(B5)-FIND("*",SUBSTITUTE(B5,".","*",LEN(B5)-LEN(SUBSTITUTE(B5,".","")))))

This is an intimidating formula, complicated by the fact that the text functions in older versions of Excel are quite limited. However, it operates in a series of small steps. At the core, the formula uses the RIGHT function to extract characters starting from the right. All of the other functions in this formula just do one thing: they figure out how many characters (n) need to be extracted:

=RIGHT(B5,n) // n = ??

At a high level, the formula replaces the last dot “.” in the domain with an asterisk (*) and then uses the FIND function to locate the position of the asterisk. Once the position is known, the RIGHT function is used to extract the TLD. How does the formula know to replace only the last dot? This is the clever and complicated part. The key is here:

SUBSTITUTE(B5,".","*",LEN(B5)-LEN(SUBSTITUTE(B5,".","")))

This snippet does the actual replacement of the last dot with an asterisk (*). The trick is that the SUBSTITUTE function has an optional fourth argument that specifies which “instance” of the old_text should be replaced. If no value is supplied for instance_num, SUBSTITUTE will replace all instances of old_text with new_text. However, if an instance_num is provided, SUBSTITUTE will only replace that particular instance of old_text (i.e., if 2 is provided, SUBSTITUTE will replace the second instance). Figuring out which instance to replace is the hardest part of this problem because we have no direct way to count how many dots are in a text string. Instead, we need to take a manual approach based on the LEN function:

LEN(B5)-LEN(SUBSTITUTE(B5,".",""))

Here, we calculate the total number of characters in the domain with LEN, then we subtract the total number of characters with all dots removed with the SUBSTITUTE function. For example, the value in cell B5 is “https://www.domain.com”. The above expression evaluates like this:

=LEN(B5)-LEN(SUBSTITUTE(B5,".",""))
=22-20
=2

The result (2) is the number of dots in the text, which is provided to SUBSTITUTE as instance_num:

SUBSTITUTE(B5,".","*",2)

SUBSTITUTE then replaces only the second dot with “*”, resulting in the text “https://www.domain*com”. Next, the FIND function locates the asterisk in the text:

FIND("*","https://www.domain*com") // returns 19

The result from FIND is 19, which is subtracted from the total length of the domain:

=LEN(B5)-19
=22-19
=3

The number 3 is returned to the RIGHT function as num_chars:

=RIGHT(B5,3) // returns "com"

And the final result returned by RIGHT is “com”.
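The steps of the legacy formula can also be traced outside Excel. The following Python sketch mirrors each stage under stated assumptions: substitute is a rough, hypothetical stand-in for Excel's SUBSTITUTE, legacy_tld is a hypothetical name for the whole formula, and positions are converted to 1-based to match FIND.

```python
def substitute(text, old, new, instance_num=None):
    """Rough stand-in for Excel's SUBSTITUTE: replace all, or only the nth occurrence."""
    if instance_num is None:
        return text.replace(old, new)
    if instance_num < 1:
        # Excel reports an error for an instance_num below 1 as well
        raise ValueError("instance_num must be 1 or greater")
    start = 0
    for _ in range(instance_num):
        pos = text.find(old, start)
        if pos == -1:
            return text  # fewer than n occurrences: nothing is replaced
        start = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def legacy_tld(domain):
    # LEN(B5)-LEN(SUBSTITUTE(B5,".","")) counts the dots
    dot_count = len(domain) - len(substitute(domain, ".", ""))
    # Replace only the last dot with an asterisk
    marked = substitute(domain, ".", "*", dot_count)
    # FIND is 1-based, Python's str.find is 0-based
    star_pos = marked.find("*") + 1
    # Number of characters to the right of the last dot
    n = len(domain) - star_pos
    # RIGHT(B5,n)
    return domain[len(domain) - n:]


print(legacy_tld("https://www.domain.com"))  # com
```

Like the formula, the sketch assumes the text does not already contain an asterisk, since the asterisk serves as a temporary marker for the last dot.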

Remove Protocol From URL

Explanation

In this example, the goal is to remove the protocol from a list of URLs. To remove the protocol from a URL, we need to remove the first part of the URL. Protocols typically look like this:

http://
https://
sftp://

Notice that all of these protocols end with a double slash ("//"). In the current version of Excel, the easiest way to remove the protocol is with the TEXTAFTER function. In older versions of Excel, you can use a formula based on the MID and FIND functions. Both options are explained below.

TEXTAFTER function

The TEXTAFTER function returns the text that occurs after a given delimiter. TEXTAFTER supports many options, but for this problem, we only need to provide the first two arguments:

=TEXTAFTER(text,delimiter)

text - the text string to process
delimiter - the place at which to split the text

To remove all text up to and including the double slash, we can use the TEXTAFTER function like this:

=TEXTAFTER(B5,"//")

As the formula is copied down the table, it extracts the text that occurs after the double slash ("//"). The result is the original URL without the protocol.
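As a cross-check outside Excel, the same split can be sketched in Python. This is an approximation rather than Excel's implementation: remove_protocol is a hypothetical name, and the sketch returns None where TEXTAFTER would return #N/A for text without a double slash.

```python
def remove_protocol(url):
    """Return the text after the first "//", or None when no "//" is present."""
    # partition splits at the FIRST occurrence: (before, delimiter, after)
    before, sep, after = url.partition("//")
    return after if sep else None


for url in ["http://www.domain.com", "https://www.domain.com", "sftp://www.domain.com"]:
    print(remove_protocol(url))  # www.domain.com each time
```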

Legacy Excel

In an older version of Excel without the TEXTAFTER function, you can remove the protocol from a URL with a formula based on the MID function and the FIND function like this:

=MID(B5,FIND("//",B5)+2,LEN(B5))

The core of this formula is the MID function, which extracts the text in a URL starting with the character after “//” and continuing to the end of the text:

=MID(text,start_num,num_chars)

The text is the URL in cell B5. The start_num is calculated using the FIND function like this:

FIND("//",B5)+2

FIND returns the position of the double slash ("//") in the URL as a number. With the text “https://www.domain.com” in cell B5, FIND returns 7. However, we don’t want to start extracting at character 7. We want to skip the double slash ("//") altogether, so we add 2 to the result from FIND, which gives 9. This is the value used for start_num. At this point, we have:

=MID(B5,9,LEN(B5))

To provide a value for num_chars, we use the LEN function, which returns a count of all the characters in B5, which is 22. Using the LEN function like this is a shortcut designed to simplify the formula. LEN returns 22, which is greater than the 14 characters that remain. However, when num_chars exceeds the remaining string length, MID simply extracts all remaining characters. Using LEN to provide num_chars is an easy way to give MID a number that is always large enough, without the trouble of calculating exactly how many characters remain. Dropping in the value returned by the LEN function, we now have a formula that looks like this:

=MID(B5,9,22) // returns "www.domain.com"

The MID function begins extracting at character 9 and extracts all remaining text. The final result is “www.domain.com”.
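The MID and FIND approach can be traced the same way in Python. In this sketch, mid is a rough, hypothetical stand-in for Excel's MID using 1-based positions, and legacy_remove_protocol is a hypothetical name for the full formula. It relies on the fact that a Python slice, like MID, stops at the end of the text when asked for more characters than remain.

```python
def mid(text, start_num, num_chars):
    """Rough stand-in for Excel's MID, which uses 1-based positions."""
    begin = start_num - 1
    # Slicing past the end of the string is safe and returns what remains
    return text[begin:begin + num_chars]


def legacy_remove_protocol(url):
    # FIND("//",B5): convert Python's 0-based index to a 1-based position.
    # str.index raises ValueError when "//" is missing, much as FIND returns an error.
    slash_pos = url.index("//") + 1
    # Skip both slashes, then ask for more characters than can possibly remain
    return mid(url, slash_pos + 2, len(url))


print(legacy_remove_protocol("https://www.domain.com"))  # www.domain.com
```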
