Explanation
The FILTERXML function can parse XML using XPath expressions. XML is a special text format designed to transport data, with features that allow it to be easily parsed and verified by software. XPath is a query language for selecting the elements and attributes in an XML document. The FILTERXML function uses XPath to match and extract data from text in XML format.
In the example shown, cell B5 contains XML data that describes 10 music albums. For each album, there is information about the title, the artist, and the year. To parse this XML, the FILTERXML function is used 3 times. The formulas in cells D5, E5, and F5 are as follows:
=FILTERXML(B5,"//album/title") // get title
=FILTERXML(B5,"//album/artist") // get artist
=FILTERXML(B5,"//album/year") // get year
In each case, the XPath expression targets a specific element in the XML. For example, in cell D5, the XPath targets the title element with this string:
"//album/title"
With this XPath expression, FILTERXML returns all 10 album titles. Because this example has been created in Excel 365, which supports dynamic arrays, the results spill into the range D5:D14 automatically.
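For readers who want to see the same XPath idea outside Excel, here is a minimal Python sketch using the standard library's ElementTree module. The sample XML, the `albums` root element, and the helper name `filter_xml` are assumptions made for illustration (the worksheet's actual data is not reproduced here), and only three made-up albums are included. ElementTree supports a limited subset of XPath and requires a relative path, so `//album/title` is written as `.//album/title`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three placeholder albums, not the ten in the worksheet.
SAMPLE_XML = """<albums>
  <album><title>Album A</title><artist>Artist A</artist><year>1971</year></album>
  <album><title>Album B</title><artist>Artist B</artist><year>1984</year></album>
  <album><title>Album C</title><artist>Artist C</artist><year>1997</year></album>
</albums>"""


def filter_xml(xml_text, xpath):
    """Return the text of every element matched by xpath, in document order."""
    root = ET.fromstring(xml_text)
    # One list item per matching element, similar to one spilled cell per match.
    return [element.text for element in root.findall(xpath)]


titles = filter_xml(SAMPLE_XML, ".//album/title")    # like =FILTERXML(B5,"//album/title")
artists = filter_xml(SAMPLE_XML, ".//album/artist")  # like =FILTERXML(B5,"//album/artist")
years = filter_xml(SAMPLE_XML, ".//album/year")      # like =FILTERXML(B5,"//album/year")
print(titles)
```

As in the worksheet, each expression returns one value per album. Note that this sketch returns the years as text, so any numeric conversion would be a separate step.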
Explanation
This formula depends on two helper columns. The first helper column holds random values created with the RAND() function. The formula in C5, copied down, is:
=RAND()
The RAND function generates a random value at each row.
Note: RAND is a volatile function and will generate new values with each worksheet change.
The second helper column holds the numbers used to sort data, generated with a formula. The formula in D5 is:
=RANK(C5,rand)+COUNTIF($C$5:C5,C5)-1
See this page for an explanation of this formula.
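The logic of the rank formula can be sketched in Python as a rough illustration. The function name `sort_keys` and the input values are hypothetical. The sketch assumes RANK's default descending order (the largest value gets rank 1) and mirrors the expanding range in COUNTIF by counting only the rows from the top down to the current row, which is what breaks ties.

```python
def sort_keys(rand_values):
    """Mimic =RANK(C5,rand)+COUNTIF($C$5:C5,C5)-1 for each row of rand_values."""
    keys = []
    for i, value in enumerate(rand_values):
        # RANK(C5, rand): 1 plus the number of values in the whole list that are larger.
        rank = 1 + sum(1 for other in rand_values if other > value)
        # COUNTIF($C$5:C5, C5): occurrences of this value from the first row to this row.
        seen_so_far = rand_values[: i + 1].count(value)
        keys.append(rank + seen_so_far - 1)
    return keys


# A duplicate value (0.5) is included on purpose to show the tie-break.
print(sort_keys([0.5, 0.9, 0.5, 0.1]))  # [2, 1, 3, 4]
```

Without the COUNTIF term, both 0.5 rows would receive rank 2 and rank 3 would be missing, so a later exact-match lookup for 3 would fail. With it, every row gets a unique number from 1 to the number of rows.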
The formula in E5 is:
=INDEX(names,MATCH(ROWS($D$5:$D5),sort,0))
Here, the INDEX function is used to retrieve values in the named range “names”, using the sort values in the named range “sort”. The actual work of figuring out which value to retrieve is done by the MATCH function in this snippet:
MATCH(ROWS($D$5:$D5),sort,0)
Inside MATCH, the ROWS function is given an expanding range as the lookup value, which begins as one cell, and expands as the formula is copied down the column. This increments the lookup value, starting at 1 and continuing to 7. MATCH then returns the position of the lookup value in the list.
The position is fed to INDEX as the row number, and INDEX retrieves the name at that position.
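The INDEX and MATCH step described above can be approximated in Python as follows. This is only a sketch: the function name `random_order` and the sample names are hypothetical, and Python lists are zero-based where Excel positions are one-based.

```python
def random_order(names, sort):
    """Mimic =INDEX(names,MATCH(ROWS($D$5:$D5),sort,0)) copied down one row per name."""
    result = []
    # ROWS($D$5:$D5) yields 1 in the first row, 2 in the second, and so on.
    for lookup_value in range(1, len(names) + 1):
        # MATCH(lookup_value, sort, 0): exact-match position (zero-based here).
        position = sort.index(lookup_value)
        # INDEX(names, position): the name in the same row as that sort number.
        result.append(names[position])
    return result


# Hypothetical data: three names and their sort numbers from the helper column.
print(random_order(["Ann", "Bob", "Cy"], [2, 3, 1]))  # ['Cy', 'Ann', 'Bob']
```

Because the sort numbers are unique and run from 1 to the number of rows, each name is retrieved exactly once, in the order given by the sort column.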