With a sound foundation in XML theory behind you, it's now time to address the other half of the jigsaw - actually converting all that marked-up data into something useful. This first article in a two-part series examines the need, rationale and basic concepts of XSLT, the Extensible Stylesheet Language for Transformations, with sample code and examples.
The preceding example used a single template rule to generate the entire document. While this technique is adequate for simple documents, it becomes difficult to maintain such a stylesheet for a long and complex source tree. Hence XSLT also supports multiple template rules within the same stylesheet, with instructions which tell the processor to recursively traverse the tree for child elements wherever necessary.
I'll illustrate this with a somewhat more complex XML document:
<?xml version="1.0"?>
<review id="57" category="2">
<title>Moulin Rouge</title>
<cast>
<person>Nicole Kidman</person>
<person>Ewan McGregor</person>
<person>John Leguizamo</person>
<person>Jim Broadbent</person>
<person>Richard Roxburgh</person>
</cast>
<director>Baz Luhrmann</director>
<duration>120</duration>
<genre>Romance/Comedy</genre>
<year>2001</year>
<body>
A stylishly spectacular extravaganza, <title>Moulin Rouge</title> is hard to
categorize; it is, at different times, a love story, a costume drama, a musical,
and a comedy. Director <person>Baz Luhrmann</person> (well-known for the very hip
<title>William Shakespeare's Romeo + Juliet</title>) has taken some simple themes -
love, jealousy and obsession - and done something completely new and different
with them by setting them to music.
</body>
<rating>5</rating>
<teaser>Baz Luhrmann's over-the-top vision of Paris at the turn of the century is witty, sexy...and completely unforgettable</teaser>
</review>
Now, I'd like to present this raw data as the following HTML page:
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<basefont face="Arial" size="2">
</head>
<body>
<b>Moulin Rouge</b> (2001)
<br>Baz Luhrmann's over-the-top vision of Paris at the turn of the century is witty, sexy...and completely unforgettable<p></p>
<b>Cast: </b>
Nicole Kidman, Ewan McGregor, John Leguizamo, Jim Broadbent and Richard Roxburgh
<br><b>Director: </b>Baz Luhrmann
<br><b>Duration: </b>120 minutes
<br><b>Our rating: </b>5
<p>
A stylishly spectacular extravaganza, <i>Moulin Rouge</i> is hard to
categorize; it is, at different times, a love story, a costume drama, a musical,
and a comedy. Director <b>Baz Luhrmann</b> (well-known for the very hip
<i>William Shakespeare's Romeo + Juliet</i>) has taken some simple themes -
love, jealousy and obsession - and done something completely new and different
with them by setting them to music. </p>
</body>
</html>
Let's dissect this a little. Everything you've just seen stems from the first stylesheet rule, which sets up the order in which I want the various bits of information to appear.
This template rule looks for the "review" element (the outermost element in the source tree) and replaces it with the standard HTML headers in the corresponding result tree. Within the body of the result document, it inserts placeholders for the different pieces of information; each of these placeholders is actually a reference to another template rule.
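To make that description concrete, here is a minimal sketch of what such a first rule could look like. It is reconstructed from the description and from the HTML output shown earlier, so treat the exact layout, and the explicit xsl:output instruction, as assumptions rather than as the definitive listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: ask for HTML serialization, so <br/> is written as <br> -->
  <xsl:output method="html"/>

  <!-- First rule: matches the outermost "review" element, writes the HTML
       skeleton, and delegates each piece of data to other template rules -->
  <xsl:template match="/review">
    <html>
      <head>
        <basefont face="Arial" size="2"/>
      </head>
      <body>
        <!-- Each select picks one child element, in the order it should appear -->
        <xsl:apply-templates select="title"/> (<xsl:apply-templates select="year"/>)
        <br/>
        <xsl:apply-templates select="teaser"/>
        <p/>
        <xsl:apply-templates select="cast"/>
        <br/>
        <xsl:apply-templates select="director"/>
        <br/>
        <xsl:apply-templates select="duration"/>
        <br/>
        <xsl:apply-templates select="rating"/>
        <p>
          <xsl:apply-templates select="body"/>
        </p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Run on its own, this sketch falls back on the processor's built-in rules for every selected element, so you would see the plain text of each element in the right order but none of the bold labels or italics; those come from the additional rules.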
The <xsl:apply-templates /> instruction tells the XSLT processor to process all the children of the current node. In case this is too all-inclusive for you, you can refine the list of children to process with an additional "select" attribute, as I've done in the example above.
While processing the child nodes, the XSLT processor will look for matching templates and apply them wherever possible. In the example above, when the processor receives the instruction
<xsl:apply-templates select="teaser"/>
it attempts to find and process template rules matching the element "teaser".
In this case, the template rule simply tells the processor to act on its children as per other templates which may exist within the stylesheet (strictly speaking, this rule is not really necessary)...and here they are:
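As a rough sketch (not necessarily the exact listing), the pass-through rule for "teaser" and the simple label rules for "director", "duration" and "rating" could be written as follows. The label text is taken from the HTML output shown earlier; the choice of xsl:text for the trailing " minutes" is my own assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pass-through rule: hands the children of "teaser" on to whatever
       other rules match them. The built-in rules already do this,
       which is why the rule is optional. -->
  <xsl:template match="teaser">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Each rule below writes a bold label, then processes the element's
       children (here just a text node, copied by the built-in text rule) -->
  <xsl:template match="director">
    <b>Director: </b><xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="duration">
    <b>Duration: </b><xsl:apply-templates/><xsl:text> minutes</xsl:text>
  </xsl:template>

  <xsl:template match="rating">
    <b>Our rating: </b><xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

In practice these rules would sit in the same stylesheet as the first rule rather than in a file of their own; they are wrapped in a stylesheet element here only so that the fragment is complete.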
Now, if you take a look at the XML document, you'll notice that the "title" element occurs in two different contexts - once as the title of the review, and multiple times within the body (to identify movie references). Obviously, I'd like to treat these two kinds of occurrence differently - the former should be highlighted as the review title, while the latter should be italicized within the body.
I'll have XSLT base its decision on the context in which it finds the "title" element - consider these two rules, which do exactly what I need:
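A minimal sketch of such a pair of rules, assuming bold for the review title and italics for titles in the body (as in the HTML output shown earlier), might look like this:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- General rule: any "title" element is rendered in bold.
       A pattern that is just an element name has default priority 0. -->
  <xsl:template match="title">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

  <!-- Context-specific rule: a "title" whose parent is "body" is italicized.
       A multi-step pattern like this has default priority 0.5, so it is
       preferred over the general rule whenever both match. -->
  <xsl:template match="body/title">
    <i><xsl:value-of select="."/></i>
  </xsl:template>

</xsl:stylesheet>
```

The order of the two rules in the stylesheet does not matter here; the processor picks between them on priority, not position.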
Remember what I told you about conflict resolution? Here's an example - the XSLT processor will never apply the first rule to "title" elements within the body, because there already exists a more specific rule to override it.
The tests involving position() and last() are simply there to create a grammatically correct string of cast members, and to make sure that the commas are all in the right places. Both are XPath functions, and come in very handy if you need to identify the position of a node within the set of nodes currently being processed.
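One way to build the cast string is sketched below. It uses xsl:for-each with xsl:choose, which is an assumption on my part; the same result can be reached with a separate rule for "person" and a couple of xsl:if tests.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="cast">
    <b>Cast: </b>
    <!-- Selecting only "person" elements keeps whitespace text nodes out of
         the node list, so position() and last() count cast members only -->
    <xsl:for-each select="person">
      <xsl:value-of select="."/>
      <xsl:choose>
        <!-- Last name: nothing follows it -->
        <xsl:when test="position() = last()"/>
        <!-- Second-to-last name: followed by " and " -->
        <xsl:when test="position() = last() - 1"> and </xsl:when>
        <!-- Every other name: followed by a comma and a space -->
        <xsl:otherwise>, </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the five "person" elements in the sample document, the first three names are followed by a comma, the fourth by " and ", and the fifth by nothing, which gives the cast line seen in the HTML page.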
By using multiple template rules to control the markup of different elements in the result tree, XSLT makes it possible to break up a complex stylesheet into smaller chunks and thereby easily handle long and convoluted XML data. Breaking up a stylesheet in this manner also makes it simpler to add (or remove) individual template rules for specific elements or sets of elements. The template rules individually create fragments of the result tree; these fragments are then combined into a composite result tree.