One of the fundamental constructs for XSL transformations and XML links, XPath is nonetheless one of the lesser lights of the XML universe. However, if you're serious about developing your XML skills, you need to know it inside out - and this tutorial has all you need to get started.
If you've worked with XML data, you already know that the XML specification defines certain rules that a document must adhere to in order to be well-formed. One of the most important is that every XML document must have a single outermost element, called the "root element", which in turn may contain other elements nested in a hierarchical manner.
Now, it seems logical to assume that if an XML document is laid out as a structured, hierarchical tree, it should be possible to move at will from any node on the tree to any other. And that's where XPath comes in - it provides a standard addressing mechanism for XML documents which lays bare every element, attribute and text node on the tree, making it a snap to access and manipulate them.
In fact, XPath gets its name from the fact that node addresses look a lot like standard *NIX or Windows paths - a hierarchical list of the branches leading from the root of the tree (or from the current node) down to the target node, separated by slashes.
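To make the path analogy concrete, here is a minimal Python sketch that resolves an absolute, slash-separated path such as `/movie/title` one element name at a time, using the standard library's `xml.etree.ElementTree`. It is not an XPath engine: the `resolve()` helper and the tiny sample document are hypothetical, written only for this illustration, and real XPath location paths are considerably richer.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical document, used only for this illustration.
XML = """<movie>
  <title>X-Men</title>
  <director>Bryan Singer</director>
</movie>"""


def resolve(root, path):
    """Follow an absolute, slash-separated path of element names.

    Handles only plain child steps by element name and returns the
    first matching element, or None if any step fails.
    """
    steps = [step for step in path.split("/") if step]
    # The first step must name the outermost element itself.
    if not steps or steps[0] != root.tag:
        return None
    node = root
    for name in steps[1:]:
        node = node.find(name)  # first child element with this tag
        if node is None:        # compare with None: childless elements can be falsy
            return None
    return node


root = ET.fromstring(XML)
print(resolve(root, "/movie/title").text)     # X-Men
print(resolve(root, "/movie/director").text)  # Bryan Singer
print(resolve(root, "/movie/year"))           # None
```

ElementTree's own `find()` and `findtext()` methods accept a limited XPath subset relative to an element, so `root.findtext("title")` would also return `X-Men` here.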
XPath represents an XML document as a tree containing a number of different node types - seven of them, actually. In order to illustrate this, consider the following XML document:
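<?xml version="1.0"?>
<movie>
    <title>X-Men</title>
    <!-- in case you didn't know, this is based on the comic - Ed -->
    <cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
    <director>Bryan Singer</director>
    <year>2000</year>
    <?play_trailer?>
</movie>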
Here's how XPath would represent this:
/
| -- movie
     | -- title
     |    | -- X-Men
     | -- in case you didn't know...
     | -- cast
     |    | -- Hugh Jackman, Patrick Stewart and Ian McKellen
     | -- director
     |    | -- Bryan Singer
     | -- year
     |    | -- 2000
     | -- play_trailer
As you can see, the various nodes in the tree above are not identical - some are elements, some contain text fragments, and some simply represent comments. Since XML itself supports a limited number of constructs, the XPath specification is able to categorize these different types of nodes into:
Element nodes: Elements within the XML document are represented as element nodes in the XPath data model. Since elements can have other elements nested within them, they typically appear as branches on the tree (although so-called "empty" elements appear as leaves). In the example above, "title" is an element node.
Text nodes: The character sequences enclosed within elements constitute text nodes on the XPath tree. Any entity references in that text are automatically expanded to their full values. In the example above, "X-Men" is a text node.
Attribute nodes: If an element has attributes, those attributes are also represented as nodes. Since attributes always belong to an element, that element is the parent of each of its attribute nodes; note, however, that XPath does not count attribute nodes among the element's children.
Namespace nodes: If an XML document declares one or more namespaces for the elements within it, those declarations are represented as separate nodes by XPath. Like attribute nodes, namespace nodes have the associated element as their parent but are not counted among its children. (The example document above declares no namespaces, so none appear in the diagram.)
Processing instruction (PI) nodes: If a document contains a processing instruction - well, that's a separate node too. Note, however, that although the XML declaration at the top of a document looks like a PI, it isn't one, and no node corresponds to it.
Comment nodes: You figure this one out...
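The node types above can be seen in practice by parsing the example document and walking it. The sketch below uses Python's standard `xml.dom.minidom`, whose DOM tree is not identical to the XPath data model but is close enough for illustration, with the `Document` node standing in for the root node. Two assumptions are made: `play_trailer` is written as a processing instruction (`<?play_trailer?>`), and whitespace-only text nodes created by indentation are skipped to keep the output readable (XPath would normally keep them). The helpers `label()` and `walk()` are hypothetical.

```python
from xml.dom import Node, minidom

# The movie document from above. Assumption: play_trailer is written as a
# processing instruction, <?play_trailer?>.
XML = """<?xml version="1.0"?>
<movie>
  <title>X-Men</title>
  <!-- in case you didn't know, this is based on the comic - Ed -->
  <cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
  <director>Bryan Singer</director>
  <year>2000</year>
  <?play_trailer?>
</movie>"""

# DOM node-type constants mapped to the XPath names used in the text.
KINDS = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def label(node):
    """A short human-readable label for a node."""
    if node.nodeType == Node.ELEMENT_NODE:
        return node.tagName
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return node.target
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        return node.data.strip()
    return "/"


def walk(node, depth=0, out=None):
    """Collect (depth, kind, label) tuples in document order."""
    if out is None:
        out = []
    # Skip the whitespace-only text nodes created by indentation.
    if node.nodeType == Node.TEXT_NODE and not node.data.strip():
        return out
    out.append((depth, KINDS.get(node.nodeType, "other"), label(node)))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString(XML)
for depth, kind, text in walk(doc):
    print("    " * depth + f"{text} [{kind}]")
```

The printed tree mirrors the diagram above. Note that the `<?xml version="1.0"?>` declaration produces no node at all: only `play_trailer` shows up as a processing instruction.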
Now, in addition to these six types (which, if you look at your leather-bound copy of the XML specification, correspond rather closely to the six basic constructs available in XML), XPath also defines something called a "root node". This root node represents the base of the XML document tree and encloses everything within it. Every XML document has exactly one root node, and every other node in the document is a descendant of it.
Note that the root node of a document is not the same as its outermost element (sometimes referred to as the "document element"); rather, as the representation above shows, it is an abstract node that exists as the parent of the outermost element.
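The difference between the root node and the document element is easy to check with Python's `xml.dom.minidom`, where the `Document` object plays the part of the root node and its `documentElement` attribute is the outermost element. The short document below, with a comment placed before `<movie>`, is a hypothetical example chosen to show that the root can hold more than just the document element.

```python
from xml.dom import Node, minidom

# Hypothetical document with a comment placed *before* the outermost element.
XML = "<!-- top-level comment --><movie><title>X-Men</title></movie>"

doc = minidom.parseString(XML)
outer = doc.documentElement  # the "document element"

# The Document node stands in for XPath's root node; it is not <movie>.
print(doc.nodeType == Node.DOCUMENT_NODE)  # True
print(outer is doc)                         # False
print(outer.tagName)                        # movie
# The outermost element is a child of the root...
print(outer.parentNode is doc)              # True
# ...and so is the top-level comment, which lives outside <movie> entirely.
print(doc.firstChild.nodeType == Node.COMMENT_NODE)  # True
```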
The hierarchical nature of XML data itself imposes a couple of other rules, which you might think of as pretty obvious - however, they bear repeating in this context. Every node (other than the root node) has exactly one parent. Only the root node and element nodes can have children, and they may have any number of them, including none. And every dog has his day.
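These rules can also be checked mechanically. The sketch below defines a hypothetical helper, `follows_tree_rules()`, again using `xml.dom.minidom` as a stand-in for the XPath data model. It visits every node reachable through `childNodes` and confirms that only the root lacks a parent, that each other node appears exactly once among its parent's children, and that only the root and elements have children of their own. Attributes are never visited, since DOM (like XPath) does not list them among an element's children.

```python
from xml.dom import Node, minidom

XML = """<movie>
  <title>X-Men</title>
  <!-- in case you didn't know, this is based on the comic - Ed -->
  <year>2000</year>
  <?play_trailer?>
</movie>"""

# Only these node kinds may have children: the root and elements.
CAN_HAVE_CHILDREN = (Node.DOCUMENT_NODE, Node.ELEMENT_NODE)


def follows_tree_rules(doc):
    """Return True if every node reachable via childNodes obeys the rules."""
    stack = [doc]
    while stack:
        node = stack.pop()
        if node is doc:
            if node.parentNode is not None:    # the root has no parent
                return False
        else:
            parent = node.parentNode
            if parent is None:                 # every other node has one...
                return False
            # ...and appears exactly once among that parent's children.
            if sum(child is node for child in parent.childNodes) != 1:
                return False
        if node.nodeType not in CAN_HAVE_CHILDREN and node.childNodes:
            return False                       # text, comments, PIs are leaves
        stack.extend(node.childNodes)
    return True


print(follows_tree_rules(minidom.parseString(XML)))  # True
```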
This article copyright Melonfire 2001. All rights reserved.