XPath Basics - Dog Days
If you've worked with XML data, you already know that the XML specification
defines certain rules that a document must follow in order to be well-formed.
One of the most important is that every XML document must have a single
outermost element, called the "root element", which in turn may contain other
elements nested in a hierarchical manner.
Now, it seems logical to assume that if an XML document is laid out as a
structured, hierarchical tree, it should be possible to move at will from any
node on the tree to any other node. And that's where XPath comes in - it
provides a standard addressing mechanism for XML documents, one which lays bare
every element, attribute and text node on the tree and makes it a snap to
locate and access them.
In fact, XPath gets its name from its path notation: node addresses look a lot
like standard *NIX file system paths (or Windows paths, with forward slashes in
place of backslashes) - a hierarchical list of the steps leading from the tree
root, or from the current node, down to the target node, separated by slashes.
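To make the file-path analogy concrete, here is a minimal Python sketch built on the standard library's `xml.dom.minidom` parser. The `select()` function is hypothetical, written only for this illustration: it understands absolute paths made of `/`-separated element names (a tiny subset of real XPath, with no predicates, axes, `//` or `@`) and walks down from the root one level per step, much as a shell resolves `/usr/local/bin`.

```python
from xml.dom import Node, minidom

MOVIE_XML = """<?xml version="1.0"?>
<movie>
  <title>X-Men</title>
  <director>Bryan Singer</director>
  <year>2000</year>
</movie>"""


def select(doc, path):
    """Resolve a simple absolute path such as '/movie/title'.

    Hypothetical helper: only '/'-separated element names are supported,
    a tiny subset of XPath (no predicates, axes, '//' or '@').
    """
    if not path.startswith("/"):
        raise ValueError("only absolute paths are supported")
    current = [doc]  # a leading '/' means "start at the root node"
    for step in (s for s in path.split("/") if s):
        # Move down one level, keeping child elements whose name matches.
        current = [
            child
            for node in current
            for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == step
        ]
    return current


doc = minidom.parseString(MOVIE_XML)
for node in select(doc, "/movie/director"):
    print(node.firstChild.data)  # Bryan Singer
```

A real XPath engine does far more, but the core idea is the same: each step narrows the current set of nodes to matching children.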
XPath represents an XML document as a tree containing a number of different node
types - seven of them, actually. To illustrate, consider the following XML
document:
<?xml version="1.0"?>
<movie>
  <title>X-Men</title>
  <!-- in case you didn't know, this is based on the comic - Ed -->
  <cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
  <director>Bryan Singer</director>
  <year>2000</year>
  <?play_trailer?>
</movie>
Here's how XPath would represent this:
[root]
  | -- movie
        | -- title
        |      | -- X-Men
        | -- in case you didn't know...
        | -- cast
        |      | -- Hugh Jackman, Patrick Stewart and Ian McKellen
        | -- director
        |      | -- Bryan Singer
        | -- year
        |      | -- 2000
        | -- play_trailer
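To reproduce this diagram programmatically, here is a Python sketch that parses the sample with `xml.dom.minidom` and prints each node alongside its type. The DOM is not the XPath data model, but for this document its node types line up with XPath's root, element, text, comment and processing-instruction nodes; the `TYPE_NAMES` mapping and the `label()` and `dump()` functions are assumptions made for illustration. Like the diagram, the sketch skips whitespace-only text nodes (the line breaks and indentation between tags), which an XPath tree would normally keep unless they are explicitly stripped.

```python
from xml.dom import Node, minidom

MOVIE_XML = """<?xml version="1.0"?>
<movie>
  <title>X-Men</title>
  <!-- in case you didn't know, this is based on the comic - Ed -->
  <cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
  <director>Bryan Singer</director>
  <year>2000</year>
  <?play_trailer?>
</movie>"""

# Map DOM node types to the XPath node-type names used in this article.
TYPE_NAMES = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def label(node):
    """Short human-readable label for one node."""
    if node.nodeType == Node.ELEMENT_NODE:
        return node.tagName
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return node.target
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        return node.data.strip()
    return ""


def dump(node, depth=0, out=None):
    """Return one line per node, indented by depth, like the tree above."""
    if out is None:
        out = []
    # Skip whitespace-only text nodes to match the diagram.
    if node.nodeType == Node.TEXT_NODE and not node.data.strip():
        return out
    kind = TYPE_NAMES.get(node.nodeType, "?")
    out.append("  " * depth + f"{kind}: {label(node)}".rstrip())
    for child in node.childNodes:
        dump(child, depth + 1, out)
    return out


print("\n".join(dump(minidom.parseString(MOVIE_XML))))
```

Running it prints one line per node: the root, the `movie` element, its five child elements with their text nodes, the comment and the `play_trailer` processing instruction.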
As you can see, the nodes in the tree above are not all of one kind - some are
elements, some contain text fragments, one represents a comment and one a
processing instruction. Since XML itself supports only a limited number of
constructs, the XPath specification is able to sort these nodes into a small
set of types:
Element nodes: Elements within the XML document are represented as element nodes
in the XPath data model. Since elements can have other elements nested within
them, they typically appear as branches on the tree (although so-called "empty"
elements appear as leaves). In the example above, "title" is an element node.
Text nodes: The character data enclosed within elements makes up the text nodes
of the XPath tree. Any entity or character references in that text are expanded
to their full values, so a text node holds the resulting characters rather than
the references themselves. In the example above, "X-Men" is a text node.
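Attribute nodes: If an element has attributes, those attributes are also
represented as nodes. Since attributes are always linked to an element, that
element counts as each attribute node's parent; however, attribute nodes are not
counted among the element's children, so XPath reaches them through a separate
attribute axis (abbreviated "@") rather than by stepping down to a child.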
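Namespace nodes: If an XML document declares one or more namespaces for the
elements within it, XPath represents them as separate namespace nodes. Like
attribute nodes, namespace nodes have the associated element as their parent
without being counted among its children; each element gets one namespace node
for every namespace in scope on it, including those declared on its ancestors.
(The example document declares no namespaces, so none appear in the diagram
above.)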
Processing instruction (PI) nodes: If a document contains a processing instruction
- well, that's a separate node too. Note, however, that although the XML
declaration at the top of the document looks like a PI, it is not one, so there
is no node corresponding to it.
Comment nodes: You figure this one out...
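The Python sketch below checks several of these rules with `xml.dom.minidom`, again using the DOM as a stand-in for the XPath data model rather than a real XPath engine. The `rating` attribute, the `&amp;` entity reference and the comment outside `<movie>` are hypothetical additions to the article's sample, chosen so that each node type shows up.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the rating attribute, the &amp; reference and the
# comment before <movie> are not part of the article's original document.
SOURCE = """<?xml version="1.0"?>
<!-- a comment outside the document element -->
<movie rating="PG-13">
  <cast>Hugh Jackman &amp; Patrick Stewart</cast>
  <?play_trailer?>
</movie>"""

doc = minidom.parseString(SOURCE)
movie = doc.documentElement
cast = movie.getElementsByTagName("cast")[0]

# Text nodes: the entity reference &amp; has already been expanded to "&".
cast.normalize()  # merge any adjacent text nodes into one
print(cast.firstChild.data)  # Hugh Jackman & Patrick Stewart

# Attribute nodes: linked back to their element (DOM calls the link
# ownerElement), but not listed among the element's children.
rating = movie.getAttributeNode("rating")
print(rating.value, rating.ownerElement is movie)  # PG-13 True
print([c.nodeName for c in movie.childNodes if c.nodeType != Node.TEXT_NODE])
# ['cast', 'play_trailer']

# Comment and PI nodes exist; the XML declaration produces no node at all.
print([(c.nodeType, c.nodeName) for c in doc.childNodes])
# [(8, '#comment'), (1, 'movie')]
```

Note that the document's own children are just the comment and the `movie` element: nothing stands in for the `<?xml version="1.0"?>` line.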
Now, in addition to these six types (which, if you look at your leather-bound
copy of the XML specification, correspond rather closely with the six basic constructs
available in XML), XPath also defines something called a "root node", of which
every XML document has exactly one. This root node represents the base of the XML
document tree and encloses everything within it: every other node in the document
is a descendant of the root node.
It should be noted that the root node of a document is not the same as the outermost
element (sometimes referred to as the "document element"); rather, as the tree
above shows, it is an abstract node which exists as the parent of the outermost
element (and of any comments or processing instructions that sit outside it).
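In DOM terms, the root node corresponds roughly to the `Document` object and the outermost element to its `documentElement` property. A minimal Python sketch with `xml.dom.minidom` (a stand-in for an XPath engine, not one) shows the relationship between the two:

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<movie><title>X-Men</title></movie>")
movie = doc.documentElement  # the outermost ("document") element

print(doc.nodeType == Node.DOCUMENT_NODE)  # True: doc plays the root node
print(movie.tagName)                       # movie
print(movie.parentNode is doc)             # True: the root node is its parent
print(doc.parentNode is None)              # True: nothing sits above the root
```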
The hierarchical nature of XML data itself imposes a couple of other rules, which
you might think of as pretty obvious - however, they bear repeating in this context.
Every node (other than the root node) has a single parent. The root node and element
nodes may have any number of children, while the other node types never have
children of their own. And every dog has his day.
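One practical consequence of the single-parent rule is that every node has exactly one route back to the root, which is part of what makes path-style addresses work. The hypothetical `path_to()` function below, a Python sketch using `xml.dom.minidom`, follows `parentNode` links upward to rebuild an element's path; it is meant for element nodes only and ignores sibling positions, so it cannot tell apart sibling elements that share a name.

```python
from xml.dom import Node, minidom


def path_to(node):
    """Build an absolute path for an element by following parent links upward.

    Hypothetical helper for illustration; it assumes `node` is an element
    (or the document itself) and ignores sibling positions.
    """
    steps = []
    while node.nodeType != Node.DOCUMENT_NODE:
        steps.append(node.nodeName)
        node = node.parentNode  # exactly one parent, so the walk is unambiguous
    return "/" + "/".join(reversed(steps))


doc = minidom.parseString("<movie><cast>Hugh Jackman</cast></movie>")
cast = doc.getElementsByTagName("cast")[0]
print(path_to(cast))                    # /movie/cast
print(cast.firstChild.hasChildNodes())  # False: a text node is always a leaf
```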
This article copyright Melonfire 2001. All rights reserved.