XML Parsing With DOM and Xerces (part 1)
The Simple API for XML (SAX) is just one approach to parsing XML. An alternative
approach is the Document Object Model (DOM), which builds a data tree in memory
for easier, non-sequential access to XML data fragments. In this article, find
out how to combine the Java-based Xerces parser with the DOM to create simple
Java/XML applications.
If you're at all familiar with XML programming, you'll be aware that there are
two basic approaches to parsing an XML document. The Simple API for XML (SAX)
is one; it parses an XML document in a sequential manner, firing events for
the application layer to handle as it encounters each element of the document.
This sequential approach enables rapid parsing of XML data, especially in the
case of long or complex XML documents; however, the downside is that a SAX parser
cannot be used to access XML document nodes in a random or non-sequential manner.
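To make the event-driven model concrete, here is a minimal sketch of a SAX handler. It assumes the standard JAXP classes that ship with the JDK rather than any Xerces-specific class, and the class name SaxSketch, the method elementOrder and the sample inventory document are all invented for illustration. The handler simply records each element name at the moment the parser reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of Xerces or of the original article.
class SaxSketch {

    // Returns the element names in the order the parser reports them.
    static List<String> elementOrder(String xml) throws Exception {
        final List<String> names = new ArrayList<>();

        // The parser calls startElement() once per opening tag, in document order.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };

        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<inventory><item>apple</item><item>pear</item></inventory>";
        System.out.println(elementOrder(xml)); // prints [inventory, item, item]
    }
}
```

Notice that the handler never gets to ask for "the second item"; it only sees elements as they stream past, and anything it wants to revisit later it has to store itself.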
Hence the Document Object Model (DOM). This alternative approach involves building
a tree representation of the XML document in memory, and then using built-in methods
to navigate through this tree. Once a particular node has been reached, built-in
properties can be used to obtain the value of the node for use within the
application. This tree-based paradigm does away with the problems inherent in SAX's
sequential approach, allowing for immediate random access to any node or collection
of nodes in the tree.
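As a rough illustration of that random access, the sketch below builds a DOM tree and then reads nodes out of document order. It assumes the standard JAXP DocumentBuilder API bundled with the JDK, not a Xerces-specific parser class, and the names DomSketch, parse and itemText, along with the sample document, are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Xerces or of the original article.
class DomSketch {

    // Parses the whole document into an in-memory tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the <item> element at the given position.
    // Assumes each <item> contains a single text node.
    static String itemText(Document doc, int index) {
        NodeList items = doc.getElementsByTagName("item");
        Node textNode = items.item(index).getFirstChild();
        return textNode.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document.
        String xml = "<inventory><item>apple</item><item>pear</item>"
                   + "<item>plum</item></inventory>";
        Document doc = parse(xml);

        // Nodes can be visited in any order once the tree exists.
        System.out.println(itemText(doc, 2)); // prints plum
        System.out.println(itemText(doc, 0)); // prints apple
    }
}
```

The trade-off is memory: the entire tree has to be built before the first lookup, which is the price paid for being able to jump straight to any node.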
DOM parsers are available for a variety of languages; you can get
them for Perl, PHP, Python and C. However, the one that's going to occupy us for
the next ten minutes is the Java-based Xerces parser, which includes a very capable
DOM parser. Over the next few pages, I'm going to be demonstrating how it works,
using some real-world examples to bring home its capabilities and to illustrate
how the combination of Java, XML and JSP can be used to develop XML-based Web
applications.
Let's get started!