XML Parsing With SAX and Xerces (part 1) - Playing The SAX
Now, you may not know this, but there are two basic approaches to parsing an XML
document. The first
of these approaches is SAX, the Simple API for XML, which works by iterating through
an XML document and calling specific functions every time it encounters an XML
structure. The parser's responsibility here is limited to simply reading the document
and transferring control to the specified functions whenever it hits an XML construct;
the functions called are responsible for actually processing the XML construct
found, and the information embedded within it.
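To make the callback idea concrete, here's a minimal sketch of a SAX handler. It assumes the JAXP classes bundled with a current JDK (SAXParserFactory and DefaultHandler) rather than the Xerces classes directly, and the class names SaxEventDemo and RecordingHandler, as well as the sample XML, are made up for illustration. The parser only reads the document; the three overridden methods decide what to do with each construct.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {

    // Hypothetical handler: records one entry per callback the parser makes.
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        // Called by the parser when it hits an opening tag.
        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            events.add("start:" + qName);
        }

        // Called for character data; a parser may deliver one run of
        // text in several chunks, so real code should buffer it.
        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        // Called by the parser when it hits a closing tag.
        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    // Parse an XML string and return the recorded events in document order.
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        for (String event : parse("<library><book>Ulysses</book></library>")) {
            System.out.println(event);
        }
    }
}
```

Running the class prints one line per event, in the order the parser encountered the corresponding construct. Note that nothing is kept in memory unless the handler chooses to keep it.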
In case this doesn't sound all that appealing, there's also an alternative approach:
construct a tree structure representing the XML data in memory and then traverse
the branches of the tree to get to the fruit - the data - hanging from them.
This approach involves using the Document Object Model, and will be discussed
in a later segment of this tutorial.
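As a quick preview of the tree-based approach, here's a rough sketch that builds the tree in memory and then walks its branches. It assumes the JAXP DocumentBuilder classes bundled with a current JDK; the class name DomTreeDemo, its methods and the sample XML are invented for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTreeDemo {

    // Walk the tree depth-first, collecting the content of every
    // non-blank text node (the "fruit") on its own line.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(text).append('\n');
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Build the whole document tree in memory, then traverse it.
    static String textOf(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        System.out.print(textOf(
            "<library><book>Ulysses</book><book>Dubliners</book></library>"));
    }
}
```

The trade-off is that the entire document sits in memory before any of it is processed, which is convenient for random access but costly for very large files.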
There are a couple of obvious advantages to using a Java-based parser to parse
an XML document. First, Java code is compiled into bytecode and stored on the
server. The code is compiled only once (the first time it is accessed), so
subsequent requests skip that step and are typically faster than equivalent CGI
or PHP code that is processed afresh on every request. Then there's the portability
issue, already touched upon in the previous page - Java code is cross-platform,
which means that you can write an application once, then move it to any platform
for which a Java virtual machine exists, and it should run as expected, with no
additional tweaks or modifications required.
The Xerces Java Parser (version 1.4.4 is what I'll be using) supports the latest
version of SAX, SAX 2.0, in addition to the earlier SAX 1.0 standard. It also
includes support for XML Schema and the DOM Level 2 standard. Note, however, that
since XML standards are constantly evolving, using Xerces can sometimes produce
unexpected results; take a look at the documentation provided with the parser,
and at the information available on its official Web site, for errata and bugs.
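One way to see what a SAX 2.0 parser supports is to ask its XMLReader about a feature. The sketch below is only an illustration: it assumes the JAXP parser bundled with a current JDK rather than Xerces 1.4.4 itself, and the class name Sax2FeatureCheck is made up. The feature URI shown is the standard SAX 2.0 identifier for namespace processing.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class Sax2FeatureCheck {

    // Standard SAX 2.0 feature identifier for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Create a namespace-aware parser and ask its SAX 2.0 XMLReader
    // whether the namespaces feature is switched on.
    static boolean namespacesEnabled() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces feature: " + namespacesEnabled());
    }
}
```

If a parser does not recognize a feature, getFeature() throws a SAXNotRecognizedException, so a quick check like this can help track down "unexpected results" before they surface in your own handlers.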
With the introductions out of the way, let's put together the tools you'll need
to get started with Xerces. Here's a quick list of the software you'll need:
1. The Java Development Kit (JDK), available from the Sun Microsystems Web site
(http://java.sun.com)
2. The Apache Web server, available from the Apache Software Foundation's Web
site (http://httpd.apache.org)
3. The Tomcat Application Server, available from the Apache Software Foundation's
Web site (http://jakarta.apache.org)
4. The Xerces parser, available from the Apache XML Project's Web site
(http://xml.apache.org)
5. The mod_jk extension for Apache-Tomcat communication, available from the Jakarta
Project's Web site (http://jakarta.apache.org)
Installation instructions for all these packages are available in their respective
source archives. In case you get stuck, you might want to look at the Tomcat User
Guide at http://jakarta.apache.org/tomcat/tomcat-3.3-doc/tomcat-ug.html