XML Parsing With SAX and Xerces (part 1)
Playing The SAX
So you've already seen how Perl and PHP handle XML data. But you're a Real Programmer, and Real Programmers don't waste time with scripting languages. Nope, you need something a little more powerful, something with more horsepower under the hood. Something written in Java. Something like Xerces.
Now, you may not know this, but there are two basic approaches to parsing an XML document. The first of these approaches is SAX, the Simple API for XML, which works by iterating through an XML document and calling specific functions every time it encounters an XML structure. The parser's responsibility here is limited to simply reading the document and transferring control to the specified functions whenever it hits an XML construct; the functions called are responsible for actually processing the XML construct found, and the information embedded within it.
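To make the callback idea concrete, here is a minimal sketch of a SAX handler. It is not taken from this tutorial: it assumes the JAXP parser bundled with the JDK (javax.xml.parsers) rather than a separately installed Xerces, and the class name SaxEventDemo, the parse() helper and the sample XML string are all hypothetical. The parser does nothing except call startElement(), characters() and endElement() as it reads the document; the handler decides what to do with each event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventDemo {

    // Hypothetical helper: parses an XML string and returns the events seen, in document order.
    public static List<String> parse(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        // The handler holds the callback functions; the parser only decides when to call them.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver long text in several chunks; this sketch records each chunk as-is.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };

        // Assumption: the JDK's built-in JAXP parser stands in for a standalone Xerces install.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<library><book>Dune</book></library>")) {
            System.out.println(event);
        }
    }
}
```

Note that nothing is kept in memory beyond what the handler chooses to store, which is why this style suits large documents.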
In case this doesn't sound all that appealing, there's also an alternative approach: construct a tree structure representing the XML data in memory and then traverse the branches of the tree to get to the fruit - the data - hanging from them. This approach involves using the Document Object Model, and will be discussed in a later segment of this tutorial.
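Purely for contrast, the tree-based approach might look something like the rough sketch below. As before, it assumes the JDK's bundled JAXP classes rather than Xerces itself, and the class name DomTreeDemo, the leaves() helper and the sample XML are hypothetical. The whole document is loaded into memory first, and only then does the code walk the branches looking for text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeDemo {

    // Hypothetical helper: builds the full tree in memory, then collects every non-blank text node.
    public static List<String> leaves(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Depth-first walk: record text nodes, then descend into each child in document order.
    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(leaves("<library><book>Dune</book><book>Emma</book></library>"));
    }
}
```

The trade-off is memory: the tree is convenient to navigate, but the entire document has to fit in memory before any of it can be read.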
There are a couple of obvious advantages to using a Java-based parser to parse an XML document. First, Java code is compiled into bytecode and stored on the server; this speeds up access time, since the code is compiled only once (typically the first time it is accessed), and subsequent requests run the already-compiled bytecode, which is usually faster than equivalent CGI or PHP code that has to be started or interpreted afresh on every request. Then there's the portability issue, already touched upon on the previous page - Java code is cross-platform, which means that you can write an application once, then move it to any platform for which a Java virtual machine exists, and it should run as expected without additional tweaks or modifications.
The Xerces Java Parser (version 1.4.4 is what I'll be using) supports the latest version of SAX, SAX 2.0, in addition to the earlier SAX 1.0 standard. It also includes support for XML Schema and the DOM Level 2 standard. Note, however, that since XML standards are constantly evolving, using Xerces can sometimes produce unexpected results; take a look at the documentation provided with the parser, and at the information available on its official Web site, for errata and bugs.
With the introductions out of the way, let's put together the tools you'll need to get started with Xerces. Here's a quick list of the software you'll need:
1. The Java Development Kit (JDK), available from the Sun Microsystems Web site (http://java.sun.com)
2. The Apache Web server, available from the Apache Software Foundation's Web site (http://httpd.apache.org)
3. The Tomcat Application Server, available from the Jakarta Project's Web site (http://jakarta.apache.org)
4. The Xerces parser, available from the Apache XML Project's Web site (http://xml.apache.org)
5. The mod_jk extension for Apache-Tomcat communication, available from the Jakarta Project's Web site (http://jakarta.apache.org)