Page 6 - XML Parsing With SAX and Xerces (part 1)
Diving Deeper - XML
So you've already seen how Perl and PHP handle XML data. But you're a Real Programmer, and Real Programmers don't waste time with scripting languages. Nope, you need something a little more powerful, something with more horsepower under the hood. Something written in Java. Something like Xerces.
This next example goes beyond the simple applications you've just seen to provide a more comprehensive XML parsing and processing demonstration. Here's the XML file I plan to use:
<?xml version="1.0"?>
<inventory>
    <item>
        <id>758</id>
        <name>Rusty, jagged nails for nailgun</name>
        <supplier>NailBarn, Inc.</supplier>
        <cost currency="USD">2.99</cost>
        <quantity alert="500">10000</quantity>
    </item>
    <item>
        <id>6273</id>
        <name>Power pack for death ray</name>
        <supplier>QuakePower.domain.com</supplier>
        <cost currency="USD">9.99</cost>
        <quantity alert="20">10</quantity>
    </item>
</inventory>
Now, how about parsing this XML file and displaying a breakdown of the data contained within it? With SAX, it's a snap!
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import java.io.*;

public class MyThirdSaxApp extends DefaultHandler {

    // constructor
    public MyThirdSaxApp (String xmlFile) {

        // create a Xerces SAX parser
        SAXParser parser = new SAXParser();

        // set the content handler
        parser.setContentHandler(this);

        // parse the document
        try {
            parser.parse(xmlFile);
        } catch (SAXException e) {
            System.err.println (e);
        } catch (IOException e) {
            System.err.println (e);
        }
    }

    // callback definitions start here

    // call this at document start
    public void startDocument() {
        System.out.println ("Document begins");
    }

    // call this when start tag found
    public void startElement (String uri, String local, String qName, Attributes atts) {
        System.out.println ("Element begins: \"" + local + "\"");
        String AttributeName, AttributeType, AttributeValue;
        for (int i = 0; i < atts.getLength(); i++) {
            AttributeName = atts.getLocalName(i);
            // look up type and value by index; the by-name variants expect a qualified name
            AttributeType = atts.getType(i);
            AttributeValue = atts.getValue(i);
            System.out.println ("Attribute: \"" + AttributeName + "\"");
            System.out.println ("\tType: \"" + AttributeType + "\"");
            System.out.println ("\tValue: \"" + AttributeValue + "\"");
        }
    }

    // call this when character data found
    public void characters(char[] text, int start, int length) {
        String Content = new String(text, start, length);
        // skip chunks that contain nothing but whitespace
        if (!Content.trim().equals("")) {
            System.out.println("Character data: \"" + Content + "\"");
        }
    }

    // call this when end tag found
    public void endElement (String uri, String local, String qName) {
        System.out.println("Element ends: \"" + local + "\"");
    }

    // call this at document end
    public void endDocument() {
        System.out.println ("Document ends");
    }

    // the main method
    public static void main (String[] args) {
        // the XML file to parse is expected as the first argument
        if (args.length < 1) {
            System.err.println ("Usage: java MyThirdSaxApp <xml-file>");
            return;
        }
        MyThirdSaxApp myThirdExample = new MyThirdSaxApp(args[0]);
    }
}
Here's the output:
Document begins
Element begins: "inventory"
Element begins: "item"
Element begins: "id"
Character data: "758"
Element ends: "id"
Element begins: "name"
Character data: "Rusty, jagged nails for nailgun"
Element ends: "name"
Element begins: "supplier"
Character data: "NailBarn, Inc."
Element ends: "supplier"
Element begins: "cost"
Attribute: "currency"
    Type: "CDATA"
    Value: "USD"
Character data: "2.99"
Element ends: "cost"
Element begins: "quantity"
Attribute: "alert"
    Type: "CDATA"
    Value: "500"
Character data: "10000"
Element ends: "quantity"
Element ends: "item"
Element begins: "item"
Element begins: "id"
Character data: "6273"
Element ends: "id"
Element begins: "name"
Character data: "Power pack for death ray"
Element ends: "name"
Element begins: "supplier"
Character data: "QuakePower.domain.com"
Element ends: "supplier"
Element begins: "cost"
Attribute: "currency"
    Type: "CDATA"
    Value: "USD"
Character data: "9.99"
Element ends: "cost"
Element begins: "quantity"
Attribute: "alert"
    Type: "CDATA"
    Value: "20"
Character data: "10"
Element ends: "quantity"
Element ends: "item"
Element ends: "inventory"
Document ends
Most of this should be familiar to you by now, so I'm going to concentrate on
the callback functions used in the example above:
First up, the startDocument() callback, invoked when the parser encounters the beginning of an XML document. Here, the function merely prints a string indicating the start of the document; you could also use it to print a header, or initialize document-specific variables.
// call this at document start
public void startDocument() {
    System.out.println ("Document begins");
}
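As a rough sketch of the "initialize document-specific variables" idea, the handler below resets a counter in startDocument() and reports it in endDocument(). To keep it runnable without extra JARs, it assumes the JAXP SAXParserFactory bundled with the JDK rather than the Xerces SAXParser class used in this article; the ItemCounter class and its countItems() helper are invented purely for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of the article's original listing.
class ItemCounter extends DefaultHandler {
    private int itemCount;

    // reset per-document state so the same handler can be reused across parses
    @Override
    public void startDocument() {
        itemCount = 0;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // qName is used here because the default JAXP factory is not namespace-aware
        if ("item".equals(qName)) {
            itemCount++;
        }
    }

    // report the total once the whole document has been read
    @Override
    public void endDocument() {
        System.out.println("Items found: " + itemCount);
    }

    // parse an XML string and return the number of <item> elements seen
    static int countItems(String xml) {
        ItemCounter handler = new ItemCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.itemCount;
    }

    public static void main(String[] args) {
        countItems("<inventory><item/><item/></inventory>"); // prints "Items found: 2"
    }
}
```

Resetting the counter in startDocument(), rather than in the constructor, means the handler starts clean each time it is handed a new document.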
Next, it's the turn of the startElement() callback, discussed in detail a few pages back...although this one adds a new wrinkle by also accounting for element attributes.
Note that attributes attached to the element are automatically passed to the startElement() callback as an Attributes object. Its getLength() function reports how many attributes the element has, and detailed information on each one can be obtained via the functions getLocalName(), getType() and getValue().
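Here's a small, self-contained sketch of walking the Attributes object by index. It assumes the JDK's bundled JAXP parser instead of the Xerces class, and the AttributeLister class and its list() helper are hypothetical names made up for this example. Because the default JAXP factory is not namespace-aware, the sketch reads qualified names with getQName() rather than getLocalName().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class AttributeLister extends DefaultHandler {
    private final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            // index-based lookups work whether or not namespace processing is enabled
            found.add(qName + "/" + atts.getQName(i)
                    + " " + atts.getType(i)
                    + " " + atts.getValue(i));
        }
    }

    // parse an XML string and return one "element/attribute type value" line per attribute
    static List<String> list(String xml) {
        AttributeLister handler = new AttributeLister();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.found;
    }

    public static void main(String[] args) {
        String xml = "<item><cost currency=\"USD\">2.99</cost>"
                   + "<quantity alert=\"500\">10000</quantity></item>";
        for (String line : list(xml)) {
            System.out.println(line);
        }
    }
}
```

If you only care about one particular attribute, getValue() also accepts a qualified name, for example atts.getValue("alert"), and returns null when the attribute is absent.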
The characters() callback handles character data; it receives the text as a character array, along with the offset and length of the portion to read:
public void characters(char[] text, int start, int length) {
    String Content = new String(text, start, length);
    if (!Content.trim().equals("")) {
        System.out.println("Character data: \"" + Content + "\"");
    }
}
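Sadly, this information is passed as a character array together with a start offset and a length, rather than as a ready-made string. Only the characters in that range belong to the current event, which is why the first thing the callback does is copy them into a String. Also keep in mind that the parser is allowed to deliver a single run of text in several consecutive calls, so a long piece of text may arrive in pieces.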
It's important to note that the parser will also invoke the characters() callback when it encounters whitespace within the XML document. As you might imagine, this can lead to strange results, especially if you're new to XML programming. I've used the trim() string function to spare myself the agony - you should do the same.
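One common way to cope with both whitespace-only calls and text that arrives in several pieces is to buffer the characters and only look at them when the element ends. The sketch below does that under a few assumptions: it uses the JDK's bundled JAXP parser rather than the Xerces class, it only handles elements that contain plain text (no mixed content), and the TextCollector class and collect() helper are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        buffer.setLength(0); // discard whitespace seen between tags
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // may be called more than once for a single run of text, so append instead of printing
        buffer.append(text, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        String content = buffer.toString().trim();
        if (!content.isEmpty()) {
            values.add(qName + "=" + content);
        }
        buffer.setLength(0);
    }

    // parse an XML string and return "element=text" for every element that holds text
    static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String xml = "<item>\n  <id>758</id>\n  <name>Rusty, jagged nails for nailgun</name>\n</item>";
        for (String line : collect(xml)) {
            System.out.println(line);
        }
    }
}
```

Note that trim() is applied to the assembled text once, at the end of the element, so spaces inside the text survive even if the parser happens to split the text at one of them.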
The endElement() callback is invoked when the parser hits the end of an element - note that this callback receives the ending element name as argument.
public void endElement (String uri, String local, String qName) {
    System.out.println("Element ends: \"" + local + "\"");
}
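Since every startElement() call is eventually matched by an endElement() call for the same element, the two can be paired up to keep track of where you are in the document. The following is only a sketch of that idea: it assumes the JDK's bundled JAXP parser rather than the Xerces class, and the PathTracker class and its paths() helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements, outermost first
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        open.addLast(qName);
        seen.add("/" + String.join("/", open)); // record the full path to this element
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.removeLast(); // the element that just ended is always the innermost open one
    }

    // parse an XML string and return the path of every element, in document order
    static List<String> paths(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        for (String path : paths("<inventory><item><id>758</id></item></inventory>")) {
            System.out.println(path);
        }
    }
}
```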
Finally, the endDocument() callback is triggered when the end of the document
is reached.
// call this at document end
public void endDocument() {
    System.out.println ("Document ends");
}
All these callbacks, acting in concert, result in the output described a few
paragraphs back.
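To give a feel for how the same callbacks can cooperate on something other than printing, here's a sketch that reads the inventory file shown earlier and flags every item whose quantity has dropped below its alert attribute. It rests on several assumptions: it uses the JDK's bundled JAXP parser instead of the Xerces class, it expects every quantity element to carry a numeric alert attribute and to come after the item's id, and the LowStockReport class and findLowStock() helper are invented for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class for illustration only.
class LowStockReport extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> lowStock = new ArrayList<>();
    private String id;   // id of the item currently being read
    private int alert;   // alert threshold of the current <quantity>

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0);
        if ("quantity".equals(qName)) {
            // assumes the alert attribute is always present and numeric
            alert = Integer.parseInt(atts.getValue("alert"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        String content = text.toString().trim();
        if ("id".equals(qName)) {
            id = content;
        } else if ("quantity".equals(qName) && Integer.parseInt(content) < alert) {
            lowStock.add(id);
        }
        text.setLength(0);
    }

    // parse an inventory document and return the ids of items below their alert level
    static List<String> findLowStock(String xml) {
        LowStockReport handler = new LowStockReport();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.lowStock;
    }

    public static void main(String[] args) {
        String xml = "<inventory>"
            + "<item><id>758</id><quantity alert=\"500\">10000</quantity></item>"
            + "<item><id>6273</id><quantity alert=\"20\">10</quantity></item>"
            + "</inventory>";
        System.out.println("Low stock: " + findLowStock(xml)); // prints "Low stock: [6273]"
    }
}
```

With the sample data, only the death ray power pack is flagged: 10 units in stock against an alert level of 20.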
Obviously, this is just one illustration of the applications of the Xerces SAX parser. You can do a lot more with it...and in the second part of this article, I'll build on everything you just learnt to demonstrate how the Xerces SAX parser can be combined with JSP to format XML documents for a Web browser. I'll also take a look at the error-handling functions built into the parser, demonstrating how they can be used to trap and catch errors in XML processing. Make sure you come back for that one!
Note: All examples in this article have been tested with JDK 1.3.0, Apache 1.3.11, mod_jk 1.1.0, Xerces 1.4.4 and Tomcat 3.3. Examples are illustrative only, and are not meant for a production environment. YMMV!