Integrating XML with J2EE - Parsing XML Using SAX
To parse an XML document, you instantiate a
javax.xml.parsers.SAXParserFactory object to obtain a SAX-based parser.
This parser then reads the XML document sequentially from start to finish,
notifying a handler of each component as it is recognized. (In the following
code fragment, the document is named by a command-line argument.)
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new XMLParse();
saxParser.parse( new File(argv[0]), handler );
Your SAX parser class must extend the public class
org.xml.sax.helpers.DefaultHandler. This class defines stub methods that
receive notification (callbacks) when XML components are parsed. By default,
these methods do nothing, but they can be overridden to do anything you like.
For example, a method called startElement() is invoked when the start tag for
an element is recognized. This method receives the element's name and its
attributes. The element's name is passed in the first three parameters to
startElement() (see Table 16.6); which of them are populated depends on
whether namespaces are being used.
Table 16.6 Parameters to the startElement() Method
| Parameter | Contents |
|---|---|
| uri | The namespace URI, or the empty string if the element has no namespace URI or if namespace processing is not being performed. |
| localName | The element name without the namespace prefix; a non-empty string when namespace processing is being performed. |
| qualifiedName | The element name with the namespace prefix. |
| attributes | The element's attributes. |
In the following code example, handling for the qualified name is
provided.
public void startElement(String uri, String localName,
        String qualifiedName, Attributes attributes)
        throws SAXException {
    System.out.println("START ELEMENT " + qualifiedName);
    for (int i = 0; i < attributes.getLength(); i++) {
        System.out.println("ATTRIBUTE " +
            attributes.getQName(i) + " = " + attributes.getValue(i));
    }
}
This example prints out a statement indicating that a start tag has been
parsed followed by a list of the attribute names and values.
A similar endElement() method is invoked when an end tag is
encountered.
public void endElement(String uri, String localName, String qualifiedName)
        throws SAXException {
    System.out.println("END ELEMENT " + qualifiedName);
}
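To make the three name parameters from Table 16.6 concrete, the following sketch parses the same small document twice, once with namespace processing off and once with it on, and records what startElement() receives each time. The class name StartElementNames, the describe() helper, the element names, and the http://example.com/jobs namespace URI are all assumptions made for illustration; they are not part of the chapter's examples. With the parser bundled in the JDK, localName and uri are typically empty when namespace processing is off, but other parsers may behave differently.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StartElementNames {

    // Parses the XML text and returns one line per start tag, showing the
    // three name parameters that startElement() received.
    static String describe(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        final StringBuilder out = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qualifiedName, Attributes attributes) {
                out.append("uri=[").append(uri)
                   .append("] localName=[").append(localName)
                   .append("] qualifiedName=[").append(qualifiedName)
                   .append("]\n");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; the namespace URI is only an example value.
        String xml = "<j:job xmlns:j=\"http://example.com/jobs\">"
                   + "<j:location/></j:job>";
        System.out.println("Namespace processing off:");
        System.out.print(describe(xml, false));
        System.out.println("Namespace processing on:");
        System.out.print(describe(xml, true));
    }
}
```

Comparing the two runs shows why the handlers in this section print the qualified name: it is the one parameter that is usable whether or not namespace processing is enabled.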
The full parser is shown in Listing 16.9, but it does not handle all of the
XML components. The default action for a parser is to ignore every component;
only the methods that are overridden in the DefaultHandler subclass will
process XML components. For the other DefaultHandler methods, see Table 16.7
or refer to the J2SDK, v 1.4 API Specification.
Listing 16.9 Simple SAX Parser
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;

public class XMLParse extends DefaultHandler {

    public static void main(String[] argv) {
        // Exactly one argument is expected: the name of the XML file
        if (argv.length != 1) {
            System.err.println("Usage: XMLParse filename");
            System.exit(1);
        }
        DefaultHandler handler = new XMLParse();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new File(argv[0]), handler);
        }
        catch (ParserConfigurationException ex) {
            System.err.println("Failed to create SAX parser: " + ex);
        }
        catch (SAXException ex) {
            System.err.println("SAX parser exception: " + ex);
        }
        catch (IOException ex) {
            System.err.println("IO exception: " + ex);
        }
        catch (IllegalArgumentException ex) {
            System.err.println("Invalid file argument: " + ex);
        }
    }

    public void startDocument() throws SAXException {
        System.out.println("START DOCUMENT");
    }

    public void endDocument() throws SAXException {
        System.out.println("END DOCUMENT");
    }

    public void startElement(String uri, String localName,
            String qualifiedName, Attributes attributes) throws SAXException {
        System.out.println("START ELEMENT " + qualifiedName);
        for (int i = 0; i < attributes.getLength(); i++) {
            System.out.println("ATTRIBUTE " +
                attributes.getQName(i) + " = " + attributes.getValue(i));
        }
    }

    public void endElement(String uri, String localName, String qualifiedName)
            throws SAXException {
        System.out.println("END ELEMENT " + qualifiedName);
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (length > 0) {
            String buf = new String(ch, start, length);
            System.out.println("CONTENT " + buf);
        }
    }
}
The parser first checks that the name of the XML document has been provided
on the command line. After constructing the handler and instantiating the
SAXParserFactory, it parses the XML file; that is all there is to it.
This parser reports only the start and end of the document, the start and end
of elements, and the characters that form the element bodies.
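One detail of characters() is worth illustrating. A SAX parser is allowed to deliver the text of a single element in several characters() calls, and the whitespace between tags is usually reported as character data too, so printing each call separately can split or pad the content. The following is a minimal sketch of one common way around this: buffer the text and emit it when the next tag arrives. The class name TextCollector, the collect() helper, and the sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> contents = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        flush();   // text seen before a child element starts
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only accumulate here; one run of text may arrive in several calls
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        flush();   // text seen before the end tag
    }

    // Records the buffered text, ignoring runs that are only whitespace
    private void flush() {
        String text = buffer.toString().trim();
        if (text.length() > 0) {
            contents.add(text);
        }
        buffer.setLength(0);
    }

    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.contents;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document, not the chapter's jobSummary.xml
        String xml = "<job>\n  <location>London</location>\n"
                   + "  <skill>Cook &amp; baker</skill>\n</job>";
        for (String text : collect(xml)) {
            System.out.println("CONTENT " + text);
        }
    }
}
```

Trimming is a design choice that suits data-style documents; a handler for documents where leading or trailing whitespace matters would keep the buffered text as it is.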
If a callback method is not overridden in your parser, the component is
handled by the corresponding method of the superclass DefaultHandler, the
default action being to do nothing. Table 16.7 lists the DefaultHandler
callback methods that can be overridden.
Table 16.7 SAX DefaultHandler Methods
| Method | Receives Notification of |
|---|---|
| characters(char[] ch, int start, int length) | Character data inside an element. |
| startDocument() | Beginning of the document. |
| endDocument() | End of the document. |
| startElement(String uri, String localName, String qName, Attributes attributes) | Start of an element. |
| endElement(String uri, String localName, String qName) | End of an element. |
| startPrefixMapping(String prefix, String uri) | Start of a namespace mapping. |
| endPrefixMapping(String prefix) | End of a namespace mapping. |
| error(SAXParseException e) | A recoverable parser error. |
| fatalError(SAXParseException e) | A fatal XML parsing error. |
| warning(SAXParseException e) | Parser warning. |
| ignorableWhitespace(char[] ch, int start, int length) | Whitespace in the element contents. |
| notationDecl(String name, String publicId, String systemId) | Notation declaration. |
| processingInstruction(String target, String data) | A processing instruction. |
| resolveEntity(String publicId, String systemId) | An external entity. |
| skippedEntity(String name) | A skipped entity. Processors may skip entities if they have not seen the declarations. (For example, the entity was declared in an external DTD.) |
As this code does not use any J2EE components, you can simply compile and run
it from the command line. From the Day16/examples directory, run the
command:
> java -classpath classes XMLParse XML/jobSummary.xml
Or use the supplied asant build files and enter:
> asant XMLParse
Provide the filename XML/jobSummary.xml when prompted.
The output in Figure 16.1 is produced when this SAX parser is used on the
jobSummary XML in Listing 16.4.

Figure 16.1 -- SAX parser output.
As you can see, the output is not very beautiful. You might like to improve
it by adding indentation to the elements or even getting the output to look like
the original XML.
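As a starting point for that improvement, here is a minimal sketch of a handler that tracks nesting depth and echoes the document as indented XML. The class name IndentingEcho and the helpers echo() and escape() are assumptions made for illustration. The sketch is deliberately simple: it writes each chunk of text on its own line, so text that the parser delivers in several characters() calls will appear on several lines.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IndentingEcho extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;   // number of currently open elements

    // Replaces the XML special characters with their symbolic strings;
    // the ampersand must be handled first
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        indent();
        out.append('<').append(qualifiedName);
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(' ').append(attributes.getQName(i)).append("=\"")
               .append(escape(attributes.getValue(i))).append('"');
        }
        out.append(">\n");
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (text.length() > 0) {   // skip whitespace between tags
            indent();
            out.append(escape(text)).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        depth--;
        indent();
        out.append("</").append(qualifiedName).append(">\n");
    }

    static String echo(String xml) throws Exception {
        IndentingEcho handler = new IndentingEcho();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document, not the chapter's jobSummary.xml
        System.out.print(echo("<job id=\"1\"><location>London</location></job>"));
    }
}
```

The depth counter is incremented after a start tag is written and decremented before the matching end tag, which is what keeps each pair of tags aligned.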
In addition to making this parser more robust, the following functionality
could be added:
- Scan element contents for the special characters, such as < and &, and
  replace them with the symbolic strings (&lt; and &amp;) as appropriate
- Improve the handling of fatal parse errors (SAXParseException) with
  appropriate error messages giving error line numbers
- Use the DefaultHandler error() and warning()
  methods to handle non-fatal parse errors
- Configure the parser to be namespace aware with
  javax.xml.parsers.SAXParserFactory.setNamespaceAware(true), so that you
  can detect tags from multiple sources
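The two error-handling suggestions in this list can be sketched as follows. The handler overrides warning(), error(), and fatalError() and records the line and column carried by the SAXParseException. The class name ReportingHandler and the check() helper are assumptions for illustration, and the exact message text depends on the parser in use. Note that error() is mainly called for validation problems, so it is unlikely to fire unless validation is turned on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ReportingHandler extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    // Records the severity, position, and message of a parse problem
    private void report(String level, SAXParseException e) {
        log.append(level)
           .append(" at line ").append(e.getLineNumber())
           .append(", column ").append(e.getColumnNumber())
           .append(": ").append(e.getMessage()).append('\n');
    }

    @Override
    public void warning(SAXParseException e) {
        report("WARNING", e);
    }

    @Override
    public void error(SAXParseException e) {
        report("ERROR", e);   // recoverable, so parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        report("FATAL", e);
        throw e;              // the document is not well formed; stop parsing
    }

    // Parses the XML text and returns the recorded problems (empty if none)
    static String check(String xml) throws Exception {
        ReportingHandler handler = new ReportingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError()
        }
        return handler.log.toString();
    }

    public static void main(String[] args) throws Exception {
        // The <location> element is never closed, so this is not well formed
        System.out.print(check("<job>\n<location>\n</job>"));
    }
}
```

In the XMLParse class, the same three methods could be added alongside the existing callbacks, because XMLParse already extends DefaultHandler.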
Having seen a simple SAX parser, you will now build a parser application that
uses the DOM API.