XML Parsing With DOM and Xerces (part 1)
Delving Deeper
The Simple API for XML (SAX) is just one approach to parsing XML. An alternative approach is the Document Object Model (DOM), which builds a data tree in memory for easier, non-sequential access to XML data fragments. In this article, find out how to combine the Java-based Xerces parser with the DOM to create simple Java/XML applications.
As you must have figured out by now, using the DOM parser is fairly easy - essentially, it involves creating a "tree" of the elements in the XML document, and traversing that tree with built-in methods. In the introductory example, I ventured as far as the document element; in this next one, I'll go much further, demonstrating how the parser's built-in methods can be used to navigate to any point in the document tree.
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.*;

public class MySecondDomApp {

    // constructor
    public MySecondDomApp (String xmlFile) {

        // create a Xerces DOM parser
        DOMParser parser = new DOMParser();

        // parse the document and
        // access the root node with its children
        try {
            parser.parse(xmlFile);
            Document document = parser.getDocument();
            NodeDetails(document);
        } catch (SAXException e) {
            // parse() reports malformed XML through a SAXException
            System.err.println (e);
        } catch (IOException e) {
            System.err.println (e);
        }
    }

    // this function drills deeper into the DOM tree
    private void NodeDetails (Node node) {

        // get the node name
        System.out.println (node.getNodeName());

        // check if the node has children
        if (node.hasChildNodes()) {

            // get the child nodes, if they exist
            NodeList children = node.getChildNodes();
            if (children != null) {
                for (int i = 0; i < children.getLength(); i++) {

                    // repeat the process for each child node
                    // get the node name, indented by one tab
                    System.out.println ("\t" + children.item(i).getNodeName());

                    // check if the node has children
                    if (children.item(i).hasChildNodes()) {

                        // get the children, if they exist
                        NodeList childrenOfchildren = children.item(i).getChildNodes();
                        if (childrenOfchildren != null) {

                            // get the node name, indented by two tabs
                            for (int j = 0; j < childrenOfchildren.getLength(); j++) {
                                System.out.println ("\t\t" + childrenOfchildren.item(j).getNodeName());
                            }
                        }
                    }
                }
            }
        }
    }

    // the main method to create an instance of our DOM application
    public static void main (String[] args) {
        MySecondDomApp app = new MySecondDomApp (args[0]);
    }
}
Here's the output:
#document
	inventory
		#text
		item
		#text
		item
		#text
As demonstrated in the first example, the fundamentals remain unchanged - initialize the parser, read an XML document, get a reference to the root of the tree and start traversing the tree. Consequently, most of the code here remains the same as that used in the introductory example, with the changes occurring only in the NodeDetails() function. Let's take a closer look at this function:
// this function drills deeper into the DOM tree
private void NodeDetails (Node node) {

    // get the node name
    System.out.println (node.getNodeName());

    // check if the node has children
    if (node.hasChildNodes()) {

        // get the child nodes, if they exist
        NodeList children = node.getChildNodes();
        if (children != null) {
            for (int i = 0; i < children.getLength(); i++) {

                // repeat the process for each child node
                // get the node name, indented by one tab
                System.out.println ("\t" + children.item(i).getNodeName());

                // check if the node has children
                if (children.item(i).hasChildNodes()) {

                    // get the children, if they exist
                    NodeList childrenOfchildren = children.item(i).getChildNodes();
                    if (childrenOfchildren != null) {

                        // get the node name, indented by two tabs
                        for (int j = 0; j < childrenOfchildren.getLength(); j++) {
                            System.out.println ("\t\t" + childrenOfchildren.item(j).getNodeName());
                        }
                    }
                }
            }
        }
    }
}
Once a reference to the root of the tree has been obtained and passed to NodeDetails(), the getChildNodes() method is used to obtain a list of the children of that node. This list is returned as a NodeList object, which comes with its own methods for accessing individual elements of the node list.
As you can see, one of these methods is the getLength() method, used to obtain the number of child nodes, in order to iterate through them. Individual elements of the node list can be accessed via the item() method, which returns a Node object, which puts us back on familiar territory - the Node object's standard getNodeName() and getNodeType() methods can now be used to access detailed information about the node.
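To make the getLength(), item() and getNodeType() calls concrete, here is a minimal sketch that labels each child of the document element by its type. It also hints at why #text shows up in the output above: the whitespace between elements is reported as text nodes. The class name ChildNodeTypes, the helper names and the inline sample XML are assumptions made for illustration, and the sketch uses the JDK's built-in JAXP DocumentBuilderFactory rather than the Xerces DOMParser class so that it runs without an extra jar.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeTypes {

    // Parse an XML string with the JDK's JAXP parser (assumption: used here
    // in place of the Xerces DOMParser so no extra jar is needed).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Describe every direct child of 'parent' as "name:kind".
    static List<String> describeChildren(Node parent) {
        List<String> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        // getLength() gives the number of children, item(i) the i-th Node
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String kind;
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE: kind = "element"; break;
                case Node.TEXT_NODE:    kind = "text";    break;
                default:                kind = "other";   break;
            }
            result.add(child.getNodeName() + ":" + kind);
        }
        return result;
    }

    public static void main(String[] args) {
        // hypothetical sample document, shaped like the inventory output above
        String xml = "<inventory>\n  <item/>\n  <item/>\n</inventory>";
        Document document = parse(xml);
        for (String line : describeChildren(document.getDocumentElement())) {
            System.out.println(line);
        }
    }
}
```

For this sample, the line breaks and indentation around the two item elements are reported as three text nodes, alternating with the two element nodes.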
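The process is then repeated for each of these Node objects - a check for further children, a retrieved NodeList, a loop iterating through the child Nodes. Note that NodeDetails() nests only two such loops, so it stops at the grandchildren of the node it is given; reaching the end of an arbitrarily deep document tree means repeating the same step again at every level.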
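One way to carry that repetition through to the end of the tree without writing a loop per level is recursion. The following is a minimal sketch, not the article's own code: the class name RecursiveWalk, the walk() and parse() helpers and the sample XML are all assumptions for illustration, and it again relies on the JDK's built-in JAXP parser instead of the Xerces DOMParser class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecursiveWalk {

    // Parse an XML string with the JDK's JAXP parser (assumption: stands in
    // for the Xerces DOMParser used in the article).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Append the node's name, indented by one tab per level, then
    // apply the same step to each of its children.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        out.append(node.getNodeName()).append('\n');

        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // hypothetical sample document with three levels of elements
        String xml = "<inventory><item><name>widget</name></item></inventory>";
        StringBuilder out = new StringBuilder();
        walk(parse(xml), 0, out);
        System.out.print(out);
    }
}
```

Because walk() calls itself for every child, the depth of the traversal follows the depth of the document rather than the number of loops written into the method.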