XML Parsing With DOM and Xerces (part 1) - Nailguns, Going Cheap
I'll begin with something simple. Consider the following XML file, an XML-encoded inventory statement for a business selling equipment to Quake enthusiasts.
<?xml version="1.0"?>
<inventory>
    <item>
        <id>758</id>
        <name>Rusty, jagged nails for nailgun</name>
        <supplier>NailBarn, Inc.</supplier>
        <cost>2.99</cost>
        <quantity>10000</quantity>
    </item>
    <item>
        <id>6273</id>
        <name>Power pack for death ray</name>
        <supplier>QuakePower.domain.com</supplier>
        <cost>9.99</cost>
        <quantity>10</quantity>
    </item>
</inventory>
The Xerces DOM parser is designed to read an XML file, build a tree to represent the structures found within it, and expose object methods and properties to manipulate them. The next example demonstrates this with a simple Java application that initializes the parser and reads the XML file.
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
import java.io.*;

public class MyFirstDomApp {

    // constructor
    public MyFirstDomApp (String xmlFile) {
        // create a DOM parser
        DOMParser parser = new DOMParser();
        // parse the document
        try {
            parser.parse(xmlFile);
            Document document = parser.getDocument();
            NodeDetails(document);
        } catch (org.xml.sax.SAXException e) {
            // parse() also declares SAXException: the XML could not be parsed
            System.err.println (e);
        } catch (IOException e) {
            // the file could not be read
            System.err.println (e);
        }
    }

    // this function prints out information on a specific node
    // in this example, the "#document" node
    // it then goes to the next node
    // and does the same for that
    private void NodeDetails (Node node) {
        System.out.println ("Node Type:" + node.getNodeType() + "\nNode Name:" + node.getNodeName());
        if(node.hasChildNodes()) {
            System.out.println ("Child Node Type:" + node.getFirstChild().getNodeType() + "\nNode Name:" + node.getFirstChild().getNodeName());
        }
    }

    // the main method to create an instance of our DOM application
    public static void main (String[] args) {
        MyFirstDomApp app = new MyFirstDomApp (args[0]);
    }
}
I'll explain what all this gobbledygook means shortly - but first, let's compile
and run the code.
$ javac MyFirstDomApp.java
Assuming that all goes well, you should now have a class file named "MyFirstDomApp.class". Make sure that the directory containing this class file is in your Java CLASSPATH (along with the Xerces JAR file), and then execute it, with the name of the XML file as argument.
$ java MyFirstDomApp /home/me/dom/inventory.xml
Here's what the output looks like:
Node Type:9
Node Name:#document
Child Node Type:1
Node Name:inventory
Now, this might not look like much, but it demonstrates the basic concept of
the DOM, and builds the foundation for more complex code. Let's look at the code in detail:
1. The first step is to import all the classes required to execute the application. First comes the Xerces DOM parser class, then the W3C DOM interfaces (Document, Node and so on), followed by the java.io classes needed for exception handling and file I/O.
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
import java.io.*;
2. Next, a constructor is defined for the class (in case you didn't already know,
a constructor is a method that is invoked automatically when you create an instance of the class).
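// constructor
public MyFirstDomApp (String xmlFile) {
    // create a DOM parser
    DOMParser parser = new DOMParser();
    // parse the document
    try {
        parser.parse(xmlFile);
        Document document = parser.getDocument();
        NodeDetails(document);
    } catch (org.xml.sax.SAXException e) {
        // parse() also declares SAXException: the XML could not be parsed
        System.err.println (e);
    } catch (IOException e) {
        // the file could not be read
        System.err.println (e);
    }
}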
As you can see, the constructor uses the parse() method to perform the actual
parsing of the XML document; it accepts the XML file name as method argument. This method call is enclosed within a "try-catch" error handling block, in order to gracefully recover from errors.
The end result of this parsing is a DOM tree consisting of a single root and its child nodes, each of which exposes methods that describe the object in greater detail.
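If you'd like to see the parse-and-recover pattern in isolation, here's a minimal sketch. Note the assumptions: it swaps the Xerces DOMParser class for the JAXP DocumentBuilder that ships with the JDK (so it runs without extra JAR files), it reads the XML from a string rather than a file, and parseRootName() is a hypothetical helper name of my own, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseSketch {

    // Returns the name of the root element, or null if the text cannot be parsed.
    // (Hypothetical helper, for illustration only.)
    static String parseRootName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            // the XML is not well-formed
            System.err.println("Parse error: " + e.getMessage());
            return null;
        } catch (IOException | ParserConfigurationException e) {
            // the input could not be read, or no parser could be created
            System.err.println("Could not parse: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // well-formed input
        System.out.println(parseRootName("<inventory><item><id>758</id></item></inventory>"));
        // the <item> element is never closed, so the parser reports an error
        System.out.println(parseRootName("<inventory><item></inventory>"));
    }
}
```

The first call prints the root element name; the second takes the SAXException branch and the application carries on instead of dying with a stack trace.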
3. The getDocument() method returns an object representing the entire XML document; this object reference is then passed on to the NodeDetails() method to display information about itself, and its children.
// this function prints out information on a specific node
// in this example, the "#document" node
// it then goes to the next node
// and does the same for that
private void NodeDetails (Node node) {
    System.out.println ("Node Type:" + node.getNodeType() + "\nNode Name:" + node.getNodeName());
    if(node.hasChildNodes()) {
        System.out.println ("Child Node Type:" + node.getFirstChild().getNodeType() + "\nNode Name:" + node.getFirstChild().getNodeName());
    }
}
4. Once a reference to a node has been obtained, a number of other methods and
properties become available to obtain the name and value of that node, as well as references to parent and child nodes. In the code snippet above, I've used the getNodeType() and getNodeName() methods of the Node object to obtain the node type and name respectively. Similarly, the hasChildNodes() method can be used to find out if a node has child nodes under it, while the getFirstChild() method can be used to get a reference to the first child node.
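Here's a rough sketch of how those navigation methods fit together, alongside the standard DOM getChildNodes(), getParentNode() and getNodeValue() methods. As before, I'm assuming the JDK's built-in JAXP parser in place of Xerces and an in-memory XML string; parse() and describeChildren() are hypothetical helper names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeWalkSketch {

    // Parses XML text into a DOM Document (hypothetical helper; wraps checked exceptions).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects "name=value" for each element child of the given node.
    // The value is read from the child's first child, assumed to be a text node.
    static List<String> describeChildren(Node parent) {
        List<String> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            Node text = child.getFirstChild();
            String value = (text == null) ? "" : text.getNodeValue();
            result.add(child.getNodeName() + "=" + value);
        }
        return result;
    }

    public static void main(String[] args) {
        Document document = parse(
                "<inventory><item><id>758</id><cost>2.99</cost></item></inventory>");
        Node item = document.getDocumentElement().getFirstChild();
        System.out.println(item.getNodeName());                 // item
        System.out.println(item.getParentNode().getNodeName()); // inventory
        System.out.println(describeChildren(item));             // [id=758, cost=2.99]
    }
}
```

Notice that the text inside an element is itself a child node (of type 3), which is why the value comes from getFirstChild().getNodeValue() rather than from the element directly.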
In case you're wondering about the getNodeType() method - every node is of a specific type, and this method returns a numeric code identifying that type. Each code also has a corresponding named constant on the Node interface (for example, Node.ELEMENT_NODE). Here's the list of available types:
Type (num)  Type (str)                    Description        Name
---------------------------------------------------------------------------------
1           ELEMENT_NODE                  Element            The element name
2           ATTRIBUTE_NODE                Attribute          The attribute name
3           TEXT_NODE                     Text               #text
4           CDATA_SECTION_NODE            CDATA              #cdata-section
5           ENTITY_REFERENCE_NODE         Entity reference   The entity reference name
6           ENTITY_NODE                   Entity             The entity name
7           PROCESSING_INSTRUCTION_NODE   PI                 The PI target
8           COMMENT_NODE                  Comment            #comment
9           DOCUMENT_NODE                 Document           #document
10          DOCUMENT_TYPE_NODE            DocType            Root element
11          DOCUMENT_FRAGMENT_NODE        DocumentFragment   #document-fragment
12          NOTATION_NODE                 Notation           The notation name
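Since getNodeType() hands back only the number, it can be handy to translate it into something readable. The sketch below is one possible way to do that, using the constants defined on the org.w3c.dom.Node interface; typeName() is a hypothetical helper of my own, not a DOM or Xerces method.

```java
import org.w3c.dom.Node;

class NodeTypeSketch {

    // Maps the numeric code returned by getNodeType() to the name of the
    // matching constant on org.w3c.dom.Node (hypothetical helper).
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        // the two codes printed by MyFirstDomApp
        System.out.println(typeName((short) 9)); // DOCUMENT_NODE
        System.out.println(typeName((short) 1)); // ELEMENT_NODE
    }
}
```

In your own code, comparing against the constants (node.getNodeType() == Node.ELEMENT_NODE) is usually easier to read than comparing against the bare numbers.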
More By icarus, (c) Melonfire