Integrating XML with J2EE - Document Object Model (DOM) Parser
When you use the DOM API to parse an XML document, a tree structure
representing the XML document is built in memory. You can then analyze the nodes
of the tree to discover the XML contents.
Building a DOM Tree
The mechanism for instantiating a DOM parser is very similar to that for a
SAX parser. You obtain a new DocumentBuilderFactory instance and use it
to create a new DocumentBuilder. Calling the parse() method on this
DocumentBuilder returns an object that implements the public
org.w3c.dom.Document interface; this object represents the XML document tree.
The following code fragment creates a DOM parser and reads the XML document
from a file supplied as a command-line argument:
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(argv[0]));
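To see the fragment in context, here is a minimal, self-contained sketch that wraps it in a command-line program. The class name DomFileParser and its parse(File) helper are hypothetical; only DocumentBuilderFactory, DocumentBuilder and Document come from the JAXP and DOM APIs described above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// DomFileParser is a hypothetical class name used for illustration.
final class DomFileParser {

    // Builds an in-memory DOM tree from the XML file.
    // parse() throws SAXException if the file is not well-formed XML
    // and IOException if it cannot be read.
    static Document parse(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(file);
    }

    public static void main(String[] argv) throws Exception {
        if (argv.length != 1) {
            System.err.println("usage: java DomFileParser file.xml");
            System.exit(1);
        }
        Document document = parse(new File(argv[0]));
        // Print the root element's name as a quick sanity check.
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

Run as `java DomFileParser catalog.xml`; for a file whose root element is `<catalog>`, the program prints `catalog`.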
The DocumentBuilder.parse() method is not restricted to reading XML from a
file; overloaded versions also accept an InputStream, an
org.xml.sax.InputSource, or a URI string identifying the source.
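The sketch below calls three of those parse() overloads side by side. The class DomSources and its method names are hypothetical; each method does nothing more than pick a different DocumentBuilder.parse() overload.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// DomSources and its method names are hypothetical; each method simply
// calls a different overload of DocumentBuilder.parse().
final class DomSources {

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // parse(InputStream): any byte stream, e.g. a network or servlet stream.
    static Document fromStream(InputStream in) throws Exception {
        return newBuilder().parse(in);
    }

    // parse(String): the argument is a URI such as "http://..." or "file:/...".
    static Document fromUri(String uri) throws Exception {
        return newBuilder().parse(uri);
    }

    // parse(InputSource): here wrapping a Reader over an in-memory string.
    static Document fromString(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "<order id=\"42\"/>".getBytes(StandardCharsets.UTF_8);
        Document fromBytes = fromStream(new ByteArrayInputStream(bytes));
        Document fromText = fromString("<invoice/>");
        System.out.println(fromBytes.getDocumentElement().getNodeName()); // order
        System.out.println(fromText.getDocumentElement().getNodeName());  // invoice
    }
}
```

Note that parse(String) treats its argument as a URI, not as XML text, so XML held in a String has to be wrapped in an InputSource (or turned into an InputStream) as fromString() does.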
The Document obtained from the parse() method is a subinterface of
org.w3c.dom.Node. To simplify processing of the DOM tree, every object in
the tree implements Node, either directly or through a subinterface such as
Element or Text.
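A quick way to see this uniform treatment is to hold the document, its root element and a text child in a single Node[] array and ask each for its type and name. This is a sketch; the class name NodeKinds and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// NodeKinds is a hypothetical class name.
final class NodeKinds {

    // Parses XML held in a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<greeting>Hello</greeting>");
        Element root = document.getDocumentElement();
        Node text = root.getFirstChild();

        // Document, Element and Text all extend Node, so one Node
        // reference type is enough to handle every object in the tree.
        Node[] nodes = { document, root, text };
        for (Node n : nodes) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

This prints `9 #document`, `1 greeting` and `3 #text`: the numeric codes are the values of Node.DOCUMENT_NODE, Node.ELEMENT_NODE and Node.TEXT_NODE.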
There are a number of methods provided in the Document interface to
access the nodes in the tree. These are listed in Table 16.8.
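The normalize() method, which Document inherits from Node, puts all text
nodes in the subtree into a form where there are no adjacent text nodes and
no empty text nodes. In this form, the DOM view better reflects the XML
structure, so it is good practice to call normalize() before you analyze
the tree.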
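The effect of normalize() is easiest to see on a tree built by hand. In this sketch (class and method names are hypothetical) an element is given three adjacent text children, one of them empty, and normalize() is then called on that element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// NormalizeDemo is a hypothetical class name.
final class NormalizeDemo {

    // Creates <note> with three adjacent Text children, one of them empty,
    // so that the tree is deliberately not in normal form.
    static Element buildNote() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element note = document.createElement("note");
        document.appendChild(note);
        note.appendChild(document.createTextNode("Hello, "));
        note.appendChild(document.createTextNode(""));
        note.appendChild(document.createTextNode("world"));
        return note;
    }

    public static void main(String[] args) throws Exception {
        Element note = buildNote();
        System.out.println(note.getChildNodes().getLength()); // 3
        note.normalize(); // merges adjacent Text nodes and removes empty ones
        System.out.println(note.getChildNodes().getLength()); // 1
        System.out.println(note.getFirstChild().getNodeValue()); // Hello, world
    }
}
```

For a parsed document, a common idiom is `document.getDocumentElement().normalize();` immediately after parse().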
After parsing an XML document, the DOM parser has built an in-memory
representation of the document that looks something like Figure 16.2.
The root element of the document is obtained with the getDocumentElement()
method:
Element root = document.getDocumentElement();

Figure 16.2 -- Diagram of the DOM tree.
This method returns an Element, which is a subinterface of Node that may
have attributes associated with it. An element can be the parent of other
elements, as well as of other kinds of nodes such as text and comments.
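As a small sketch of working with the root element (the class name RootElementDemo and the sample XML are hypothetical), the following reads its tag name and an attribute, and confirms that the root element's parent is the Document node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// RootElementDemo is a hypothetical class name.
final class RootElementDemo {

    // Parses the XML text and returns its root element.
    static Element rootOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = rootOf("<catalog version=\"2\"><book/><book/></catalog>");
        System.out.println(root.getTagName());            // catalog
        System.out.println(root.getAttribute("version")); // 2
        // getAttribute() returns "" (not null) for a missing attribute.
        System.out.println(root.getAttribute("owner").isEmpty()); // true
        // The root element's parent is the Document node itself.
        System.out.println(root.getParentNode().getNodeType() == Node.DOCUMENT_NODE); // true
    }
}
```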
The methods listed in Table 16.8 return either a Node or a NodeList (an
ordered collection of nodes). Apart from getDocumentElement() and
getElementsByTagName(), they are defined on the Node interface, so they can be
called on any node in the tree, not only on the Document.
Table 16.8 Document Interface Methods to Traverse a DOM Tree

| Method Name | Description |
|---|---|
| getDocumentElement() | Allows direct access to the root element of the document |
| getElementsByTagName(String) | Returns a NodeList of all the elements with the given tag name in the order in which they are encountered in the tree |
| getChildNodes() | A NodeList that contains all children of this node |
| getParentNode() | The parent of this node |
| getFirstChild() | The first child of this node |
| getLastChild() | The last child of this node |
| getPreviousSibling() | The node immediately preceding this node |
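The sketch below exercises several of the methods in Table 16.8 on a small document. The class TraversalDemo, its helper elementTextsBackwards() and the sample XML are hypothetical; getTextContent() is a DOM Level 3 method available since Java 5.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// TraversalDemo is a hypothetical class name.
final class TraversalDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walks the children of 'parent' from last to first using only
    // getLastChild() and getPreviousSibling(), keeping element text.
    static List<String> elementTextsBackwards(Node parent) {
        List<String> texts = new ArrayList<>();
        for (Node n = parent.getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(n.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(
                "<library><book>A</book><book>B</book><book>C</book></library>");
        Element library = document.getDocumentElement();

        NodeList books = document.getElementsByTagName("book");
        System.out.println(books.getLength());              // 3
        System.out.println(elementTextsBackwards(library)); // [C, B, A]
        // Each book's parent is the root element.
        System.out.println(books.item(0).getParentNode().isSameNode(library)); // true
    }
}
```

The sample XML has no whitespace between tags. In indented XML, the child and sibling methods also return the whitespace-only text nodes between elements, which is why the loop checks getNodeType() before using a node.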
In a simple DOM application, the getChildNodes() method can be used to
traverse the DOM tree recursively. The NodeList.getLength() method then gives
the number of nodes in the returned NodeList:
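NodeList children = node.getChildNodes();
// The DOM specification says getChildNodes() returns an empty NodeList
// rather than null, so this null check is purely defensive
int len = (children != null) ? children.getLength() : 0;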
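Building on those two lines, here is a minimal sketch of a recursive traversal that prints an indented outline of the tree. The class DomTreePrinter and its dump() and walk() methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// DomTreePrinter is a hypothetical class name.
final class DomTreePrinter {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns an indented outline of 'node' and all of its descendants.
    static String dump(Node node) {
        StringBuilder out = new StringBuilder();
        walk(node, 0, out);
        return out.toString();
    }

    // Recursive step: record this node, then visit each child one level deeper.
    private static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(" \"").append(node.getNodeValue()).append('"');
        }
        out.append('\n');

        NodeList children = node.getChildNodes();
        int len = (children != null) ? children.getLength() : 0;
        for (int i = 0; i < len; i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump(parse("<a><b>hi</b><c/></a>")));
    }
}
```

For the sample input this prints `#document`, then `a`, `b`, `#text "hi"` and `c`, each indented two spaces per level of depth.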
In addition to the tree traversal methods, the Node interface provides
methods for investigating the contents of a node, some of which are listed in
Table 16.9.
Table 16.9 Node Interface Methods to Inspect DOM Nodes

| Method Name | Description |
|---|---|
| getAttributes() | A NamedNodeMap containing the attributes of a node if it is an Element, or null if it is not. |
| getNodeName() | A string representing the name of this node (for an element, its tag name). |
| getNodeType() | A code representing the type of the underlying object. A node can be one of ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DOCUMENT_FRAGMENT_NODE, NOTATION_NODE. |
| getNodeValue() | A string representing the value of this node. If the node is a text node, the value will be the contents of the text node; for an attribute node, it will be the string assigned to the attribute. For most node types, there is no value and a call to this method will return null. |
| getNamespaceURI() | The namespace URI of this node, or null if it has none. |
| hasAttributes() | Returns a boolean to indicate whether this node has any attributes. |
| hasChildNodes() | Returns a boolean to indicate whether this node has any children. |
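The inspection methods in Table 16.9 are typically combined in a switch on getNodeType(). The following is a sketch under that assumption; the class NodeInspector, its describe() output format and the sample XML are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// NodeInspector is a hypothetical class name.
final class NodeInspector {

    static Element rootOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Summarises one node using getNodeType(), getNodeName(),
    // getNodeValue(), hasAttributes() and getAttributes().
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                StringBuilder sb = new StringBuilder("element ").append(node.getNodeName());
                if (node.hasAttributes()) {
                    NamedNodeMap attrs = node.getAttributes(); // non-null for elements
                    for (int i = 0; i < attrs.getLength(); i++) {
                        Node attr = attrs.item(i);
                        sb.append(' ').append(attr.getNodeName())
                          .append('=').append(attr.getNodeValue());
                    }
                }
                return sb.toString();
            }
            case Node.TEXT_NODE:
                return "text " + node.getNodeValue();
            case Node.COMMENT_NODE:
                return "comment " + node.getNodeValue();
            default:
                return "other " + node.getNodeName();
        }
    }

    public static void main(String[] args) throws Exception {
        Element item = rootOf("<item sku=\"A1\"><!--stock--><name>Pen</name></item>");
        System.out.println(describe(item)); // element item sku=A1
        NodeList children = item.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(describe(children.item(i)));
        }
    }
}
```

The loop prints `comment stock` and then `element name`. The sample uses a single attribute because NamedNodeMap does not guarantee any particular attribute order.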
This chapter is from Teach Yourself J2EE in 21 Days, second edition, by
Martin Bond et al. (Sams, 2004, ISBN: 0-672-32558-6).