In this article, Martin Bond discusses XML and its associated APIs and standards, and how XML can be used to create flexible structured data that is inherently portable. This excerpt is from chapter (Day) 16 of Teach Yourself J2EE in 21 Days, second edition, by Martin Bond, et al. (Sams, ISBN: 0672325586)
When you use the DOM API to parse an XML document, a tree structure representing the XML document is built in memory. You can then analyze the nodes of the tree to discover the XML contents.
Building a DOM Tree
The mechanism for instantiating a DOM parser is very similar to that for a SAX parser: a DocumentBuilderFactory instance is obtained and used to create a new DocumentBuilder.
Calling the parse() method on this DocumentBuilder returns an object that implements the public org.w3c.dom.Document interface. This object represents the XML document tree. The following code fragment creates a DOM parser and reads the XML document from a file supplied as a command-line argument:
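A minimal sketch of such a fragment follows; it is not the book's own listing. The class DomParseExample and its parse() helper are hypothetical names, and the XML file name is assumed to be passed as the first command-line argument.

```java
import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical example class: parses the XML file named on the command line into a DOM tree.
class DomParseExample {

    // Builds an in-memory DOM tree for the XML file at the given path.
    static Document parse(String path)
            throws ParserConfigurationException, SAXException, IOException {
        // Obtain a factory, then use it to create a DocumentBuilder (the DOM parser).
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() reads the whole document and returns the Document tree.
        return builder.parse(new File(path));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java DomParseExample file.xml");
            System.exit(1);
        }
        Document document = parse(args[0]);
        System.out.println("Root element: " + document.getDocumentElement().getNodeName());
    }
}
```

Run as `java DomParseExample jobs.xml` (file name illustrative), the sketch prints the name of the document's root element.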
With the DocumentBuilder.parse() method, you are not restricted to reading XML only from a file; you can also use a constructed InputStream or read from a source defined by a URL.
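As a sketch of those alternatives, the hypothetical helper class ParseSources below uses the DocumentBuilder.parse(InputStream) and DocumentBuilder.parse(String uri) overloads; the XML text and the URI are whatever the caller supplies.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Hypothetical helper class showing two non-file input sources for DocumentBuilder.parse().
class ParseSources {

    // Parses XML held in memory by wrapping the text in an InputStream.
    static Document fromString(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return builder.parse(in);
        }
    }

    // Parses the XML document located by a URI string, such as an http: or file: URL.
    static Document fromUri(String uri) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(uri);
    }

    public static void main(String[] args) throws Exception {
        Document doc = fromString("<agency name=\"J2EE Jobs\"/>");
        System.out.println(doc.getDocumentElement().getAttribute("name"));
    }
}
```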
The Document obtained from the parse() method is itself a node: the Document interface extends org.w3c.dom.Node. To simplify processing of the DOM tree, every object in the tree implements the Node interface, usually through a more specific subinterface such as Element or Text.
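The normalize() method, defined on Node and typically called on the document or its root element, puts all text nodes beneath that node into a normal form in which there are no adjacent text nodes and no empty text nodes. In this form the DOM view better reflects the XML structure, so it is good practice to normalize a document before processing it.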
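The effect of normalize() can be seen on a small tree built by hand. In this sketch, the class NormalizeDemo and the greeting element are invented for illustration: three text nodes, one of them empty, are appended to an element, and after normalization only a single merged text node remains.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: shows how normalize() tidies fragmented text nodes.
class NormalizeDemo {

    // Builds <greeting> holding three text nodes: "Hello, ", "" and "world".
    static Element buildFragmentedGreeting() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element greeting = doc.createElement("greeting");
        doc.appendChild(greeting);
        greeting.appendChild(doc.createTextNode("Hello, "));
        greeting.appendChild(doc.createTextNode(""));      // empty text node
        greeting.appendChild(doc.createTextNode("world")); // adjacent text node
        return greeting;
    }

    public static void main(String[] args) throws Exception {
        Element greeting = buildFragmentedGreeting();
        System.out.println("Before: " + greeting.getChildNodes().getLength() + " text nodes");
        // Normalize from the root element, the usual idiom after parsing.
        greeting.getOwnerDocument().getDocumentElement().normalize();
        System.out.println("After: " + greeting.getChildNodes().getLength() + " text node, value \""
                + greeting.getFirstChild().getNodeValue() + "\"");
    }
}
```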
After parsing an XML document, the DOM parser has built an in-memory representation of the document that will look something like Figure 16.2.
Figure 16.2 -- Diagram of the DOM tree.
The root of the DOM tree is obtained with the getDocumentElement() method:
Element root = document.getDocumentElement();
This method returns an Element, which is simply a subinterface of Node that may have attributes associated with it. An element can also be the parent of other elements.
The Document interface provides a number of methods for accessing the nodes in the tree, some of which are listed in Table 16.8; most of them are inherited from the Node interface. These methods return either a Node or a NodeList (an ordered collection of nodes).
Table 16.8 Document Interface Methods to Traverse a DOM Tree

| Method Name | Description |
|---|---|
| getDocumentElement() | Allows direct access to the root element of the document |
| getElementsByTagName(String) | Returns a NodeList of all the elements with the given tag name in the order in which they are encountered in the tree |
| getChildNodes() | A NodeList that contains all children of this node |
| getParentNode() | The parent of this node |
| getFirstChild() | The first child of this node |
| getLastChild() | The last child of this node |
| getPreviousSibling() | The node immediately preceding this node |
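The sketch below exercises several methods from Table 16.8 on a small job-listing document; the class TraverseDemo, its helper methods, and the element names are hypothetical. getElementsByTagName() finds elements anywhere in the tree, getFirstChild() reaches an element's text node, getParentNode() moves up a level, and getLastChild() with getPreviousSibling() walks a level backwards.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class for the traversal methods in Table 16.8.
class TraverseDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of every element with the given tag, in document order.
    static List<String> textOf(Document doc, String tag) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node text = nodes.item(i).getFirstChild(); // the element's text node, if any
            result.add(text == null ? "" : text.getNodeValue());
        }
        return result;
    }

    // Name of the parent of the first element with the given tag.
    static String parentName(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getParentNode().getNodeName();
    }

    // Names of the root's child elements, collected from last to first.
    static List<String> childNamesReversed(Document doc) {
        List<String> names = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<jobs><job><location>London</location></job>"
                + "<job><location>Paris</location></job><agency/></jobs>");
        System.out.println(textOf(doc, "location"));
        System.out.println(parentName(doc, "location"));
        System.out.println(childNamesReversed(doc));
    }
}
```

In an indented document, the whitespace between elements also appears as text nodes, which is why the sibling walk checks getNodeType() before treating a node as an element.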
In a simple DOM application, the getChildNodes() method can be used to traverse the DOM tree recursively, and the NodeList.getLength() method gives the number of nodes in the returned NodeList:
NodeList children = node.getChildNodes();
int len = (children != null) ? children.getLength() : 0;
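Wrapping those two lines in a recursive method gives a minimal tree printer. The class RecursiveWalk and its indented output format are assumptions for illustration; only element nodes are printed, although every child is visited.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example: recursive traversal with getChildNodes() and getLength().
class RecursiveWalk {

    // Returns an indented outline of the element names under (and including) node.
    static String outline(Node node) {
        StringBuilder out = new StringBuilder();
        walk(node, 0, out);
        return out.toString();
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        int len = (children != null) ? children.getLength() : 0;
        for (int i = 0; i < len; i++) {
            walk(children.item(i), depth + 1, out); // recurse into each child
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b><c/></b><d/></a>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.print(outline(doc.getDocumentElement()));
    }
}
```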
In addition to the tree traversal methods, the Node interface provides the methods listed in Table 16.9 (among others) for inspecting the contents of a node.
Table 16.9 Node Interface Methods to Inspect DOM Nodes

| Method Name | Description |
|---|---|
| getAttributes() | A NamedNodeMap containing the attributes of this node if it is an Element, or null otherwise. |
| getNodeName() | A string representing the name of this node; for an element, this is the tag name. |
| getNodeType() | A code representing the type of the underlying object: one of ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DOCUMENT_FRAGMENT_NODE, or NOTATION_NODE. |
| getNodeValue() | A string representing the value of this node. For a text node, the value is the contents of the text node; for an attribute node, it is the string assigned to the attribute. For most node types there is no value, and the method returns null. |
| getNamespaceURI() | The namespace URI of this node, or null if it is unspecified. |
| hasAttributes() | Returns a boolean indicating whether this node has any attributes. |
| hasChildNodes() | Returns a boolean indicating whether this node has any children. |
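A sketch of how these methods combine: the hypothetical NodeInspector class below switches on getNodeType() and builds a one-line summary from getNodeName(), getNodeValue(), getAttributes(), hasAttributes() and hasChildNodes(). The sample document and the summary format are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical inspector: summarizes a node using the Node methods from Table 16.9.
class NodeInspector {

    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                StringBuilder sb = new StringBuilder("element " + node.getNodeName());
                if (node.hasAttributes()) {
                    // getAttributes() returns a NamedNodeMap only for elements.
                    NamedNodeMap attrs = node.getAttributes();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        Node attr = attrs.item(i); // an ATTRIBUTE_NODE
                        sb.append(' ').append(attr.getNodeName())
                          .append("=\"").append(attr.getNodeValue()).append('"');
                    }
                }
                if (node.hasChildNodes()) {
                    sb.append(" (").append(node.getChildNodes().getLength()).append(" children)");
                }
                return sb.toString();
            }
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue() + "\"";
            case Node.COMMENT_NODE:
                return "comment \"" + node.getNodeValue() + "\"";
            default:
                // Most other node types have no value; report the type code and name instead.
                return "node type " + node.getNodeType() + " (" + node.getNodeName() + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<job id=\"42\">Java developer<!--urgent--></job>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(describe(doc));
        Node root = doc.getDocumentElement();
        System.out.println(describe(root));
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            System.out.println("  " + describe(child));
        }
    }
}
```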
This chapter is from Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et al. (Sams, 2004, ISBN: 0-672-32558-6).