Java & J2EE Page 11 - Integrating XML with J2EE
When you use the DOM API to parse an XML document, the parser builds a tree structure representing the document in memory. You can then examine the nodes of the tree to discover the XML contents.

Building a DOM Tree

The mechanism for instantiating a DOM parser is very similar to that for a SAX parser. You obtain a new instance of DocumentBuilderFactory and use it to create a new DocumentBuilder. Calling the parse() method on this DocumentBuilder returns an object that implements the Document interface; this object represents the XML document tree. The following code fragment creates a DOM parser and reads the XML document from a file supplied as a command-line argument:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new File(argv[0]));

With the DocumentBuilder.parse() method, you are not restricted to reading XML from a file; you can also pass an InputStream or read from a source identified by a URL.

The Document obtained from the parse() method is a subinterface of org.w3c.dom.Node. To simplify processing of the DOM tree, every object in the tree is either a Node or an object of a subtype of Node.

The normalize() method should be called before the tree is processed. It puts all text nodes into a form where there are no adjacent text nodes and no empty text nodes. In this form, the DOM view better reflects the XML structure.

After parsing an XML document, the DOM parser has built an in-memory representation of the document that will look something like Figure 16.2. The root of the DOM tree is obtained with the getDocumentElement() method:

    Element root = document.getDocumentElement();
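The parsing steps above can be sketched as one small program. This is a minimal sketch, not the chapter's own listing: it reads the XML from an in-memory string through an InputStream instead of a file, and the class name DomParseSketch, the helper rootName(), and the sample document are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomParseSketch {

    // Hypothetical helper: parses XML text and returns the tag name of its root element.
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // parse() accepts an InputStream as well as a File.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document document = builder.parse(in);

        // Merge adjacent text nodes and remove empty ones before inspecting the tree.
        document.normalize();

        Element root = document.getDocumentElement();
        return root.getTagName();
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example.
        String xml = "<jobSummary><job customer=\"winston\"/></jobSummary>";
        System.out.println(rootName(xml));
    }
}
```

Wrapping the string in a ByteArrayInputStream keeps the example self-contained; in a real application the stream would more likely come from a file, a socket, or a servlet request.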
Figure 16.2 -- Diagram of the DOM tree.

The getDocumentElement() method returns an Element, which is simply a subtype of Node that may have attributes associated with it. An element can be the parent of other elements.

There are a number of methods provided in the Document interface to access the nodes in the tree, some of which are listed in Table 16.8. These methods return either a Node or a NodeList (an ordered collection of nodes).

Table 16.8 Document Interface Methods to Traverse a DOM Tree
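As an illustration of a method that returns a NodeList, and of elements carrying attributes, the following sketch uses the standard Document.getElementsByTagName() and Element.getAttribute() calls. Whether these particular methods appear in Table 16.8 is an assumption; the class name DomElementSketch, the helper attributeValues(), and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomElementSketch {

    // Hypothetical helper: collects one attribute's value from every element with the given tag name.
    static List<String> attributeValues(String xml, String tag, String attr) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getElementsByTagName() returns the matching elements in document order.
        NodeList matches = document.getElementsByTagName(tag);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            Element element = (Element) matches.item(i);
            // getAttribute() returns an empty string when the attribute is absent.
            values.add(element.getAttribute(attr));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example.
        String xml = "<jobs><job ref=\"a1\"/><job ref=\"b2\"/></jobs>";
        System.out.println(attributeValues(xml, "job", "ref"));
    }
}
```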
In a simple DOM application, the getChildNodes() method can be used to traverse the DOM tree recursively. The NodeList.getLength() method then gives the number of nodes in the returned NodeList:

    NodeList children = node.getChildNodes();
    int len = (children != null) ? children.getLength() : 0;

In addition to the tree traversal methods, the Node interface provides methods (among others) to investigate the contents of a node, as listed in Table 16.9.

Table 16.9 Document Interface Methods to Inspect DOM Nodes
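The recursive traversal described above might look like the following sketch. It combines getChildNodes() with three standard Node inspection methods, getNodeName(), getNodeType(), and getNodeValue(); that these are the methods listed in Table 16.9 is an assumption. The class name DomWalkSketch and the helpers walk() and dump() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkSketch {

    // Appends one indented line for this node, then recurses into its children.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        // Text nodes carry their content in the node value.
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(": ").append(node.getNodeValue().trim());
        }
        out.append('\n');

        NodeList children = node.getChildNodes();
        int len = (children != null) ? children.getLength() : 0;
        for (int i = 0; i < len; i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Hypothetical helper: parses XML text and returns the indented outline of its tree.
    static String dump(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        document.normalize();

        StringBuilder out = new StringBuilder();
        walk(document.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example.
        System.out.print(dump("<a><b>hi</b></a>"));
    }
}
```

Text nodes report the fixed name #text from getNodeName(), so the outline distinguishes element names from character data without any extra bookkeeping.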