XML Parsing With DOM and Xerces (part 1)
By: icarus, (c) Melonfire
    2002-02-19

    Table of Contents:
  • XML Parsing With DOM and Xerces (part 1)
  • Float Like A Butterfly...
  • Nailguns, Going Cheap
  • Delving Deeper
  • When Laziness Is A Virtue


    XML Parsing With DOM and Xerces (part 1) - When Laziness Is A Virtue


    (Page 5 of 5 )

    In the example above, I've manually written code to handle each level of the tree for illustrative purposes. In a production environment, however, doing this is pure insanity, especially since an XML document can have any number of nested levels. A far more professional approach is to write a recursive function that automatically iterates through the document tree; this results in cleaner, more readable code that is also much, much easier to maintain.

    To understand the difference, consider this next example, which uses a recursive function to parse a more complex XML document. Here's the XML:

    <?xml version="1.0"?>
    <inventory>
        <!-- time to lock and load -->
        <item>
            <id>758</id>
            <name>Rusty, jagged nails for nailgun</name>
            <supplier>NailBarn, Inc.</supplier>
            <cost currency="USD">2.99</cost>
            <quantity alert="500">10000</quantity>
        </item>
        <item>
            <id>6273</id>
            <name>Power pack for death ray</name>
            <supplier>QuakePower.domain.com</supplier>
            <cost currency="USD">9.99</cost>
            <quantity alert="20">10</quantity>
        </item>
    </inventory>

    And here's the Java code to parse it:

    import org.apache.xerces.parsers.DOMParser;
    import org.w3c.dom.*;
    import org.xml.sax.SAXException;
    import java.io.*;

    public class MyThirdDomApp {

        // a counter for keeping track of the "tabs"
        private int TabCounter = 0;

        // constructor
        public MyThirdDomApp (String xmlFile) {

            // create a Xerces DOM parser
            DOMParser parser = new DOMParser();

            // parse the document and
            // access the root node with its children
            try {
                parser.parse(xmlFile);
                Document document = parser.getDocument();
                NodeDetails(document);
            } catch (IOException e) {
                System.err.println (e);
            } catch (SAXException e) {
                // parse() also declares SAXException, so it has to be caught as well
                System.err.println (e);
            }
        }

        // this is a recursive function to traverse the document tree
        private void NodeDetails (Node node) {

            String Content = "";
            int type = node.getNodeType();

            // check if element
            if (type == Node.ELEMENT_NODE) {
                FormatTree(TabCounter);
                System.out.println ("Element: " + node.getNodeName() );

                // check if the element has any attributes
                if(node.hasAttributes()) {

                    // if it does, store it in a NamedNodeMap object
                    NamedNodeMap AttributesList = node.getAttributes();

                    // iterate through the NamedNodeMap and get the attribute names and values
                    for(int j = 0; j < AttributesList.getLength(); j++) {
                        FormatTree(TabCounter);
                        System.out.println("Attribute: " + AttributesList.item(j).getNodeName() + " = " + AttributesList.item(j).getNodeValue());
                    }
                }
            } else if (type == Node.TEXT_NODE) {

                // check if text node and print value
                Content = node.getNodeValue();
                if (!Content.trim().equals("")){
                    FormatTree(TabCounter);
                    System.out.println ("Character data: " + Content);
                }
            } else if (type == Node.COMMENT_NODE) {

                // check if comment node and print value
                Content = node.getNodeValue();
                if (!Content.trim().equals("")){
                    FormatTree(TabCounter);
                    System.out.println ("Comment: " + Content);
                }
            }

            // check if current node has any children
            NodeList children = node.getChildNodes();
            if (children != null) {

                // if it does, iterate through the collection
                for (int i=0; i< children.getLength(); i++) {
                    TabCounter++;

                    // recursively call function to proceed to next level
                    NodeDetails(children.item(i));
                    TabCounter--;
                }
            }
        }

        // this formats the output for the generated tree
        private void FormatTree (int TabCounter) {
            for(int j = 1; j < TabCounter; j++) {
                System.out.print("\t");
            }
        }

        // the main method to create an instance of our DOM application
        public static void main (String[] args) {
            MyThirdDomApp MyThirdDomApp = new MyThirdDomApp (args[0]);
        }
    }
    Here's the output:

    Element: inventory
        Comment: time to lock and load
        Element: item
            Element: id
                Character data: 758
            Element: name
                Character data: Rusty, jagged nails for nailgun
            Element: supplier
                Character data: NailBarn, Inc.
            Element: cost
            Attribute: currency = USD
                Character data: 2.99
            Element: quantity
            Attribute: alert = 500
                Character data: 10000
        Element: item
            Element: id
                Character data: 6273
            Element: name
                Character data: Power pack for death ray
            Element: supplier
                Character data: QuakePower.domain.com
            Element: cost
            Attribute: currency = USD
                Character data: 9.99
            Element: quantity
            Attribute: alert = 20
                Character data: 10
    Now, wasn't that easier than manually writing code for each level of the document tree?

    This should be easily understandable if you're familiar with the concept of recursion. Most of the work happens in the NodeDetails() function, which now includes additional code to iterate through the different levels of the document tree automatically, and to make intelligent decisions about what to do with each node type found.

    // this is a recursive function to traverse the document tree
    private void NodeDetails (Node node) {

        // snip

        // check if element
        if (type == Node.ELEMENT_NODE) {
            FormatTree(TabCounter);
            System.out.println ("Element: " + node.getNodeName() );

            // check if the element has any attributes
            if(node.hasAttributes()) {

                // if it does, store it in a NamedNodeMap object
                NamedNodeMap AttributesList = node.getAttributes();

                // iterate through the NamedNodeMap and get the attribute names and values
                for(int j = 0; j < AttributesList.getLength(); j++) {
                    FormatTree(TabCounter);
                    System.out.println("Attribute: " + AttributesList.item(j).getNodeName() + " = " + AttributesList.item(j).getNodeValue());
                }
            }
        }

        // snip
    }
    If the node is an element, the element name is printed to the standard output device. A check is then performed for element attributes; if they exist, they are returned as a NamedNodeMap object (essentially, a collection whose items can be accessed either by integer index or by name) and can be processed and displayed using methods exposed by that object. If you know the name of the attribute, the getNamedItem() method can be used to retrieve the corresponding attribute node, whose value is then available through getNodeValue(); if you don't (as in the example above), the getLength() and item() methods can be used in combination with a loop to iterate through the list of attributes.
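The getNamedItem() route mentioned above might look something like the following. This is only a minimal sketch: the class name AttributeLookup and the helper methods attributeValue() and parse() are invented for illustration, and it assumes the JAXP DocumentBuilder bundled with current JDKs in place of the Xerces DOMParser class, so that it runs without any extra JARs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLookup {

    // hypothetical helper: returns the value of the named attribute,
    // or null if the node does not carry an attribute with that name
    static String attributeValue(Node node, String name) {
        NamedNodeMap attributes = node.getAttributes(); // null for non-element nodes
        if (attributes == null) {
            return null;
        }
        Node attribute = attributes.getNamedItem(name); // null when the attribute is absent
        return (attribute == null) ? null : attribute.getNodeValue();
    }

    // hypothetical helper: parses an XML string with the JDK's JAXP parser
    // (an assumption; the article's own code uses the Xerces DOMParser)
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<cost currency=\"USD\">2.99</cost>");
        Element cost = document.getDocumentElement();
        System.out.println("currency = " + attributeValue(cost, "currency"));
        System.out.println("alert = " + attributeValue(cost, "alert"));
    }
}
```

Note the two null checks: getAttributes() returns null for anything that isn't an element, and getNamedItem() returns null when no attribute of that name exists.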

    // this is a recursive function to traverse the document tree
    private void NodeDetails (Node node) {

        // snip

        // check if element
        if (type == Node.ELEMENT_NODE) {

            // snip

        } else if (type == Node.TEXT_NODE) {

            // check if text node and print value
            Content = node.getNodeValue();
            if (!Content.trim().equals("")){
                FormatTree(TabCounter);
                System.out.println ("Character data: " + Content);
            }
        } else if (type == Node.COMMENT_NODE) {

            // check if comment node and print value
            Content = node.getNodeValue();
            if (!Content.trim().equals("")){
                FormatTree(TabCounter);
                System.out.println ("Comment: " + Content);
            }
        }

        // snip
    }
    In a similar manner, it's also possible to check for text nodes, comments and any other node type, and write code to process each type individually. The example above handles text nodes and comments, printing each one to the standard output device as it is encountered. The trim() check is there because the whitespace between tags also shows up in the tree as text nodes, and there's little point in printing those. Note that, again, the getNodeValue() function is used to extract the raw value of the node. It must be nice to be so popular!
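To give a rough idea of how the same check could be extended to other node types, here is a small sketch that uses a switch statement over the node type constants defined in org.w3c.dom.Node. The class name NodeTypeNames and the describe() and parse() helpers are made up for this illustration, and the JDK's built-in JAXP parser is assumed instead of the Xerces DOMParser so the snippet runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTypeNames {

    // hypothetical helper: builds a one-line description for a single node
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "Element: " + node.getNodeName();
            case Node.TEXT_NODE:
                return "Character data: " + node.getNodeValue();
            case Node.CDATA_SECTION_NODE:
                return "CDATA: " + node.getNodeValue();
            case Node.COMMENT_NODE:
                return "Comment: " + node.getNodeValue();
            case Node.PROCESSING_INSTRUCTION_NODE:
                // for a processing instruction, the node name is the target
                // and the node value is the data that follows it
                return "Processing instruction: " + node.getNodeName() + " " + node.getNodeValue();
            default:
                return "Other node type: " + node.getNodeType();
        }
    }

    // hypothetical helper: parses an XML string with the JDK's JAXP parser
    // (an assumption; the article's own code uses the Xerces DOMParser)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<note><!--hi--><![CDATA[a<b]]><?target data?>text</note>");
        Node root = document.getDocumentElement();
        System.out.println(describe(root));

        // describe each direct child of the root element
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(describe(children.item(i)));
        }
    }
}
```

The default branch is worth keeping even if you only care about a few node types, since it makes it obvious when the parser hands you something you didn't plan for.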

    Finally, once the node has been processed, it's time to see if it has any children, and proceed to the next level of the tree if so.

    // this is a recursive function to traverse the document tree
    private void NodeDetails (Node node) {

        // snip

        // check if current node has any children
        NodeList children = node.getChildNodes();
        if (children != null) {

            // if it does, iterate through the collection
            for (int i=0; i< children.getLength(); i++) {
                TabCounter++;

                // recursively call function to proceed to next level
                NodeDetails(children.item(i));
                TabCounter--;
            }
        }
    }
    In the event that the node does have children, the children are stored in a NodeList, and the NodeDetails() function is recursively called for each of these nodes. And so on, and so on, ad infinitum...or at least until the entire tree has been processed.
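The same recursive descent can also be written without a NodeList, by walking from getFirstChild() along getNextSibling() until it returns null. The sketch below is not the article's code; the class name SiblingWalk and the countElements() and parse() helpers are hypothetical, and it again assumes the JDK's JAXP parser rather than Xerces. It simply counts the elements in a tree to show the recursion at work.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingWalk {

    // hypothetical helper: counts the element nodes at or below the given node
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;

        // visit every child in document order and recurse into it
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // hypothetical helper: parses an XML string with the JDK's JAXP parser
    // (an assumption; the article's own code uses the Xerces DOMParser)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse(
            "<inventory><item><id>758</id></item><item><id>6273</id></item></inventory>");
        System.out.println("Elements found: " + countElements(document));
    }
}
```

Whether you prefer this form or the NodeList loop is largely a matter of taste; both visit the children in the same order.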

    Finally, the very simple FormatTree() method checks the value of the tab counter to determine the current depth within the XML tree, and prints a corresponding number of tab characters (one fewer than the counter, since the loop starts at 1) in a primitive attempt to represent the data as a tree.

    // this formats the output for the generated tree
    private void FormatTree (int TabCounter) {
        for(int j = 1; j < TabCounter; j++) {
            System.out.print("\t");
        }
    }
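One possible variation on the tab counter, shown here purely as a sketch, is to pass the current depth down as a method argument instead of incrementing and decrementing an instance variable. The class name DepthTree and the render(), indent() and parse() helpers are invented for this example, and the JDK's JAXP parser is assumed in place of Xerces. The output is collected in a StringBuilder so that it can be inspected as well as printed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DepthTree {

    // hypothetical helper: appends an indented description of the node
    // and everything below it to the supplied buffer
    static void render(Node node, int depth, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            indent(depth, out);
            out.append("Element: ").append(node.getNodeName()).append('\n');
        } else if (type == Node.TEXT_NODE && !node.getNodeValue().trim().isEmpty()) {
            indent(depth, out);
            out.append("Character data: ").append(node.getNodeValue().trim()).append('\n');
        }

        // the Document node prints nothing, so its children stay at the same depth
        int childDepth = (type == Node.DOCUMENT_NODE) ? depth : depth + 1;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            render(children.item(i), childDepth, out);
        }
    }

    // appends one tab character per level of depth
    static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
    }

    // hypothetical helper: parses an XML string with the JDK's JAXP parser
    // (an assumption; the article's own code uses the Xerces DOMParser)
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        render(parse("<item><id>758</id><name>Power pack</name></item>"), 0, out);
        System.out.print(out);
    }
}
```

Since the depth travels with each call, there is no shared state to restore after the recursive call returns, which some people find easier to reason about.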
    As with most things, it's easy when you know how.

    Obviously, this is just one illustration of the applications of the Xerces DOM parser. This is probably enough to get you started with simple Java/XML applications...but you can do a lot more with Xerces than just this.

    In the second part of this article, I'll build on everything you just learnt to demonstrate how the Xerces DOM parser can be combined with JSP to format XML documents for a Web browser. I'll also take a look at the error-handling functions built into the parser, demonstrating how they can be used to trap and catch errors in XML processing. Make sure you come back for that one!

    Note: All examples in this article have been tested with JDK 1.3.0, Apache 1.3.11, mod_jk 1.1.0, Xerces 1.4.4 and Tomcat 3.3. Examples are illustrative only, and are not meant for a production environment. YMMV!