Perl Programming Page 2 - Using Perl With XML (part 2)
Perl has a DOM parser based on the expat library created by James Clark; it's implemented as a Perl package named XML::DOM, and currently maintained by T. J. Mather. If you don't already have it, you should download and install it before proceeding further; you can get a copy from CPAN (http://www.cpan.org/).
This DOM parser works by reading an XML document and creating objects to represent the different parts of that document. Each of these objects comes with specific methods and properties, which can be used to access and manipulate information about it. Thus, the entire XML document is represented as a "tree" of these objects, with the DOM parser providing a simple API to move between the different branches of the tree.
The parser itself supports all the different structures typically found in an XML document - elements, attributes, namespaces, entities, notations et al - but our focus here will be primarily on elements and the data contained within them. If you're interested in the more arcane aspects of XML, as you will have to be to do anything complicated with the language, the XML::DOM package comes with some truly excellent documentation, which gets installed when you install the package. Make it your friend, and you'll find things considerably easier.
Let's start things off with a simple example. The first step is to create a new instance of the parser and assign it to the variable $xp; this object can then be used to parse XML data held in a string via its parse() method.
You'll remember the parse() method from the first part of this article, where the SAX parser used it to parse a string. When you think about it, this isn't really all that remarkable: the XML::DOM package is built on top of the XML::Parser package, and therefore inherits many of the latter's methods.
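The steps just described (create the parser, hand a string to parse(), print the result) might look something like the sketch below. The XML data is invented purely for illustration, and the script assumes XML::DOM has already been installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;    # installed from CPAN; assumed to be available

# illustrative XML data (made up for this sketch)
my $xml = <<'XML';
<?xml version="1.0"?>
<library>
  <book id="1">
    <title>Dune</title>
  </book>
</library>
XML

# create a new instance of the parser
my $xp = XML::DOM::Parser->new();

# parse the string; the return value is a Document object
my $doc = $xp->parse($xml);

# serialize the document tree back to a string and print it
my $out = $doc->toString();
print $out;

# XML::DOM trees contain circular references, so release them explicitly
$doc->dispose();
```

The call to dispose() at the end is not strictly part of the example, but the XML::DOM documentation recommends it so that the memory used by the tree can be reclaimed.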
With that in mind, it follows that the DOM parser should also be able to read an XML file directly, simply by using the parsefile() method instead of the parse() method.
The result of successfully parsing an XML document, whether string or file, is an object representation of the XML document (actually, an instance of the Document class). In this article's examples, this object is called $doc.
This Document object comes with a number of interesting methods, and one of the more useful ones is the toString() method, which returns the current document tree as a string. The examples in this section use it to print the entire document to the console.
It should be noted that printing a freshly parsed document isn't all that great an example of how to use the toString() method. Most often, this method is used during dynamic XML tree generation, when an XML tree is constructed in memory from a database or elsewhere. In such situations, the toString() method comes in handy to write the final XML tree to a file or send it to a parser for further processing.
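As a rough sketch of both ideas, the script below reads a document with parsefile(), adds a new branch to the tree in memory, and then uses toString() to write the result to a file. The sample data and the temporary files are assumptions made so that the script is self-contained; in practice the new nodes would more likely be built from database rows or similar.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::DOM;    # installed from CPAN; assumed to be available

# write a small sample document to a temporary file (illustrative data only)
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} qq{<?xml version="1.0"?>\n<library><book><title>Dune</title></book></library>\n};
close($fh) or die "Cannot close $filename: $!";

# parse the file directly instead of passing a string to parse()
my $xp  = XML::DOM::Parser->new();
my $doc = $xp->parsefile($filename);

# build a new <book> branch in memory and attach it to the root element
my $book  = $doc->createElement('book');
my $title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('Emma'));
$book->appendChild($title);
$doc->getDocumentElement()->appendChild($book);

# serialize the modified tree and write it to a second file
my $out = $doc->toString();
my ($out_fh, $out_name) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$out_fh} $out;
close($out_fh) or die "Cannot close $out_name: $!";

# also print it to the console
print $out;

# release the circular references held by the tree
$doc->dispose();
```

Here the string returned by toString() is written out by hand; it could just as easily be handed to another parser or sent over a network connection.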