They say there's more than one way to skin a cat - and that's twice as true when you're a Perl developer. In this concluding article on XML parsing with Perl, find out how the XML::DOM package provides an alternative technique for manipulating XML elements and attributes, and compare the two approaches to see which one works best for you.
Perl's DOM parser is based on the expat library created by James Clark; it's implemented as a Perl package named XML::DOM, and currently maintained by T. J. Mather. The package is not part of the core Perl distribution, so if you don't already have it, you should download and install it before proceeding further; you can get a copy from CPAN (http://www.cpan.org/).
This DOM parser works by reading an XML document and creating objects to represent the different parts of that document. Each of these objects comes with specific methods and properties, which can be used to manipulate and access information about it. Thus, the entire XML document is represented as a "tree" of these objects, with the DOM parser providing a simple API to move between the different branches of the tree.
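To make the "tree" idea a little more concrete, here is a minimal sketch of walking from the root of a parsed document down to its children. The sample XML string and the variable names are made up for illustration; the methods used (getDocumentElement(), getChildNodes(), getNodeType(), getNodeName(), getFirstChild() and getData()) are part of the XML::DOM API, but treat this as a rough preview rather than the only way to traverse a tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;   # also exports node-type constants such as ELEMENT_NODE

# sample document (made up for this illustration)
my $xml = '<?xml version="1.0"?><me><name>Joe Cool</name><age>24</age></me>';

# parse the string into a tree of objects
my $parser = XML::DOM::Parser->new();
my $doc    = $parser->parse($xml);

# the root element sits at the top of the tree
my $root = $doc->getDocumentElement();

# collect a "name=value" pair for each child element of the root
my @pairs;
foreach my $child ($root->getChildNodes()) {
    # skip anything that is not an element (text, comments and so on)
    next unless $child->getNodeType() == ELEMENT_NODE;

    # the character data lives in a text node below the element
    my $text = $child->getFirstChild();
    push @pairs, $child->getNodeName() . '=' . ($text ? $text->getData() : '');
}

print $root->getNodeName(), ": ", join(', ', @pairs), "\n";

# break the tree's circular references once it is no longer needed
$doc->dispose();
```

Each object in the loop is a node in the tree, and the same handful of methods works at every level, which is what makes the API easy to move around in.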
The parser itself supports all the different structures typically found in an XML document - elements, attributes, namespaces, entities, notations et al - but our focus here will be primarily on elements and the data contained within them. If you're interested in the more arcane aspects of XML - as you will have to be to do anything complicated with the language - the XML::DOM package comes with some truly excellent documentation, which gets installed when you install the package. Make it your friend, and you'll find things considerably easier.
Let's start things off with a simple example:
#!/usr/bin/perl
# create an XML-compliant string
$xml = "<?xml version=\"1.0\"?><me><name>Joe Cool</name><age>24</age><sex>male</sex></me>";
# include package
use XML::DOM;
# instantiate parser
$xp = new XML::DOM::Parser();
# parse and create tree
$doc = $xp->parse($xml);
# print tree as string
print $doc->toString();
# end
In this case, a new instance of the parser is created and assigned to the variable $xp. This object instance can now be used to parse the XML data via its parse() function:
# instantiate parser
$xp = new XML::DOM::Parser();
# parse and create tree
$doc = $xp->parse($xml);
You'll remember the parse() function from the first part of this article - it was used by the SAX parser to parse a string. When you think about it, this isn't really all that remarkable - the XML::DOM package is built on top of the XML::Parser package, and therefore inherits many of the latter's methods.
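One practical consequence of this inheritance is worth a quick sketch: like the underlying XML::Parser, the DOM parser dies when it is handed a document that is not well-formed, so it is usually wise to wrap the call in an eval block. The malformed sample string and the variable names below are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;

# XML::DOM::Parser is a subclass of XML::Parser
my $is_subclass = XML::DOM::Parser->isa('XML::Parser') ? 1 : 0;
print "inherits from XML::Parser: $is_subclass\n";

# deliberately malformed input: <name> is never closed
my $bad_xml = '<me><name>Joe Cool</me>';

my $parser = XML::DOM::Parser->new();

# parse() dies on malformed XML, so trap the error with eval
my $doc = eval { $parser->parse($bad_xml) };
my $err = $@;

if (!defined $doc) {
    print "parse failed: $err\n";
}
```

The exact wording of the error message comes from expat and may vary between versions, so it is safer to test whether the parse succeeded than to match on the message text.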
With that in mind, it follows that the DOM parser should also be able to read an XML file directly, simply by using the parsefile() method, instead of the parse() method:
#!/usr/bin/perl
# XML file
$file = "me.xml";
# include package
use XML::DOM;
# instantiate parser
$xp = new XML::DOM::Parser();
# parse and create tree
$doc = $xp->parsefile($file);
# print tree as string
print $doc->toString();
# end
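The listing above assumes that me.xml already exists in the current directory. If you'd like something you can run as-is, here is a self-contained sketch that first writes a small document to a temporary file and then hands it to parsefile(). The temporary file, its contents and the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::DOM;

# write a small sample document to a temporary file
my ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print $fh '<?xml version="1.0"?><me><name>Joe Cool</name></me>';
close $fh or die "cannot close $file: $!";

my $parser = XML::DOM::Parser->new();

# parsefile() dies if the file cannot be opened or is not well-formed
my $doc = eval { $parser->parsefile($file) };
die "could not parse $file: $@" unless defined $doc;

# confirm that the tree was built by looking at the root element
my $root_name = $doc->getDocumentElement()->getNodeName();
print "root element: $root_name\n";
```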
The result of successfully parsing an XML document - whether string or file - is an object representation of the XML document (actually, an instance of the XML::DOM::Document class). In the example above, this object is called $doc.
# instantiate parser
$xp = new XML::DOM::Parser();
# parse and create tree
$doc = $xp->parsefile($file);
This Document object comes with a bunch of interesting methods - and one of the more useful ones is the toString() method, which returns the current document tree as a string. In the examples above, I've used this method to print the entire document to the console.
# print tree as string
print $doc->toString();
It should be noted that this isn't all that great an example of how to use the toString() method. Most often, this method is used during dynamic XML tree generation, when an XML tree is constructed in memory from a database or elsewhere. In such situations, the toString() method comes in handy to write the final XML tree to a file or send it to a parser for further processing.
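To give you a feel for that more typical use, here is a rough sketch that builds a small tree in memory and then serializes it with toString(). The @people array is a hypothetical stand-in for rows fetched from a database, and the element and attribute names are invented for this example; createElement(), createTextNode(), setAttribute() and appendChild() are standard DOM methods provided by XML::DOM.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;

# hypothetical records, standing in for rows fetched from a database
my @people = (
    { name => 'Joe Cool', age => 24 },
    { name => 'Jane Doe', age => 31 },
);

# create an empty document and give it a root element
my $doc  = XML::DOM::Document->new();
my $root = $doc->createElement('people');
$doc->appendChild($root);

# add one <person> element per record
foreach my $p (@people) {
    my $person = $doc->createElement('person');
    $person->setAttribute('age', $p->{age});
    $person->appendChild($doc->createTextNode($p->{name}));
    $root->appendChild($person);
}

# serialize the finished tree
my $out = $doc->toString();
print $out;
```

If the goal is a file rather than a string, the package also provides a printToFile() method that takes a filename, which saves you from opening the file handle yourself.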