Using Perl with XML (part 1) - Breaking It Down
The first order of business is to
initialize the XML parser, and set up the callback functions.
#!/usr/bin/perl
# include package
use XML::Parser;
# initialize parser
$xp = new XML::Parser();
# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);
# parse XML
$xp->parsefile("library.xml");
The parser is initialized in the ordinary way - by
instantiating a new object of the XML::Parser class. This object is assigned to
the variable $xp, and is used in subsequent method calls.
# initialize parser
$xp = new XML::Parser();
The next step is to specify the functions to be executed when
the parser encounters the opening and closing tags of an element, or the
character data between them. The setHandlers() method is used to specify these
functions; it accepts a list of key-value pairs, with each key naming an event
to watch out for, and each value holding a reference to the function to trigger.
# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);
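Since setHandlers() takes plain event/handler pairs, the pairs can also be kept in a hash and passed in as-is. The following is only a sketch: the %handlers and %count names, the anonymous counting subs and the inline XML string are assumptions made for illustration, not part of the script above.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical handlers that do nothing but count tags
my %count = (start => 0, end => 0);
my %handlers = (
    Start => sub { $count{start}++ },
    End   => sub { $count{end}++ },
);

my $xp = XML::Parser->new();

# A hash flattens into the same event => handler pairs used above
$xp->setHandlers(%handlers);

# Parse a small in-memory document (made up for this example)
$xp->parse('<library><book/><book/></library>');

print "start tags: $count{start}, end tags: $count{end}\n";
```

An empty element such as <book/> fires both the Start and the End handler, so the two counters should match.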
In this case, the user-defined functions start() and end()
are called when starting and ending element tags are encountered, while
character data triggers the cdata() function.
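To make the callbacks more concrete, here is a minimal sketch of what the three handlers might look like. The handler bodies, the @events array and the sample XML string are assumptions for illustration only. What XML::Parser itself defines is the argument list: every handler receives the parser object first, followed by the element name (plus attribute name/value pairs for Start) or the text string (for Char).

```perl
use strict;
use warnings;
use XML::Parser;

my @events;    # hypothetical log of events, in the order they fire

# Start handler: parser object, element name, attribute name/value pairs
sub start {
    my ($expat, $element, %attrs) = @_;
    push @events, "start:$element";
    push @events, "attr:$_=$attrs{$_}" for sort keys %attrs;
}

# End handler: parser object, element name
sub end {
    my ($expat, $element) = @_;
    push @events, "end:$element";
}

# Char handler: parser object, a chunk of character data
sub cdata {
    my ($expat, $text) = @_;
    push @events, "text:$text" if $text =~ /\S/;    # skip whitespace-only chunks
}

my $xp = XML::Parser->new();
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);
$xp->parse('<book id="1"><title>Dune</title></book>');

print "$_\n" for @events;
```

Note that the Char handler may be called more than once for a single run of text, so a handler that needs the complete string should append each chunk to a buffer rather than assume it arrives in one piece.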
Obviously, these aren't the
only types of events a parser can be set up to handle - the XML::Parser package
allows you to specify handlers for a diverse array of events; I'll discuss these
briefly a little later.
The next step in the script above is to open the
XML file, read it and parse it via the parsefile() method. The parsefile()
method will work through the XML document from start to finish, calling the
appropriate handler each time it encounters an event for which one has been
registered.
# parse XML
$xp->parsefile("library.xml");
If your XML data is not stored in a file, but in a
string variable - quite likely if, for example, you've generated it dynamically
from a database - you can replace the parsefile() method with the parse()
method, which accepts a string containing the XML document (or an open
filehandle), rather than a filename.
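As a rough sketch of the parse() variant, the snippet below feeds the parser a string instead of a filename and collects the text of each book element. The XML string, the @titles array and the anonymous handlers are all made up for this example.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical XML built in memory instead of being read from library.xml
my $xml = '<library><book>Perl Basics</book><book>XML Basics</book></library>';

my @titles;          # text of every <book> element
my $current = '';    # buffer for the element being read

my $xp = XML::Parser->new();
$xp->setHandlers(
    # reset the buffer whenever a <book> opens
    Start => sub { my ($expat, $element) = @_; $current = '' if $element eq 'book' },
    # character data may arrive in several chunks, so append
    Char  => sub { my ($expat, $text) = @_; $current .= $text },
    # store the buffer whenever a </book> closes
    End   => sub { my ($expat, $element) = @_; push @titles, $current if $element eq 'book' },
);

# parse() takes the document itself rather than a filename
$xp->parse($xml);

print "$_\n" for @titles;
```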
Once the document has been completely parsed, the script will
proceed to the next line (if there is one), or terminate gracefully. A parse
error - for example, a mismatched tag or a badly nested element - will cause the
script to die immediately, unless the call is wrapped in an eval block.
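If you would rather report a parse error than have the script stop, the usual Perl idiom is to wrap the call in eval and inspect $@ afterwards. This is only a sketch; the malformed $bad_xml string and the $ok and $error variables are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

my $xp = XML::Parser->new();

# Deliberately malformed: <title> is closed by </book>
my $bad_xml = '<book><title>Oops</book>';

# eval traps the die; the trailing 1 makes success distinguishable from failure
my $ok    = eval { $xp->parse($bad_xml); 1 };
my $error = $ok ? '' : $@;

if ($ok) {
    print "Document parsed cleanly\n";
}
else {
    # the message describes the problem and where in the input it occurred
    print "Parse failed: $error\n";
}
```

The same pattern works for parsefile(), which also dies when the file cannot be opened.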
As you can see, this is fairly simple -
simpler, in fact, than the equivalent process in other languages like PHP or
Java. Don't be fooled, though - this simplicity conceals a fair amount of
power.