Using Perl with XML (part 1) - Call Me Back
As I've just explained, the start(),
end() and cdata() functions will be called by the parser as it progresses
through the document. We haven't defined these yet - let's do that next:
# keep track of which tag is currently being processed
our $currentTag = "";

# this is called when a start tag is found
sub start
{
    # extract variables: the parser object, the tag name,
    # and any attributes as name/value pairs
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "<tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "<td>";
    }
    elsif ($currentTag eq "author")
    {
        print "<td>";
    }
    elsif ($currentTag eq "price")
    {
        print "<td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "<td>";
    }
}
Each time the parser encounters a start tag, it calls
start() with the parser object, the name of the tag and its attributes (if any)
as arguments. The start() function then processes the tag, printing the
corresponding HTML markup in place of the XML tag.
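XML::Parser passes the Start handler a flat list: the parser object, the element name, and then each attribute as a name/value pair. Here's a small sketch that calls a cut-down start() by hand to show how "my ($parser, $name, %attr) = @_" turns those pairs into a hash. The id and lang attributes are made up for the illustration, undef stands in for the real parser object, and %last_attr is a hypothetical variable that only exists so the result can be inspected.

```perl
use strict;
use warnings;

our $currentTag = "";
my %last_attr;    # hypothetical: attributes of the most recent start tag

# same unpacking as the article's start(), minus the HTML output
sub start
{
    my ($parser, $name, %attr) = @_;    # leftover name/value pairs fill %attr
    $currentTag = lc($name);
    %last_attr  = %attr;
}

# the call XML::Parser would make for <Book id="42" lang="en">,
# with undef standing in for the parser object
start(undef, "Book", "id", "42", "lang", "en");

print "tag: $currentTag\n";
print "$_ = $last_attr{$_}\n" for sort keys %last_attr;
```

This prints "tag: book" followed by "id = 42" and "lang = en". The article's script never uses %attr, but it is there if you later need attribute values.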
I've used an "if" statement, keyed on the tag name,
to decide how to process each tag. For example, since I know that <book>
indicates the beginning of a new row in my desired output, I replace it with a
<tr>, while other elements like <title> and <author>
correspond to table cells, and are replaced with <td> tags.
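Since four of the five branches print the same <td>, the "if" chain could also be written as a lookup table. The sketch below is an alternative to the article's code, not a replacement for it. %open_markup is a hypothetical name, "library" merely stands in for whatever the document's root element is, and the output is collected in a string so it can be checked.

```perl
use strict;
use warnings;

our $currentTag = "";
our $html       = "";    # collects output instead of printing it

# hypothetical lookup table: element name => HTML emitted for its start tag
my %open_markup = (
    book   => "<tr>",
    title  => "<td>",
    author => "<td>",
    price  => "<td>",
    rating => "<td>",
);

sub start
{
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    # elements missing from the table produce no output, as in the if chain
    $html .= $open_markup{$currentTag} if exists $open_markup{$currentTag};
}

# replay three start events; "library" stands in for a root element
start(undef, $_) for qw(library Book TITLE);
print "$html\n";    # prints <tr><td>
```

Adding a new element then means adding one line to the hash rather than another "elsif" branch.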
In case
you're wondering, I've used the lc() function to convert the tag name to
lowercase before performing the comparison. XML element names are
case-sensitive, so strictly speaking <Book> and <book> are different elements;
lowercasing the name lets the script treat them alike, so that it also works
with XML documents that use upper-case or mixed-case tags.
Finally, I've also stored the current tag
name in the global variable $currentTag - this can be used to identify which tag
is being processed at any stage, and it'll come in useful a little further
down.
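The end() function takes care of closing tags, and looks similar to
start(). Note that it specifically clears $currentTag at the end: the parser
also passes any whitespace between elements (line breaks, indentation) to the
character-data handler, and without this reset that whitespace would be
formatted as though it belonged to the element that had just closed.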
# this is called when an end tag is found
sub end
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "</tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "</td>";
    }
    elsif ($currentTag eq "author")
    {
        print "</td>";
    }
    elsif ($currentTag eq "price")
    {
        print "</td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "</td>";
    }
    # clear the current tag, so text after this closing tag
    # isn't treated as belonging to it
    $currentTag = "";
}
Note that an empty element, such as <rating/>, generates both a start
event and an end event, so it still produces a matching pair of HTML tags.
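Clearing $currentTag at the end of end() matters because the Char handler also fires for the whitespace between elements. The sketch below replays, by hand, the events for a title followed by a line break ("Foo" is a placeholder title), using a cut-down cdata() that only formats titles. $clear_tag is a hypothetical switch that turns the reset on and off.

```perl
use strict;
use warnings;

our $currentTag = "";
our $html       = "";    # collects output instead of printing it
our $clear_tag  = 1;     # hypothetical switch: does end() clear $currentTag?

sub start
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
}

sub end
{
    my ($parser, $name) = @_;
    $currentTag = "" if $clear_tag;
}

# cut-down cdata(): only titles are formatted
sub cdata
{
    my ($parser, $data) = @_;
    $html .= "<i>$data</i>" if $currentTag eq "title";
}

# replay the events for "<title>Foo</title>" followed by a line break
sub replay
{
    $html = "";
    start(undef, "title");
    cdata(undef, "Foo");
    end(undef, "title");
    cdata(undef, "\n");    # whitespace after the closing tag
    return $html;
}

$clear_tag = 1;
my $with_reset = replay();       # "<i>Foo</i>"
$clear_tag = 0;
my $without_reset = replay();    # "<i>Foo</i>" plus an italic line break
print "with reset:    $with_reset\n";
print "without reset: $without_reset\n";
```

Without the reset, the line break after </title> is wrapped in a second, stray pair of <i> tags.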
So this takes care of replacing XML tags with corresponding HTML
tags. But what about handling the data between them?
# this is called when character data is found
sub cdata
{
    my ($parser, $data) = @_;
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good",
                   "Excellent");
    if ($currentTag eq "title")
    {
        print "<i>$data</i>";
    }
    elsif ($currentTag eq "author")
    {
        print $data;
    }
    elsif ($currentTag eq "price")
    {
        print "\$$data";
    }
    elsif ($currentTag eq "rating")
    {
        # the numeric rating is an index into @ratings
        print $ratings[$data];
    }
}
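The cdata() function is called whenever the parser encounters
character data: the text between tags, including any whitespace between
elements. Note, however, that the function receives only the parser object and
the text itself; it isn't told which element encloses that text. However, since
the parser works through the document in order, the $currentTag variable (set
by the most recent start tag and cleared by end()) identifies the element this
data belongs to. (XML::Parser::Expat also provides a current_element method on
the parser object that returns the innermost open element, but tracking the
tag yourself works just as well here.)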
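One more detail: XML::Parser's documentation notes that a single run of text may be delivered in several Char calls (around entity references, for example). With the cdata() above, a price split into "12." and "95" would print as "$12.$95". One way to guard against that, sketched below as an alternative to the article's approach, is to collect text in a buffer and format it only when the end tag arrives. The events are replayed by hand; $buffer, %text_fields and format_field() are hypothetical names, and this cut-down version handles only title, author and price and leaves out the <td> markup.

```perl
use strict;
use warnings;

our $buffer = "";    # text collected since the last start tag
our $html   = "";    # collects output instead of printing it

# elements whose text this cut-down version formats
my %text_fields = (title => 1, author => 1, price => 1);

sub start
{
    $buffer = "";    # a new element begins: discard any stray whitespace
}

sub cdata
{
    my ($parser, $data) = @_;
    $buffer .= $data;    # just collect; one run of text may arrive in pieces
}

# hypothetical helper: format the complete text of one element
sub format_field
{
    my ($tag, $text) = @_;
    return "<i>$text</i>" if $tag eq "title";
    return "\$$text"      if $tag eq "price";
    return $text;
}

sub end
{
    my ($parser, $name) = @_;
    my $tag = lc($name);
    $html .= format_field($tag, $buffer) if $text_fields{$tag};
    $buffer = "";
}

# replay the events for <price>12.95</price> delivered as two Char calls
start(undef, "price");
cdata(undef, "12.");
cdata(undef, "95");
end(undef, "price");
print "$html\n";    # prints $12.95
```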
Depending on the value of $currentTag, an "if" statement prints the
data with appropriate formatting: this is where I add italics to the title and
a currency symbol to the price, and replace the numeric rating with the
matching text label from the @ratings array (a rating of 4, for instance,
becomes "Good").
Here's what the finished script
(with some additional HTML, so that you can use it via CGI) looks like:
#!/usr/bin/perl
use strict;
use warnings;

# include package
use XML::Parser;

# initialize parser
my $xp = XML::Parser->new();

# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);

# keep track of which tag is currently being processed
our $currentTag = "";

# send standard header to browser
print "Content-Type: text/html\n\n";

# set up HTML page
print "<html><head></head><body>";
print "<h2>The Library</h2>";
print "<table border=1 cellspacing=1 cellpadding=5>";
print "<tr><td align=center>Title</td><td align=center>Author</td>",
      "<td align=center>Price</td><td align=center>User Rating</td></tr>";

# parse XML
$xp->parsefile("library.xml");

print "</table></body></html>";

# this is called when a start tag is found
sub start
{
    # extract variables
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "<tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "<td>";
    }
    elsif ($currentTag eq "author")
    {
        print "<td>";
    }
    elsif ($currentTag eq "price")
    {
        print "<td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "<td>";
    }
}

# this is called when character data is found
sub cdata
{
    my ($parser, $data) = @_;
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good",
                   "Excellent");
    if ($currentTag eq "title")
    {
        print "<i>$data</i>";
    }
    elsif ($currentTag eq "author")
    {
        print $data;
    }
    elsif ($currentTag eq "price")
    {
        print "\$$data";
    }
    elsif ($currentTag eq "rating")
    {
        print $ratings[$data];
    }
}

# this is called when an end tag is found
sub end
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "</tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "</td>";
    }
    elsif ($currentTag eq "author")
    {
        print "</td>";
    }
    elsif ($currentTag eq "price")
    {
        print "</td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "</td>";
    }
    # clear value of current tag
    $currentTag = "";
}

# end
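When you run it, the script outputs an HTML page containing a table with one
row per book, listing its title (in italics), author, price and user rating.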
You can now add new items to your XML document, or edit
existing items, and the rendered HTML page will change accordingly. By
separating the data from its presentation, XML makes it possible, for example,
for users with no technical knowledge of HTML to update the content of a Web
site, or for data from a single source to be presented in different ways.
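One caveat if library.xml contains characters such as & or <: XML::Parser expands entity references before calling the Char handler, so a title written as "Q &amp; A" in the XML reaches cdata() as "Q & A" and is printed into the HTML unescaped. Here's a minimal sketch of a helper that could be applied to $data before printing. escape_html() is a hypothetical name; the CPAN module HTML::Entities provides encode_entities() for the same job, if you have it installed.

```perl
use strict;
use warnings;

# hypothetical helper: escape the characters that are special in HTML text
sub escape_html
{
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, so later entities aren't re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# e.g. inside cdata(): print "<i>" . escape_html($data) . "</i>";
my $escaped = escape_html('Q & A <draft>');
print "$escaped\n";    # prints Q &amp; A &lt;draft&gt;
```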