Converted your little black book into XML, and don't know what to do next? This article gets you started on the path to being an XML guru, demonstrating how to use Perl's SAX parser to parse and convert your XML into Web-friendly HTML.
As I've just explained, the start(), end() and cdata() functions will be called by the parser as it progresses through the document. We haven't defined these yet - let's do that next:
# keep track of which tag is currently being processed
$currentTag = "";
# this is called when a start tag is found
sub start
{
    # extract variables: parser object, tag name, attribute name/value pairs
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "<tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "<td>";
    }
    elsif ($currentTag eq "author")
    {
        print "<td>";
    }
    elsif ($currentTag eq "price")
    {
        print "<td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "<td>";
    }
}
Each time the parser encounters a starting tag, it calls start() with the name of the tag (and attributes, if any) as arguments. The start() function then processes the tag, printing corresponding HTML markup in place of the XML tag.
I've used an "if" statement, keyed on the tag name, to decide how to process each tag. For example, since I know that <book> indicates the beginning of a new row in my desired output, I replace it with a <tr>, while other elements like <title> and <author> correspond to table cells, and are replaced with <td> tags.
In case you're wondering, I've used the lc() function to convert the tag name to lowercase before performing the comparison. XML tag names are case-sensitive and the parser reports them exactly as written, so this step ensures that the script also works with XML documents that use upper-case or mixed-case tags.
Finally, I've also stored the current tag name in the global variable $currentTag - this can be used to identify which tag is being processed at any stage, and it'll come in useful a little further down.
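If the "if" chain grows unwieldy, the same tag-to-markup mapping can be held in a hash. What follows is only a minimal sketch of that alternative, not the approach used in this article: the names %open_tag, start_lookup() and $html are my own, the inline XML is made-up sample data rather than library.xml, and the output is collected in a string so that it can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# hypothetical lookup table: lower-cased XML tag name => opening HTML markup
my %open_tag = (
    book   => "<tr>",
    title  => "<td>",
    author => "<td>",
    price  => "<td>",
    rating => "<td>",
);

# collect the generated markup here instead of printing it straight away
my $html = "";

# called for every start tag; attributes arrive in %attr but are unused here
sub start_lookup
{
    my ($parser, $name, %attr) = @_;
    my $tag = lc($name);
    # tags without an entry (such as the root element) produce no output
    $html .= $open_tag{$tag} if exists $open_tag{$tag};
}

my $xp = XML::Parser->new(Handlers => { Start => \&start_lookup });

# made-up sample document using mixed-case tag names
$xp->parse('<library><BOOK id="1"><Title>Example</Title><author>A. Writer</author></BOOK></library>');

print "$html\n";    # prints <tr><td><td>
```

Adding a new element then means adding one hash entry rather than another elsif branch.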
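The end() function takes care of closing tags, and looks similar to start(). Note that I've specifically cleared $currentTag at the end, so that any character data appearing between elements (the whitespace between a closing tag and the next opening tag, for example) is not attributed to the tag that just closed.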
# this is called when an end tag is found
sub end
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "</tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "</td>";
    }
    elsif ($currentTag eq "author")
    {
        print "</td>";
    }
    elsif ($currentTag eq "price")
    {
        print "</td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "</td>";
    }
    # clear value of current tag
    $currentTag = "";
}
Note that empty elements generate both start and end events.
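To see this for yourself, here is a small sketch that simply records every event the parser fires. The @events array and the inline XML (with an empty <rating/> element) are my own illustration, not part of the library example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# every start and end event is recorded here, in the order it fires
my @events;

my $xp = XML::Parser->new(Handlers => {
    Start => sub { my ($parser, $name) = @_; push @events, "start:$name"; },
    End   => sub { my ($parser, $name) = @_; push @events, "end:$name"; },
});

# made-up sample: <rating/> is an empty element
$xp->parse('<book><title>Example</title><rating/></book>');

print join(" ", @events), "\n";
# prints start:book start:title end:title start:rating end:rating end:book
```

The empty <rating/> element is reported exactly as if it had been written <rating></rating>, so start() and end() need no special handling for it.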
So this takes care of replacing XML tags with corresponding HTML tags...but what about handling the data between them?
# this is called when character data is found
sub cdata
{
    my ($parser, $data) = @_;
    # text labels, indexed by numerical rating (0 to 5)
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good",
        "Excellent");
    if ($currentTag eq "title")
    {
        print "<i>$data</i>";
    }
    elsif ($currentTag eq "author")
    {
        print $data;
    }
    elsif ($currentTag eq "price")
    {
        print "\$$data";
    }
    elsif ($currentTag eq "rating")
    {
        print $ratings[$data];
    }
}
The cdata() function is called whenever the parser encounters character data between an XML tag pair. Note, however, that the function is only passed the data as an argument; there is no way of telling from the data alone which tags surround it. Since the parser processes the XML document chunk-by-chunk, in order, we can use the $currentTag variable to identify which tag this data belongs to.
Depending on the value of $currentTag, an "if" statement is used to print data with appropriate formatting; this is the place where I add italics to the title, a currency symbol to the price, and a text rating (corresponding to a numerical index) from the @ratings array.
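One caveat: XML::Parser's documentation notes that a single run of text may be delivered to the character data handler in more than one call, and text containing "&" or "<" should be escaped before it is written into HTML. The following is a hedged sketch of one way to deal with both, by buffering the text and emitting it when the closing tag arrives. The names escape_html(), start_tag(), char_data(), end_tag(), $buffer and @cells are my own, and the sample XML is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $currentTag = "";   # tag currently being processed
my $buffer     = "";   # text collected for the current tag
my @cells;             # finished, escaped titles

# replace the characters that have special meaning in HTML
sub escape_html
{
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# remember the tag and start with an empty buffer
sub start_tag
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    $buffer = "";
}

# may be called several times for one run of text, so append
sub char_data
{
    my ($parser, $data) = @_;
    $buffer .= $data if $currentTag ne "";
}

# the text is complete only once the closing tag is seen
sub end_tag
{
    my ($parser, $name) = @_;
    push @cells, escape_html($buffer) if lc($name) eq "title";
    $currentTag = "";
    $buffer = "";
}

my $xp = XML::Parser->new(Handlers => {
    Start => \&start_tag, End => \&end_tag, Char => \&char_data,
});

# made-up sample: the parser hands &amp; to the handler as a plain "&"
$xp->parse('<book><title>Tom &amp; Jerry</title></book>');

print "$cells[0]\n";    # prints Tom &amp; Jerry
```

The trade-off is that nothing is printed for an element until its closing tag has been seen, which is fine for short values like titles and prices.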
Here's what the finished script (with some additional HTML, so that you can use it via CGI) looks like:
#!/usr/bin/perl
# include package
use XML::Parser;
# initialize parser
$xp = new XML::Parser();
# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);
# keep track of which tag is currently being processed
$currentTag = "";
# send standard header to browser
print "Content-Type: text/html\n\n";
# set up HTML page
print "<html><head></head><body>";
print "<h2>The Library</h2>";
print "<table border=1 cellspacing=1 cellpadding=5>";
print "<tr><td align=center>Title</td><td align=center>Author</td>";
print "<td align=center>Price</td><td align=center>User Rating</td></tr>";
# parse XML
$xp->parsefile("library.xml");
print "</table></body></html>";
# this is called when a start tag is found
sub start
{
    # extract variables: parser object, tag name, attribute name/value pairs
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "<tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "<td>";
    }
    elsif ($currentTag eq "author")
    {
        print "<td>";
    }
    elsif ($currentTag eq "price")
    {
        print "<td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "<td>";
    }
}
# this is called when character data is found
sub cdata
{
    my ($parser, $data) = @_;
    # text labels, indexed by numerical rating (0 to 5)
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good",
        "Excellent");
    if ($currentTag eq "title")
    {
        print "<i>$data</i>";
    }
    elsif ($currentTag eq "author")
    {
        print $data;
    }
    elsif ($currentTag eq "price")
    {
        print "\$$data";
    }
    elsif ($currentTag eq "rating")
    {
        print $ratings[$data];
    }
}
# this is called when an end tag is found
sub end
{
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book")
    {
        print "</tr>";
    }
    elsif ($currentTag eq "title")
    {
        print "</td>";
    }
    elsif ($currentTag eq "author")
    {
        print "</td>";
    }
    elsif ($currentTag eq "price")
    {
        print "</td>";
    }
    elsif ($currentTag eq "rating")
    {
        print "</td>";
    }
    # clear value of current tag
    $currentTag = "";
}
# end
And when you run it, here's what you'll see:
You can now add new items to your XML document, or edit existing items, and your rendered HTML page will change accordingly. By separating the data from the presentation, XML has imposed standards on data collections, making it possible, for example, for users with no technical knowledge of HTML to easily update content on a Web site, or to present data from a single source in different ways.