Using Perl with XML (part 1) - What's For Dinner? (Page 7 of 7)
Here's another, slightly more complex example using the SAX-style
(event-based) parser, built around one of my favourite meals.
<?xml version="1.0"?>
<recipe>
    <name>Chicken Tikka</name>
    <author>Anonymous</author>
    <date>1 June 1999</date>
    <ingredients>
        <item>
            <desc>Boneless chicken breasts</desc>
            <quantity>2</quantity>
        </item>
        <item>
            <desc>Chopped onions</desc>
            <quantity>2</quantity>
        </item>
        <item>
            <desc>Ginger</desc>
            <quantity>1 tsp</quantity>
        </item>
        <item>
            <desc>Garlic</desc>
            <quantity>1 tsp</quantity>
        </item>
        <item>
            <desc>Red chili powder</desc>
            <quantity>1 tsp</quantity>
        </item>
        <item>
            <desc>Coriander seeds</desc>
            <quantity>1 tsp</quantity>
        </item>
        <item>
            <desc>Lime juice</desc>
            <quantity>2 tbsp</quantity>
        </item>
        <item>
            <desc>Butter</desc>
            <quantity>1 tbsp</quantity>
        </item>
    </ingredients>
    <servings>
        3
    </servings>
    <process>
        <step>Cut chicken into cubes, wash and apply lime juice and salt</step>
        <step>Add ginger, garlic, chili, coriander and lime juice in a separate bowl</step>
        <step>Mix well, and add chicken to marinate for 3-4 hours</step>
        <step>Place chicken pieces on skewers and barbeque</step>
        <step>Remove, apply butter, and barbeque again until meat is tender</step>
        <step>Garnish with lemon and chopped onions</step>
    </process>
</recipe>
This time, my Perl script won't be using a chain of "if" tests to decide
what to do with each tag when I parse the file above; instead, I'm going to
map tag names to values in a hash. Each of the tags in the XML file above
will be replaced with the appropriate HTML markup.
#!/usr/bin/perl
# hash of tag names mapped to HTML markup
# "recipe" => start a new block
# "name" => larger font
# "author" => in bold
# "ingredients" => unordered list
# "desc" => list items
# "process" => ordered list
# "step" => list items
%startTags = (
    "recipe" => "<hr>",
    "name" => "<font size=+2>",
    "date" => "<i>(",
    "author" => "<b>",
    "servings" => "<i>Serves ",
    "ingredients" => "<h3>Ingredients:</h3><ul>",
    "desc" => "<li>",
    "quantity" => "(",
    "process" => "<h3>Preparation:</h3><ol>",
    "step" => "<li>"
);
# close tags opened above
%endTags = (
    "name" => "</font><br>",
    "date" => ")</i>",
    "author" => "</b>",
    "ingredients" => "</ul>",
    "quantity" => ")",
    "servings" => "</i>",
    "process" => "</ol>"
);
# name of XML file
$file = "recipe.xml";
# this is called when a start tag is found
sub start
{
    # extract variables
    my ($parser, $name, %attr) = @_;
    # lowercase element name
    $name = lc($name);
    # print corresponding HTML
    if ($startTags{$name})
    {
        print $startTags{$name};
    }
}
# this is called when character data (the text between tags) is found;
# the parser may call it more than once for a single run of text
sub cdata
{
    my ($parser, $data) = @_;
    print $data;
}
# this is called when an end tag is found
sub end
{
    my ($parser, $name) = @_;
    $name = lc($name);
    if ($endTags{$name})
    {
        print $endTags{$name};
    }
}
# include package
use XML::Parser;
# initialize parser
$xp = XML::Parser->new();
# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);
# send standard header to browser
print "Content-Type: text/html\n\n";
# print HTML header
print "<html><head></head><body>";
# parse XML
$xp->parsefile($file);
# print HTML footer
print "</body></html>";
# end
In this case, I've set up two hashes, one for opening tags
and one for closing tags. When the parser encounters an XML tag, the matching
handler looks up the lowercased tag name in the relevant hash. If there is an
entry for it, the corresponding value (HTML markup) is printed; tags without
an entry, such as "item", produce no markup at all. This method does away with
the slightly cumbersome branching "if" statements of the previous example, and
is easier to read and understand.
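If you'd like to see the lookup on its own, here's a minimal sketch with no parser involved. The helper name markup_for() and the @events list are my own inventions for illustration, not part of XML::Parser; the list is a hand-written stand-in for the calls the parser would make to the start, character data and end handlers. The hashes are trimmed-down copies of the ones in the script above. The helper also uses exists() rather than a plain truth test, which is an assumption on my part about what you'd want if a tag were ever mapped to a value like "0".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# trimmed-down copies of the two hashes from the script above
my %startTags = (
    "name"     => "<font size=+2>",
    "desc"     => "<li>",
    "quantity" => "(",
);
my %endTags = (
    "name"     => "</font><br>",
    "quantity" => ")",
);

# hypothetical helper: return the markup mapped to a tag, or "" if there is none
sub markup_for {
    my ($table, $name) = @_;
    $name = lc($name);
    # exists() checks for the key itself rather than the truth of its value
    return exists $table->{$name} ? $table->{$name} : "";
}

# hand-written stand-in for the events the parser would fire for
# <name>Chicken Tikka</name>
# <item><desc>Ginger</desc><quantity>1 tsp</quantity></item>
my @events = (
    [ "start", "name" ],     [ "char", "Chicken Tikka" ], [ "end", "name" ],
    [ "start", "item" ],
    [ "start", "desc" ],     [ "char", "Ginger" ],        [ "end", "desc" ],
    [ "start", "quantity" ], [ "char", "1 tsp" ],         [ "end", "quantity" ],
    [ "end",   "item" ],
);

# build the HTML in a string instead of printing piece by piece
my $html = "";
for my $event (@events) {
    my ($type, $value) = @$event;
    if    ($type eq "start") { $html .= markup_for(\%startTags, $value); }
    elsif ($type eq "end")   { $html .= markup_for(\%endTags,   $value); }
    else                     { $html .= $value; }    # character data passes through
}

print "$html\n";
```

Run from the command line, this prints `<font size=+2>Chicken Tikka</font><br><li>Ginger(1 tsp)`. Notice that "item" appears in neither hash, so it contributes nothing to the output, exactly as in the full script.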
Here's the output:

That's about it for the moment. Over the last few
pages, I've discussed using Perl's XML::Parser package to process an XML file
and mark up the data within it with HTML tags. However, just as there's more
than one way to skin a cat, there's more than one way to process XML data with
Perl. In the second part of this article, I'll be looking at an alternative
technique for parsing an XML file, this time using the DOM. Make sure you come
back for that one!
Note: All examples in this article have been tested on
Linux/i586 with Perl 5.005. Examples are illustrative only, and are not meant
for a production environment. YMMV!