
Call Me Back - Perl

Converted your little black book into XML, and don't know what to do next? This article gets you started on the path to being an XML guru, demonstrating how to use Perl's SAX parser to parse and convert your XML into Web-friendly HTML.

TABLE OF CONTENTS:
  1. Using Perl with XML (part 1)
  2. Getting Down To Business
  3. Let's Talk About SAX
  4. Breaking It Down
  5. Call Me Back
  6. Random Walk
  7. What's For Dinner?
By: icarus, (c) Melonfire
January 15, 2002

As I've just explained, the start(), end() and cdata() functions will be called by the parser as it progresses through the document. We haven't defined these yet, so let's do that next:

# keep track of which tag is currently being processed
my $currentTag = "";

# this is called when a start tag is found
sub start {
    # extract variables: parser object, tag name, attribute name/value pairs
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book") {
        print "<tr>";
    } elsif ($currentTag eq "title") {
        print "<td>";
    } elsif ($currentTag eq "author") {
        print "<td>";
    } elsif ($currentTag eq "price") {
        print "<td>";
    } elsif ($currentTag eq "rating") {
        print "<td>";
    }
}
Each time the parser encounters a starting tag, it calls start() with the parser object, the name of the tag and its attributes (if any, as name/value pairs) as arguments. The start() function then processes the tag, printing corresponding HTML markup in place of the XML tag.

I've used an "if" statement, keyed on the tag name, to decide how to process each tag. For example, since I know that <book> indicates the beginning of a new row in my desired output, I replace it with a <tr>, while other elements like <title> and <author> correspond to table cells, and are replaced with <td> tags.

In case you're wondering, I've used the lc() function to convert the tag name to lowercase before performing the comparison. XML element names are case-sensitive, so this normalization ensures that the script also works with XML documents that use upper-case or mixed-case tags.
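The if/elsif chain in start() can also be written as a lookup table. The sketch below is an alternative, not the code used in this article: it assumes the same five element names, keeps the lc() normalization so that `<BOOK>` and `<book>` hit the same entry, and uses two hypothetical hashes, %open_markup and %close_markup. Elements missing from the tables (such as the root element) print nothing.

```perl
use strict;
use warnings;

# Hypothetical lookup tables: XML element name => HTML markup to print
my %open_markup = (
    book   => '<tr>',
    title  => '<td>',
    author => '<td>',
    price  => '<td>',
    rating => '<td>',
);
my %close_markup = (
    book   => '</tr>',
    title  => '</td>',
    author => '</td>',
    price  => '</td>',
    rating => '</td>',
);

# keep track of which tag is currently being processed
my $currentTag = "";

# Start handler: same arguments XML::Parser passes (parser, name, attributes)
sub start {
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);    # normalize case before the lookup
    print $open_markup{$currentTag} if exists $open_markup{$currentTag};
}

# End handler: print the matching closing markup, then clear the state
sub end {
    my ($parser, $name) = @_;
    my $tag = lc($name);
    print $close_markup{$tag} if exists $close_markup{$tag};
    $currentTag = "";
}
```

With this layout, adding another column means adding one entry to each hash rather than another elsif branch to each handler. The handlers can be exercised without a parser by passing undef in place of the parser object, which they never use.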

Finally, I've also stored the current tag name in the global variable $currentTag - this can be used to identify which tag is being processed at any stage, and it'll come in useful a little further down.

The end() function takes care of closing tags and looks similar to start(). Note that I've specifically cleared $currentTag at the end.

# this is called when an end tag is found
sub end {
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book") {
        print "</tr>";
    } elsif ($currentTag eq "title") {
        print "</td>";
    } elsif ($currentTag eq "author") {
        print "</td>";
    } elsif ($currentTag eq "price") {
        print "</td>";
    } elsif ($currentTag eq "rating") {
        print "</td>";
    }
    # clear value of current tag
    $currentTag = "";
}
Note that empty elements generate both start and end events.
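To see why this matters for the table, the sketch below feeds cut-down versions of the handlers a hand-written sequence of events standing in for what the parser would report for `<rating>4</rating>` followed by an empty `<rating/>`. It does not call XML::Parser, and the $html accumulator is an illustration-only substitute for printing. Because the empty element still triggers end(), it yields a closed, empty cell rather than a dangling `<td>`.

```perl
use strict;
use warnings;

my $currentTag = "";
my $html       = "";   # collects output instead of printing, for illustration

# Cut-down versions of the article's handlers: every element becomes a cell
sub start { my ($parser, $name) = @_; $currentTag = lc $name; $html .= "<td>"; }
sub cdata { my ($parser, $data) = @_; $html .= $data if $currentTag ne ""; }
sub end   { my ($parser, $name) = @_; $html .= "</td>"; $currentTag = ""; }

# Hand-written events for <rating>4</rating><rating/>: the empty element
# produces a start event immediately followed by an end event
my @events = (
    [ \&start, 'rating' ], [ \&cdata, '4' ], [ \&end, 'rating' ],
    [ \&start, 'rating' ], [ \&end, 'rating' ],
);
$_->[0]->(undef, $_->[1]) for @events;

print "$html\n";   # <td>4</td><td></td>
```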

So this takes care of replacing XML tags with corresponding HTML tags...but what about handling the data between them?

# this is called when character data is found
sub cdata {
    my ($parser, $data) = @_;
    # text labels for the numeric ratings 0 to 5
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good", "Excellent");
    if ($currentTag eq "title") {
        print "<i>$data</i>";
    } elsif ($currentTag eq "author") {
        print $data;
    } elsif ($currentTag eq "price") {
        print "\$$data";
    } elsif ($currentTag eq "rating") {
        print $ratings[$data];
    }
}
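The cdata() function is called whenever the parser encounters character data, such as the text between an XML tag pair. Note, however, that it receives only the parser object and the text itself, not the name of the element enclosing that text. Since the parser reports events in document order, the $currentTag variable set by start() still holds the name of the element this data belongs to. This is also why end() clears $currentTag: the whitespace between one element and the next is reported as character data too, and with $currentTag empty it matches none of the branches and is silently ignored.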
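One caveat: XML::Parser may deliver a single run of text through several calls to the Char handler (this can happen, for instance, around an entity reference such as `&amp;`), so formatting each chunk on its own could produce markup like `<i>Foo </i><i>&</i><i> Bar</i>`. A common workaround, sketched below with simulated calls instead of a real parser, is to buffer the text and act on it in the end handler. The %fields hash is a hypothetical stand-in for whatever the end handler would actually do with the complete text.

```perl
use strict;
use warnings;

my $text = "";   # character data collected for the element being processed
my %fields;      # hypothetical store: lower-cased element name => complete text

# Start handler: begin a fresh buffer for the new element
sub start {
    my ($parser, $name, %attr) = @_;
    $text = "";
}

# Char handler: append, since one run of text may arrive in several calls
sub cdata {
    my ($parser, $data) = @_;
    $text .= $data;
}

# End handler: the element's text is only known to be complete here
sub end {
    my ($parser, $name) = @_;
    $fields{ lc($name) } = $text;
    $text = "";
}

# Simulated event sequence: "Foo & Bar" delivered in three pieces
start(undef, 'Title');
cdata(undef, $_) for ('Foo ', '&', ' Bar');
end(undef, 'Title');
print "$fields{title}\n";   # Foo & Bar
```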

Depending on the value of $currentTag, an "if" statement is used to print data with appropriate formatting; this is the place where I add italics to the title, a currency symbol to the price, and a text rating (corresponding to a numerical index) from the @ratings array.
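The formatting step can be pulled out into a helper that also escapes HTML-special characters (the parser hands over `&amp;` as a plain `&`) and checks that a rating is a valid index into @ratings before using it. This is a minimal sketch; the format_field() and html_escape() names and the "Unknown" fallback are hypothetical choices, not part of the original script.

```perl
use strict;
use warnings;

my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good", "Excellent");

# Escape the characters that are special in HTML text content
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;   # must run first so later entities are not double-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Return the formatted cell contents for one field (hypothetical helper)
sub format_field {
    my ($tag, $data) = @_;
    if ($tag eq "title")  { return "<i>" . html_escape($data) . "</i>"; }
    if ($tag eq "author") { return html_escape($data); }
    if ($tag eq "price")  { return '$' . html_escape($data); }
    if ($tag eq "rating") {
        # accept only a whole number that indexes into @ratings
        return ($data =~ /^\s*(\d+)\s*$/ && $1 < @ratings) ? $ratings[$1] : "Unknown";
    }
    return "";
}

print format_field("title", "Tom & Jerry"), "\n";   # <i>Tom &amp; Jerry</i>
print format_field("rating", "4"), "\n";            # Good
```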

Here's what the finished script (with some additional HTML, so that you can use it via CGI) looks like:

#!/usr/bin/perl

use strict;
use warnings;

# include package
use XML::Parser;

# initialize parser
my $xp = XML::Parser->new();

# set callback functions
$xp->setHandlers(Start => \&start, End => \&end, Char => \&cdata);

# keep track of which tag is currently being processed
my $currentTag = "";

# send standard header to browser
print "Content-Type: text/html\n\n";

# set up HTML page
print "<html><head></head><body>";
print "<h2>The Library</h2>";
print "<table border=1 cellspacing=1 cellpadding=5>";
print "<tr><td align=center>Title</td><td align=center>Author</td><td align=center>Price</td><td align=center>User Rating</td></tr>";

# parse XML
$xp->parsefile("library.xml");

print "</table></body></html>";

# this is called when a start tag is found
sub start {
    # extract variables: parser object, tag name, attribute name/value pairs
    my ($parser, $name, %attr) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book") {
        print "<tr>";
    } elsif ($currentTag eq "title") {
        print "<td>";
    } elsif ($currentTag eq "author") {
        print "<td>";
    } elsif ($currentTag eq "price") {
        print "<td>";
    } elsif ($currentTag eq "rating") {
        print "<td>";
    }
}

# this is called when character data is found
sub cdata {
    my ($parser, $data) = @_;
    # text labels for the numeric ratings 0 to 5
    my @ratings = ("Words fail me!", "Terrible", "Bad", "Indifferent", "Good", "Excellent");
    if ($currentTag eq "title") {
        print "<i>$data</i>";
    } elsif ($currentTag eq "author") {
        print $data;
    } elsif ($currentTag eq "price") {
        print "\$$data";
    } elsif ($currentTag eq "rating") {
        print $ratings[$data];
    }
}

# this is called when an end tag is found
sub end {
    my ($parser, $name) = @_;
    $currentTag = lc($name);
    if ($currentTag eq "book") {
        print "</tr>";
    } elsif ($currentTag eq "title") {
        print "</td>";
    } elsif ($currentTag eq "author") {
        print "</td>";
    } elsif ($currentTag eq "price") {
        print "</td>";
    } elsif ($currentTag eq "rating") {
        print "</td>";
    }
    # clear value of current tag
    $currentTag = "";
}

# end
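When you run it, the browser displays an HTML table with a header row followed by one row per book, showing the title in italics, the author, the price with a dollar sign and the user rating as text.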



You can now add new items to your XML document, or edit existing items, and your rendered HTML page will change accordingly. By separating the data from its presentation and imposing a standard structure on data collections, XML makes it possible, for example, for users with no technical knowledge of HTML to easily update content on a Web site, or to present data from a single source in different ways.
