Build a Perl RSS Aggregator with Templating Tools

In this third part of a five-part series on templating tools, you’ll learn how to write a simple RSS aggregator, and more. It is excerpted from chapter three of the book Advanced Perl Programming, Second Edition, written by Simon Cozens (O’Reilly; ISBN: 0596004567). Copyright © 2007 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Loops

If we’re going to get anywhere with our RSS example, we’ll need to loop over a series of items: the stories in our newsreel. Thankfully, HTML::Template provides the <TMPL_LOOP> pseudotag for treating a variable as an array. For instance, the following code:

  <ul>
  <TMPL_LOOP NAME=STORIES>
      <li> From <TMPL_VAR NAME=FEED_NAME>: <TMPL_VAR NAME=STORY_NAME> </li>
  </TMPL_LOOP>
  </ul>

when provided the appropriate data structure, loops over the items in the STORIES array reference and produces output like so:

  <ul>

      <li> From Slashdot: NASA Finds Monkeys on Mars </li>

      <li> From use.perl: Perl 6 Release Predicted for 2013 </li>

  </ul>

The trick is that the array reference needs to point to an array of hash references, and each hash provides the appropriate variable names:

  $template->param(STORIES => [
      { FEED_NAME => "Slashdot", STORY_NAME => "NASA Finds Monkeys on Mars" },
      { FEED_NAME => "use.perl", STORY_NAME => "Perl 6 Release Predicted for 2013" }
  ]);
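
In practice the list of stories rarely starts out in this shape. As a minimal sketch, assuming the raw data arrives as [feed name, story title] pairs in an array called @raw (a hypothetical name, not from the original), map can build the array of hashes that <TMPL_LOOP> expects:

```perl
use strict;
use warnings;

# Hypothetical raw data: one [feed name, story title] pair per story.
my @raw = (
    [ "Slashdot", "NASA Finds Monkeys on Mars" ],
    [ "use.perl", "Perl 6 Release Predicted for 2013" ],
);

# Turn each pair into a hash whose keys match the template variable names.
# The leading + tells Perl that the braces build a hash, not a code block.
my @stories = map {
    my ($feed, $title) = @$_;
    +{ FEED_NAME => $feed, STORY_NAME => $title };
} @raw;

# \@stories is now the kind of array reference to pass as STORIES.
print "$_->{FEED_NAME}: $_->{STORY_NAME}\n" for @stories;
```

Passing \@stories as the STORIES parameter should then give the same result as the literal structure above.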

RSS Aggregation

With this knowledge, putting together our RSS aggregator is pretty trivial; first, we grab all the feeds we’re interested in, then sort out their stories and put them into a data structure suitable for feeding to a <TMPL_LOOP>.

We’ll use LWP and XML::RSS to obtain and parse the RSS feeds. In our example, we’re going to pretend that we’re behind a pretty impressive web cache, so we have no problems fetching the RSS feeds repeatedly; in real life, you may want to save the XML to files with fixed names and check how old the files on disk are before fetching them from the web again.
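
The file-based caching mentioned above could look roughly like the following. This is a sketch under stated assumptions, not part of the aggregator itself: the helper names cache_path and cached_get, the MD5-based filenames, and the ten-minute limit are all illustrative, and the download is passed in as a code reference so that the example runs without network access.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: give each feed URL a fixed filename inside $dir.
sub cache_path {
    my ($dir, $url) = @_;
    return File::Spec->catfile($dir, md5_hex($url) . ".xml");
}

# Hypothetical helper: return the XML for $url, reusing the copy on disk
# if it is younger than $max_age seconds. $fetch is a code reference that
# does the real download; it is called only when the cache is stale.
sub cached_get {
    my ($dir, $url, $max_age, $fetch) = @_;
    my $path = cache_path($dir, $url);

    if (-e $path and time() - (stat($path))[9] < $max_age) {
        open my $in, '<:encoding(UTF-8)', $path
            or die "Cannot read $path: $!";
        local $/;                        # slurp the whole file
        my $xml = <$in>;
        close $in;
        return $xml;
    }

    my $xml = $fetch->($url);
    return unless defined $xml;          # fetch failed: nothing to cache

    open my $out, '>:encoding(UTF-8)', $path
        or die "Cannot write $path: $!";
    print {$out} $xml;
    close $out or die "Cannot close $path: $!";
    return $xml;
}

# Demonstration with a stand-in fetcher instead of a real download.
my $dir   = tempdir(CLEANUP => 1);
my $calls = 0;
my $fake  = sub { $calls++; return "<rss>example</rss>" };

my $first  = cached_get($dir, "http://example.com/feed.rss", 600, $fake);
my $second = cached_get($dir, "http://example.com/feed.rss", 600, $fake);
print "fetched $calls time(s)\n";
```

In the aggregator, the stand-in fetcher could be replaced with something like sub { get($_[0]) }, using get from LWP::Simple.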

We’ll start our RSS aggregator by writing a little Perl program to grab and organize the feeds:

  #!/usr/bin/perl

  use LWP::Simple;
  use XML::RSS;
  my @stories;

  # Each line of the DATA section is the URL of one feed.
  while (<DATA>) {
      chomp;
      my $xml = get($_) or next;                 # skip feeds we can't fetch
      my $rss = XML::RSS->new;
      eval { $rss->parse($xml) }; next if $@;    # skip feeds that don't parse
      for my $item (@{$rss->{'items'}}) {
          push @stories, {
               FEED_NAME  => $rss->channel->{'title'},
               FEED_URL   => $rss->channel->{'link'},

               STORY_NAME => $item->{'title'},
               STORY_URL  => $item->{'link'},
               STORY_DESC => $item->{'description'},
               STORY_DATE => $item->{'dc'}->{'date'}
          }
      }
  }

  # Newest stories first.
  @stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;

  __DATA__
  http://slashdot.org/slashdot.rss
  http://use.perl.org/perl-news-short.rdf
  http://www.theregister.co.uk/tonys/slashdot.rdf
  http://blog.simon-cozens.org/blosxom.cgi/xml
  http://www.oreillynet.com/~rael/index.rss

Next we need to design a template to receive this list of feeds. Now, I’m an abysmal HTML designer, which is why I like templates so much. I can create something rough that does the job and hand it to someone with imagination to do the presentation bits. So here’s a rough-and-ready template:

  <html>
    <head> <title> Today’s News </title> </head>
    <body>
      <h1> News Stories Collected at <TMPL_VAR TIME> </h1>

      <TMPL_LOOP STORIES>
        <table border="1">
          <tr>
           <td>
            <h2>
             <a href="<TMPL_VAR STORY_URL>"> <TMPL_VAR STORY_NAME> </a>
            </h2>
            <p> <TMPL_VAR STORY_DESC> </p>
            <hr>
            <p> <i> From
                <a href="<TMPL_VAR FEED_URL>"> <TMPL_VAR FEED_NAME> </a>
            </i> </p>
          </td>
        </tr>
       </table>
      </TMPL_LOOP>
    </body>
  </html>

(Notice that we’re using short forms of the pseudotags: it’s OK to say SOME_VARIABLE instead of NAME=SOME_VARIABLE where it’s unambiguous.)

Finally, we put the finishing touches on our driver program, which merely takes the array we generated and feeds it to HTML::Template:

  #!/usr/bin/perl

  use LWP::Simple;
  use XML::RSS;
  use HTML::Template;

  my @stories;

  while (<DATA>) {
      chomp;
      my $xml = get($_) or next;
      my $rss = XML::RSS->new;
      eval { $rss->parse($xml) }; next if $@;
      for my $item (@{$rss->{'items'}}) {
          push @stories, {
               FEED_NAME  => $rss->channel->{'title'},
               FEED_URL   => $rss->channel->{'link'},

               STORY_NAME => $item->{'title'},
               STORY_URL  => $item->{'link'},
               STORY_DESC => $item->{'description'},
               STORY_DATE => $item->{'dc'}->{'date'}
          }
      }
  }

  my $template = HTML::Template->new(filename => "aggregator.tmpl");

  $template->param( STORIES => [
      sort {$b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories
                      ] );
  $template->param( TIME => scalar localtime );

  delete $_->{STORY_DATE} for @stories;

  print "Content-Type: text/html\n\n", $template->output;

  __DATA__
  http://blog.simon-cozens.org/blosxom.cgi/xml
  http://slashdot.org/slashdot.rss
  http://use.perl.org/perl-news-short.rdf
  http://www.theregister.co.uk/tonys/slashdot.rdf
  http://www.oreillynet.com/~rael/index.rss

We need to delete the STORY_DATE once we’ve used it for ordering, as HTML::Template gets irate if we have loop variables that we don’t use in our template.
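
A minimal sketch of an alternative, assuming we would rather leave @stories untouched: sort first, then hand the template shallow copies with STORY_DATE removed. The sample data and the name @for_template are illustrative, and undefined dates are treated as empty strings so that they sort last. The // operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Illustrative data; the dates follow the ISO 8601 style used by dc:date,
# which sorts chronologically under a plain string comparison as long as
# all dates share a time zone.
my @stories = (
    { STORY_NAME => "Older story", STORY_DATE => "2004-03-01T09:00:00Z" },
    { STORY_NAME => "Undated story" },
    { STORY_NAME => "Newer story", STORY_DATE => "2004-03-02T18:30:00Z" },
);

# Newest first; a missing date counts as the empty string and ends up last.
my @sorted = sort {
    ($b->{STORY_DATE} // '') cmp ($a->{STORY_DATE} // '')
} @stories;

# Copy each hash and drop the key that the template does not use.
my @for_template = map {
    my %copy = %$_;
    delete $copy{STORY_DATE};
    \%copy;
} @sorted;

print join(", ", map { $_->{STORY_NAME} } @for_template), "\n";
```

Here \@for_template is what would be passed as the STORIES parameter, while @stories keeps its dates for any later use. HTML::Template also documents a die_on_bad_params option for relaxing the check on unused parameters, if copying feels like too much ceremony.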

Plug this into a CGI-enabled web server, and, lo and behold, we have a cheap and cheerful Amphetadesk clone.

HTML::Mason

One of the big drawbacks of HTML::Template is that it forces us, to some degree, to mix program logic and presentation, something that we sought to avoid by using templates. For instance, that last template got a little difficult to follow, with variable and HTML tags crowding up the template and obscuring what was actually going on. What we would prefer, then, is a system that allows us to further abstract out the individual elements of what we expect our templates to do, and this is where HTML::Mason comes in.

As we’ve mentioned, HTML::Mason is an inside-out templating system. As well as templating, it could also be described as a component abstraction system for building HTML web pages out of smaller, reusable pieces of logic. Here’s a brief overview of how to use it, before we go on to implement the same RSS aggregator application.

Basic Components

In Mason, everything is a component. Here’s a simple example of using components. Suppose we have three files: test.html in Example 3-1, Header in Example 3-2, and Footer in Example 3-3.

Example 3-1. test.html

<& /Header &>
<p>
  Hello World
</p>
<& /Footer &>

Example 3-2. Header

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
      <title>Some Web Application</title>
      <link rel=stylesheet type="text/css" href="nt.css">
  </head>

<body>

Example 3-3. Footer

  <hr>
  <div class="footer">
    <address>
       <a href="mailto:webmaster@yourcompany.com"> webmaster@yourcompany.com</a>
    </address>
  </div>
 </body>
</html>

HTML::Mason builds up the page by including the components specified inside <& and &> tags. When creating test.html, Mason first includes the Header component found at the document root, then the rest of the HTML, then the Footer component.

Components may call other components. So far, we’ve done nothing outside the scope of server-side includes.

Basic Dynamism

So where does the templating come in? There are three basic ways of adding templates to Mason pages. Here’s the first, a simple modification to our Footer component.

   <hr>
   <div class="footer">
     <address>
        <a href="mailto:webmaster@yourcompany.com"> webmaster@yourcompany.com</a>
     </address>
     Generated: <% scalar localtime %>
   </div>
  </body>
 </html>

If you wrap some Perl code in <% … %> tags, the result of the Perl expression is inserted into the resulting HTML.

That’s all very well for simple expressions, but what about actual Perl logic? For this, Mason has an ugly hack: a single % at the beginning of a line is interpreted as Perl code. This lets you do things like Example 3-4, to dump out the contents of a hash.

Example 3-4. Hashdump

<table>
  <tr>
    <th> key </th>
    <th> value </th>
  </tr>

% for (keys %hash) {
  <tr>
    <td> <% $_ %> </td>
    <td> <% $hash{$_} %> </td>
  </tr>
% }
</table>
<%ARGS>
%hash => undef
</%ARGS>

There are a few things to notice in this example. First, see how we intersperse ordinary HTML with logic, using lines that begin with %, and evaluated Perl expressions, using <% … %>. The only places % is special are at the start of a line and as part of the <% … %> tag; the % of %hash is plain Perl.

The second thing to notice in the example is how we get the hash into the component in the first place. That’s the purpose of the <%ARGS> section: it declares arguments to pass to the component. And how do we pass in those arguments? Here’s something that might call Hashdump:

  % my %foo = ( one => 1, two => 2 );

  <& /Hashdump, hash => \%foo &>

So altogether, we have an example of declaring my variables inside a component, passing a named parameter to another component, and having that component receive the parameter and make use of it. Mason will try to do something sensible if you pass parameters of different types than the types you’ve declared in the <%ARGS> section of the receiving component (here we passed a hash reference to fill in the %hash parameter, for instance), but life is easier if you stick to the same types.
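
For comparison, here is a rough plain-Perl equivalent of what Hashdump produces, written as an ordinary subroutine. This is only an analogy and says nothing about how Mason works internally; the name hashdump is hypothetical, and the keys are sorted here purely to make the output predictable.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Hashdump component: takes a named
# argument "hash" holding a hash reference and returns the table as HTML.
sub hashdump {
    my %args = @_;
    my %hash = %{ $args{hash} };     # unpack the reference into a real hash

    my $html = "<table>\n  <tr>\n    <th> key </th>\n    <th> value </th>\n  </tr>\n";
    for (sort keys %hash) {
        $html .= "  <tr>\n    <td> $_ </td>\n    <td> $hash{$_} </td>\n  </tr>\n";
    }
    return $html . "</table>\n";
}

my %foo = ( one => 1, two => 2 );
print hashdump(hash => \%foo);
```

The component version does the same job, but the HTML stays HTML and only the loop and the two substitutions are written in Perl.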

Please check back next week for the continuation of this series.
