
RSS Aggregation - Perl

In this third part of a five-part series on templating tools, you'll learn how to write a simple RSS aggregator, and more. It is excerpted from chapter three of the book Advanced Perl Programming, Second Edition, written by Simon Cozens (O'Reilly; ISBN: 0596004567). Copyright 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

By: O'Reilly Media
August 21, 2008

With this knowledge, putting together our RSS aggregator is pretty trivial; first, we grab all the feeds we're interested in, then sort out their stories and put them into a data structure suitable for feeding to a <TMPL_LOOP>.

We'll use LWP and XML::RSS to obtain and parse the RSS feeds. In our example, we're going to pretend that we're behind a pretty impressive web cache, so we have no problems fetching the RSS feeds repeatedly; in real life, you may want to save the XML to files with fixed names and check how old the files on disk are before fetching them from the web again.
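The save-to-disk idea can be sketched roughly as follows. This is only an illustration, not part of the aggregator: `is_fresh`, `cached_feed`, the file name, and the maximum age are all hypothetical. It relies on `mirror` from LWP::Simple, which saves a document to a file.

```perl
use strict;
use warnings;
use LWP::Simple qw(mirror);

# Hypothetical helper: true if $file exists and was modified less than
# $max_age seconds ago.
sub is_fresh {
    my ($file, $max_age) = @_;
    return 0 unless -e $file;
    return (time - (stat $file)[9]) < $max_age ? 1 : 0;
}

# Hypothetical helper: return the feed XML for $url, going back to the
# web only when the copy in $file is missing or too old.
sub cached_feed {
    my ($url, $file, $max_age) = @_;

    # mirror() writes the document to $file; if the file already
    # exists it asks the server only for a newer version.
    mirror($url, $file) unless is_fresh($file, $max_age);

    open my $in, '<', $file or return;   # nothing on disk at all
    local $/;                            # slurp the whole file
    return scalar <$in>;
}
```

Inside the fetching loop, a call such as `cached_feed($_, "slashdot.xml", 30 * 60)` would then stand in for `get($_)`, with one fixed file name per feed.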

We'll start our RSS aggregator by writing a little Perl program to grab and organize the feeds:

  #!/usr/bin/perl

  use LWP::Simple;
  use XML::RSS;
  my @stories;
 
  while (<DATA>) {
      chomp;
      my $xml = get($_) or next;
      my $rss = XML::RSS->new;
      eval { $rss->parse($xml) }; next if $@;
      for my $item (@{$rss->{'items'}}) {
          push @stories, {
               FEED_NAME  => $rss->channel->{'title'},
               FEED_URL   => $rss->channel->{'link'},

               STORY_NAME => $item->{'title'},
               STORY_URL  => $item->{'link'},
               STORY_DESC => $item->{'description'},
               STORY_DATE => $item->{'dc'}->{'date'}
          };
      }
  }

  @stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;

  __DATA__
  http://slashdot.org/slashdot.rss
  http://use.perl.org/perl-news-short.rdf
  http://www.theregister.co.uk/tonys/slashdot.rdf
  http://blog.simon-cozens.org/blosxom.cgi/xml
  http://www.oreillynet.com/~rael/index.rss
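The string comparison in that sort puts the newest stories first only if every feed supplies `dc:date` in the same sortable year-first form. The sketch below assumes dates such as 2008-08-21T18:30:00+00:00, all in one time zone. A story with no `dc:date` at all has an undefined STORY_DATE, and comparing that under `use warnings` produces a warning. The sample data is invented, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Invented sample data in the same shape as @stories.
my @stories = (
    { STORY_NAME => 'older',   STORY_DATE => '2008-08-20T09:15:00+00:00' },
    { STORY_NAME => 'undated' },
    { STORY_NAME => 'newer',   STORY_DATE => '2008-08-21T18:30:00+00:00' },
);

# Newest first; a missing date is treated as the empty string, so
# undated stories sink to the bottom instead of raising warnings.
my @sorted = sort {
    ($b->{STORY_DATE} // '') cmp ($a->{STORY_DATE} // '')
} @stories;

print "$_->{STORY_NAME}\n" for @sorted;
```

This prints newer, older, and undated, in that order.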

Next we need to design a template to receive this list of stories. Now, I'm an abysmal HTML designer, which is why I like templates so much. I can create something rough that does the job and hand it to someone with imagination to do the presentation bits. So here's a rough-and-ready template:

  <html>
    <head> <title> Today's News </title> </head>
    <body>
      <h1> News Stories Collected at <TMPL_VAR TIME> </h1>

      <TMPL_LOOP STORIES>
        <table border="1">
          <tr>
           <td>
            <h2>
             <a href="<TMPL_VAR STORY_URL>"> <TMPL_VAR STORY_NAME> </a>
            </h2>
            <p> <TMPL_VAR STORY_DESC> </p>
            <hr>
            <p> <i> From
                <a href="<TMPL_VAR FEED_URL>"> <TMPL_VAR FEED_NAME> </a>
            </i> </p>
          </td>
        </tr>
       </table>
      </TMPL_LOOP>
    </body>
  </html>

(Notice that we're using short forms of the pseudotags: it's OK to say SOME_VARIABLE instead of NAME=SOME_VARIABLE where it's unambiguous.)
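A quick way to check that the two spellings are interchangeable is to render both from in-memory strings. In this small sketch, `render` and the value 'noon' are made up for the illustration. The `scalarref` option to `new` reads the template from a string instead of a file.

```perl
use strict;
use warnings;
use HTML::Template;

my $long  = '<p><TMPL_VAR NAME=TIME></p>';
my $short = '<p><TMPL_VAR TIME></p>';

# Hypothetical helper: fill in TIME and return the rendered text.
sub render {
    my ($text) = @_;
    my $template = HTML::Template->new(scalarref => \$text);
    $template->param(TIME => 'noon');
    return $template->output;
}

print render($long) eq render($short) ? "same\n" : "different\n";
```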

Finally, we put the finishing touches on our driver program, which merely takes the array we generated and feeds it to HTML::Template:

  #!/usr/bin/perl

  use LWP::Simple;
  use XML::RSS;
  use HTML::Template;

  my @stories;

  while (<DATA>) {
      chomp;
      my $xml = get($_) or next;
      my $rss = XML::RSS->new;
      eval { $rss->parse($xml) }; next if $@;
      for my $item (@{$rss->{'items'}}) {
          push @stories, {
               FEED_NAME  => $rss->channel->{'title'},
               FEED_URL   => $rss->channel->{'link'},

               STORY_NAME => $item->{'title'},
               STORY_URL  => $item->{'link'},
               STORY_DESC => $item->{'description'},
               STORY_DATE => $item->{'dc'}->{'date'}
          };
      }
  }

  my $template = HTML::Template->new(filename => "aggregator.tmpl");

  $template->param( STORIES => [
      sort {$b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories
                      ] );
  $template->param( TIME => scalar localtime );

  delete $_->{STORY_DATE} for @stories;

  print "Content-Type: text/html\n\n", $template->output;

  __DATA__
  http://blog.simon-cozens.org/blosxom.cgi/xml
  http://slashdot.org/slashdot.rss
  http://use.perl.org/perl-news-short.rdf 
  http://www.theregister.co.uk/tonys/slashdot.rdf
  http://www.oreillynet.com/~rael/index.rss

We need to delete STORY_DATE once we've used it for ordering, because HTML::Template, by default, dies if we hand a loop a variable that the template doesn't use.
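An alternative to deleting the key is to switch off that check with HTML::Template's `die_on_bad_params` option, which is on by default. The sketch below compares the two settings on a one-line template. `render_with_date` and the sample story are invented for the example.

```perl
use strict;
use warnings;
use HTML::Template;

my $text = '<TMPL_LOOP STORIES><TMPL_VAR STORY_NAME>;</TMPL_LOOP>';

# Hypothetical helper: render $text with one story that still carries
# STORY_DATE. Returns the output, or undef if HTML::Template died.
sub render_with_date {
    my (%options) = @_;
    my $template = HTML::Template->new(scalarref => \$text, %options);
    return eval {
        $template->param(STORIES => [
            { STORY_NAME => 'First', STORY_DATE => '2008-08-21' },
        ]);
        $template->output;
    };
}

my $strict  = render_with_date();                        # default settings
my $lenient = render_with_date(die_on_bad_params => 0);  # unused keys allowed

print defined $strict ? "default: rendered\n" : "default: died\n";
print "lenient: ", (defined $lenient ? $lenient : 'died'), "\n";
```

Leaving the check on and deleting the key, as the driver program does, has the advantage of catching misspelled variable names in the template.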

Plug this into a CGI-enabled web server, and, lo and behold, we have a cheap and cheerful Amphetadesk clone.



 
 