Build a Perl RSS Aggregator with Templating Tools
By: O'Reilly Media
    2008-08-21


    Table of Contents:
  • Build a Perl RSS Aggregator with Templating Tools
  • RSS Aggregation
  • HTML::Mason
  • Basic Dynamism



    Build a Perl RSS Aggregator with Templating Tools - RSS Aggregation
    ( Page 2 of 4 )

    With this knowledge, putting together our RSS aggregator is pretty trivial; first, we grab all the feeds we're interested in, then sort out their stories and put them into a data structure suitable for feeding to a <TMPL_LOOP>.

    We'll use LWP and XML::RSS to obtain and parse the RSS feeds. In our example, we're going to pretend that we're behind a pretty impressive web cache, so we have no problems fetching the RSS feeds repeatedly; in real life, you may want to save the XML to files with fixed names and check how old the files on disk are before fetching them from the web again.
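
The file-based caching mentioned above might look something like the following. This is only a sketch under stated assumptions: cached_get, its dir, max_age, and fetch options, the 30-minute default, and the MD5-based file name are all invented here for illustration; only LWP::Simple's get comes from the article.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Hypothetical helper: return the XML for $url, reusing the copy on disk
# when that copy is younger than max_age seconds.
sub cached_get {
    my ($url, %opt) = @_;
    my $dir     = $opt{dir}     || File::Spec->tmpdir;
    my $max_age = $opt{max_age} || 30 * 60;    # assumed default: 30 minutes
    # The fetcher can be swapped out; by default we load LWP::Simple lazily.
    my $fetch   = $opt{fetch}   || do { require LWP::Simple; \&LWP::Simple::get };

    # A fixed file name per feed, derived from the URL.
    my $file = File::Spec->catfile($dir, 'feed-' . md5_hex($url) . '.xml');

    # Element 9 of stat is the modification time; undef if the file is missing.
    my $mtime = (stat $file)[9];
    if (defined $mtime and time() - $mtime < $max_age) {
        open my $in, '<:encoding(UTF-8)', $file or return;
        local $/;                              # slurp the whole file
        my $xml = <$in>;
        close $in;
        return $xml;
    }

    my $xml = $fetch->($url);
    return unless defined $xml;                # fetch failed, nothing to cache

    # Save a copy for next time; if we can't write, still return the XML.
    open my $out, '>:encoding(UTF-8)', $file or return $xml;
    print {$out} $xml;
    close $out;
    return $xml;
}
```

With a helper like this, the aggregator's fetch line could read my $xml = cached_get($_) or next; and the rest of the program would not need to change.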

    We'll start our RSS aggregator by writing a little Perl program to grab and organize the feeds:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use LWP::Simple;
      use XML::RSS;

      my @stories;

      # Each line of the DATA section below is one feed URL.
      while (<DATA>) {
          chomp;
          my $xml = get($_) or next;               # skip feeds we can't fetch
          my $rss = XML::RSS->new;
          eval { $rss->parse($xml) }; next if $@;  # skip feeds that don't parse
          for my $item (@{$rss->{'items'}}) {
              push @stories, {
                   FEED_NAME  => $rss->channel->{'title'},
                   FEED_URL   => $rss->channel->{'link'},

                   STORY_NAME => $item->{'title'},
                   STORY_URL  => $item->{'link'},
                   STORY_DESC => $item->{'description'},
                   STORY_DATE => $item->{'dc'}->{'date'}
              };
          }
      }

      # Newest stories first.
      @stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;

      __DATA__
      http://slashdot.org/slashdot.rss
      http://use.perl.org/perl-news-short.rdf
      http://www.theregister.co.uk/tonys/slashdot.rdf
      http://blog.simon-cozens.org/blosxom.cgi/xml
      http://www.oreillynet.com/~rael/index.rss
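
The sort in that program compares dates as plain strings. That is enough when the feeds supply dc:date values in the usual ISO 8601 style, since those strings sort in chronological order as long as they share a time zone offset. Here is a small sketch with made-up records. Treating a missing date as an empty string is an assumption added here so that undated stories sort last without triggering warnings.

```perl
use strict;
use warnings;

# Made-up sample records, shaped like the hashes pushed onto @stories.
my @stories = (
    { STORY_NAME => 'older',   STORY_DATE => '2008-08-20T09:15:00+00:00' },
    { STORY_NAME => 'undated', STORY_DATE => undef },
    { STORY_NAME => 'newer',   STORY_DATE => '2008-08-21T07:30:00+00:00' },
);

# Newest first. A missing date is compared as '', which sorts after
# every real date in descending order.
my @sorted = sort {
    ($b->{STORY_DATE} // '') cmp ($a->{STORY_DATE} // '')
} @stories;

print join(', ', map { $_->{STORY_NAME} } @sorted), "\n";
# prints "newer, older, undated"
```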

    Next we need to design a template to receive this list of feeds. Now, I'm an abysmal HTML designer, which is why I like templates so much. I can create something rough that does the job and hand it to someone with imagination to do the presentation bits. So here's a rough-and-ready template:

      <html>
        <head> <title> Today's News </title> </head>
        <body>
          <h1> News Stories Collected at <TMPL_VAR TIME> </h1>

          <TMPL_LOOP STORIES>
            <table border="1">
              <tr>
                <td>
                  <h2>
                    <a href="<TMPL_VAR STORY_URL>"> <TMPL_VAR STORY_NAME> </a>
                  </h2>
                  <p> <TMPL_VAR STORY_DESC> </p>
                  <hr>
                  <p> <i> From
                      <a href="<TMPL_VAR FEED_URL>"> <TMPL_VAR FEED_NAME> </a>
                  </i> </p>
                </td>
              </tr>
            </table>
          </TMPL_LOOP>
        </body>
      </html>

    (Notice that we're using short forms of the pseudotags: it's OK to say SOME_VARIABLE instead of NAME=SOME_VARIABLE where it's unambiguous.)
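
To see that the two spellings really are interchangeable, one could render the same data through both. This is a rough sketch: the render helper, the bracketed one-line templates, and the sample story names are invented for illustration, and the templates are passed in through HTML::Template's scalarref option rather than read from a file.

```perl
use strict;
use warnings;
use HTML::Template;

# The same loop written with the long and the short form of the pseudotags.
my $long  = '<TMPL_LOOP NAME="STORIES">[<TMPL_VAR NAME="STORY_NAME">]</TMPL_LOOP>';
my $short = '<TMPL_LOOP STORIES>[<TMPL_VAR STORY_NAME>]</TMPL_LOOP>';

# Hypothetical helper: fill a template string with two made-up stories.
sub render {
    my ($text) = @_;
    my $template = HTML::Template->new(scalarref => \$text);
    $template->param(STORIES => [
        { STORY_NAME => 'First'  },
        { STORY_NAME => 'Second' },
    ]);
    return $template->output;
}

print render($long) eq render($short) ? "same output\n" : "different output\n";
```

Note the shape of the data handed to the loop: a reference to an array of hash references, one hash per pass through the loop body.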

    Finally, we put the finishing touches on our driver program, which merely takes the array we generated and feeds it to HTML::Template:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use LWP::Simple;
      use XML::RSS;
      use HTML::Template;

      my @stories;

      # Each line of the DATA section below is one feed URL.
      while (<DATA>) {
          chomp;
          my $xml = get($_) or next;               # skip feeds we can't fetch
          my $rss = XML::RSS->new;
          eval { $rss->parse($xml) }; next if $@;  # skip feeds that don't parse
          for my $item (@{$rss->{'items'}}) {
              push @stories, {
                   FEED_NAME  => $rss->channel->{'title'},
                   FEED_URL   => $rss->channel->{'link'},

                   STORY_NAME => $item->{'title'},
                   STORY_URL  => $item->{'link'},
                   STORY_DESC => $item->{'description'},
                   STORY_DATE => $item->{'dc'}->{'date'}
              };
          }
      }

      # Newest stories first.
      @stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;

      # The date has done its job; the template has no STORY_DATE variable.
      delete $_->{STORY_DATE} for @stories;

      my $template = HTML::Template->new(filename => "aggregator.tmpl");

      $template->param( STORIES => \@stories );
      $template->param( TIME => scalar localtime );

      print "Content-Type: text/html\n\n", $template->output;

      __DATA__
      http://blog.simon-cozens.org/blosxom.cgi/xml
      http://slashdot.org/slashdot.rss
      http://use.perl.org/perl-news-short.rdf
      http://www.theregister.co.uk/tonys/slashdot.rdf
      http://www.oreillynet.com/~rael/index.rss

    We need to delete the STORY_DATE once we've used it for ordering, because by default (with its die_on_bad_params option switched on) HTML::Template dies if we hand it loop variables that don't appear in our template.
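
If deleting keys feels heavy-handed, HTML::Template's die_on_bad_params option can be switched off instead, at the cost of no longer catching misspelled parameter names. The following sketch contrasts the two settings. The try_render helper, the one-line template, and the sample story are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Template;

# A template that, like ours, never mentions STORY_DATE.
my $text    = '<TMPL_LOOP STORIES><TMPL_VAR STORY_NAME> </TMPL_LOOP>';
my @stories = ( { STORY_NAME => 'Example', STORY_DATE => '2008-08-21' } );

# Hypothetical helper: render with the given constructor options and
# report whether HTML::Template objected to the extra loop variable.
sub try_render {
    my (%options) = @_;
    my $output = eval {
        my $template = HTML::Template->new(scalarref => \$text, %options);
        $template->param(STORIES => \@stories);
        $template->output;
    };
    return defined $output ? "ok: $output" : "died";
}

print "default: ", try_render(), "\n";
print "relaxed: ", try_render(die_on_bad_params => 0), "\n";
```

The article's approach of deleting the key keeps the stricter default, which is usually the safer choice while a template is still changing.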

    Plug this into a CGI-enabled web server, and, lo and behold, we have a cheap and cheerful AmphetaDesk clone.



     
     
    © 2003-2009 by Developer Shed. All rights reserved.