Build a Perl RSS Aggregator with Templating Tools - RSS Aggregation
With this knowledge, putting together our RSS aggregator is pretty trivial; first, we grab all the feeds we're interested in, then sort out their stories and put them into a data structure suitable for feeding to a <TMPL_LOOP>.
We'll use LWP and XML::RSS to obtain and parse the RSS feeds. In our example, we're going to pretend that we're behind a pretty impressive web cache, so we have no problems fetching the RSS feeds repeatedly; in real life, you may want to save the XML to files with fixed names and check how old the files on disk are before fetching them from the web again.
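The "save to fixed file names and check their age" approach could look roughly like the sketch below. It is illustrative only, not the article's code: the helper name `cached_get`, the MD5-of-URL file-naming scheme, and the use of `LWP::UserAgent` (rather than `LWP::Simple`) to get the raw bytes are all assumptions. The helper returns the saved copy if it was written less than `$max_age` seconds ago, and otherwise fetches the feed and overwrites the copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);
use File::Spec;

my $ua = LWP::UserAgent->new(timeout => 30);

# Hypothetical helper: return the feed XML for $url, reusing the copy
# saved in $cache_dir if that copy is younger than $max_age seconds.
sub cached_get {
    my ($url, $cache_dir, $max_age) = @_;

    # One fixed file name per feed; hashing the URL is an assumed scheme.
    my $file = File::Spec->catfile($cache_dir, md5_hex($url) . '.xml');

    # Fresh enough? Compare the file's modification time with the clock.
    if (-e $file and time - (stat $file)[9] < $max_age) {
        open my $in, '<:raw', $file or die "Can't read $file: $!";
        local $/;                    # slurp the whole file at once
        return scalar <$in>;
    }

    # Missing or stale: fetch from the web and refresh the saved copy.
    my $res = $ua->get($url);
    return unless $res->is_success;
    open my $out, '>:raw', $file or die "Can't write $file: $!";
    print {$out} $res->content;      # store the bytes exactly as served
    close $out or die "Can't close $file: $!";
    return $res->content;
}
```

In the aggregator's loop, `my $xml = cached_get($_, '/tmp/rss-cache', 15 * 60) or next;` could stand in for the `get` call. The directory is a hypothetical path and must already exist. The sketch keeps raw bytes rather than decoded text so that XML::RSS can honour each feed's own encoding declaration.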
We'll start our RSS aggregator by writing a little Perl program to grab and organize the feeds:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use XML::RSS;

my @stories;
while (<DATA>) {
    chomp;
    next unless /\S/;                        # skip blank lines
    my $xml = get($_) or next;               # skip feeds we can't fetch
    my $rss = XML::RSS->new;
    eval { $rss->parse($xml) }; next if $@;  # skip feeds that don't parse
    for my $item (@{$rss->{'items'}}) {
        push @stories, {
            FEED_NAME  => $rss->{'channel'}->{'title'},
            FEED_URL   => $rss->{'channel'}->{'link'},
            STORY_NAME => $item->{'title'},
            STORY_URL  => $item->{'link'},
            STORY_DESC => $item->{'description'},
            # dc:date is optional; default to '' so the sort doesn't warn
            STORY_DATE => $item->{'dc'}->{'date'} // '',
        };
    }
}
# Newest first; assumes the dc:date strings share one ISO 8601 form
@stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;

__DATA__
http://slashdot.org/slashdot.rss
http://use.perl.org/perl-news-short.rdf
http://www.theregister.co.uk/tonys/slashdot.rdf
http://blog.simon-cozens.org/blosxom.cgi/xml
http://www.oreillynet.com/~rael/index.rss
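To see the data structure that this loop builds without hitting the network, the sketch below pulls the hash-building step out into a hypothetical `stories_from_rss` sub. It then runs that sub on a small, made-up RSS 1.0 feed embedded in the script. The feed contents and sub name are illustrative assumptions. The hash keys are the ones used above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Turn one parsed feed into the hashes the aggregator's loop collects.
sub stories_from_rss {
    my ($rss) = @_;
    my $channel = $rss->{'channel'};
    return map {
        +{                                   # '+' forces an anonymous hash
            FEED_NAME  => $channel->{'title'},
            FEED_URL   => $channel->{'link'},
            STORY_NAME => $_->{'title'},
            STORY_URL  => $_->{'link'},
            STORY_DESC => $_->{'description'},
            STORY_DATE => $_->{'dc'}->{'date'} // '',
        }
    } @{ $rss->{'items'} };
}

# A made-up RSS 1.0 feed standing in for one fetched with get().
my $xml = <<'XML';
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/">
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <description>A hypothetical feed</description>
  </channel>
  <item rdf:about="http://example.org/a">
    <title>Older story</title>
    <link>http://example.org/a</link>
    <description>Posted first.</description>
    <dc:date>2003-01-02T10:00:00Z</dc:date>
  </item>
  <item rdf:about="http://example.org/b">
    <title>Newer story</title>
    <link>http://example.org/b</link>
    <description>Posted second.</description>
    <dc:date>2003-01-03T09:00:00Z</dc:date>
  </item>
</rdf:RDF>
XML

my $rss = XML::RSS->new;
$rss->parse($xml);
my @stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} }
              stories_from_rss($rss);
printf "%s  %s (%s)\n", @{$_}{qw(STORY_DATE STORY_NAME FEED_NAME)} for @stories;
```

Each element of `@stories` is a flat hash whose keys match the template variable names. That one-hash-per-row shape is what HTML::Template expects for a `<TMPL_LOOP>`.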
Next we need to design a template to receive this list of stories. Now, I'm an abysmal HTML designer, which is why I like templates so much: I can create something rough that does the job and hand it to someone with imagination to do the presentation bits. So here's a rough-and-ready template:
<html>
  <head> <title> Today's News </title> </head>
  <body>
    <h1> News Stories Collected at <TMPL_VAR TIME> </h1>
    <TMPL_LOOP STORIES>
      <table border="1">
        <tr>
          <td>
            <h2>
              <a href="<TMPL_VAR STORY_URL>"> <TMPL_VAR STORY_NAME> </a>
            </h2>
            <p> <TMPL_VAR STORY_DESC> </p>
            <hr>
            <p> <i> From
              <a href="<TMPL_VAR FEED_URL>"> <TMPL_VAR FEED_NAME> </a>
            </i> </p>
          </td>
        </tr>
      </table>
    </TMPL_LOOP>
  </body>
</html>
(Notice that we're using the short forms of the pseudotags: it's OK to write <TMPL_VAR SOME_VARIABLE> instead of <TMPL_VAR NAME=SOME_VARIABLE> where it's unambiguous.)
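As a quick check of the short-form rule, the minimal sketch below renders a toy loop written both ways. It assumes HTML::Template is installed. The loop rows are made-up data, and the template text comes from a string via the constructor's `scalarref` option rather than from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Template;

# The same toy loop written with short-form tags and with explicit NAME=.
my $short = '<TMPL_LOOP STORIES><TMPL_VAR STORY_NAME>;</TMPL_LOOP>';
my $long  = '<TMPL_LOOP NAME=STORIES><TMPL_VAR NAME=STORY_NAME>;</TMPL_LOOP>';

# Made-up loop rows: one hash per iteration, keyed by TMPL_VAR names.
my @rows = ( { STORY_NAME => 'First' }, { STORY_NAME => 'Second' } );

my @out;
for my $text ($short, $long) {
    # scalarref lets HTML::Template read the template from a string.
    my $template = HTML::Template->new(scalarref => \$text);
    $template->param(STORIES => \@rows);
    push @out, $template->output;
}
print "$_\n" for @out;   # both lines read "First;Second;"
```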
Finally, we put the finishing touches on our driver program, which merely takes the array we generated and feeds it to HTML::Template:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use XML::RSS;
use HTML::Template;

my @stories;
while (<DATA>) {
    chomp;
    next unless /\S/;                        # skip blank lines
    my $xml = get($_) or next;               # skip feeds we can't fetch
    my $rss = XML::RSS->new;
    eval { $rss->parse($xml) }; next if $@;  # skip feeds that don't parse
    for my $item (@{$rss->{'items'}}) {
        push @stories, {
            FEED_NAME  => $rss->{'channel'}->{'title'},
            FEED_URL   => $rss->{'channel'}->{'link'},
            STORY_NAME => $item->{'title'},
            STORY_URL  => $item->{'link'},
            STORY_DESC => $item->{'description'},
            STORY_DATE => $item->{'dc'}->{'date'} // '',
        };
    }
}

# Sort newest first, then drop STORY_DATE, which the template never uses
@stories = sort { $b->{STORY_DATE} cmp $a->{STORY_DATE} } @stories;
delete $_->{STORY_DATE} for @stories;

# aggregator.tmpl is the template shown above
my $template = HTML::Template->new(filename => "aggregator.tmpl");
$template->param( STORIES => \@stories );
$template->param( TIME    => scalar localtime );
print "Content-Type: text/html\n\n", $template->output;

__DATA__
http://blog.simon-cozens.org/blosxom.cgi/xml
http://slashdot.org/slashdot.rss
http://use.perl.org/perl-news-short.rdf
http://www.theregister.co.uk/tonys/slashdot.rdf
http://www.oreillynet.com/~rael/index.rss
We delete STORY_DATE once we've used it for ordering because, by default, HTML::Template dies if a loop row contains a variable that the template doesn't use. This check is controlled by its die_on_bad_params option, which is on unless you turn it off.
Plug this into a CGI-enabled web server, and, lo and behold, we have a cheap and cheerful AmphetaDesk clone.
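Deleting STORY_DATE is one way to keep HTML::Template from rejecting an unused loop variable. The module's `die_on_bad_params` option is another. The sketch below, which assumes HTML::Template is installed and uses made-up row data, contrasts the default behaviour with `die_on_bad_params => 0`. Both `param` and `output` sit inside the `eval`, so it doesn't matter which of the two calls raises the error.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Template;

# A loop that only uses STORY_NAME, fed rows that also carry STORY_DATE.
my $text = '<TMPL_LOOP STORIES><TMPL_VAR STORY_NAME></TMPL_LOOP>';
my @rows = ( { STORY_NAME => 'A story', STORY_DATE => '2003-01-02T10:00:00Z' } );

# Default settings (die_on_bad_params => 1): the unused key is fatal.
my $strict = HTML::Template->new(scalarref => \$text);
my $died = eval { $strict->param(STORIES => \@rows); $strict->output; 1 } ? 0 : 1;
print $died ? "default: died\n" : "default: ok\n";

# With die_on_bad_params => 0, unknown keys are silently ignored.
my $relaxed = HTML::Template->new(scalarref => \$text, die_on_bad_params => 0);
$relaxed->param(STORIES => \@rows);
my $html = $relaxed->output;
print "relaxed: $html\n";
```

Turning the check off is convenient, but it also hides misspelled variable names. That is one argument for deleting unused keys explicitly, as the driver program does.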