Building an RSS File

In the previous article we discussed how to read an RSS file with PHP. In this article we will focus on the theoretical aspects of how to build an RSS file.

What is RSS?

RSS stands for Really Simple Syndication. It is used to publish information about your website's content, such as headlines and summaries, in a form that other programs can read. An RSS document is simply an XML document that uses a specific set of elements; in fact most people would agree that RSS is an XML dialect. All RSS documents must conform to the XML specification, which is published on the W3C website. Several different versions of the RSS format exist. Let me explain why.

RSS was first invented by Netscape, which wanted an XML format to distribute news, stories and information for its My Netscape portal back in the late 1990s. For some reason Netscape lost interest in RSS and abandoned it just as it was becoming popular, and a company called Userland started to develop it for use in its products. As the format became more popular, ownership became a problem, because both Netscape and Userland claimed it as their own. To cut a long story short, we now have several versions of the RSS format, developed by various companies and individuals.

There are many RSS variations available today, and most RSS readers can still read the early version 0.91. The latest version is 2.0, which most RSS readers can also read. There are two kinds of RSS documents: what I call the simple kind and the enhanced kind. The enhanced kind includes, in addition to the required elements, optional elements such as the following:

Table 1.

| Element | Description | Example |
| --- | --- | --- |
| managingEditor | Email address for the person responsible for editorial content. | jd@mysite.com (John Doe) |
| webMaster | Email address for the person responsible for technical issues relating to the channel. | jd@mysite.com (John Doe) |
| copyright | Copyright notice for the content in the channel. | Copyright 2006, JD Site |
| language | Language used in the channel. Allows aggregators to group articles by language. | en-us |
| pubDate | Publication date of the content in the channel. | Mon, 05 Sep 2005 00:00:01 GMT |
| lastBuildDate | The last time the content of the channel changed. | Mon, 05 Sep 2005 00:00:01 GMT |
| generator | The name of the program that generated the document. | JD RSS Content Builder |
| docs | A URL that points to documentation for the format used to create this RSS document. | http://blogs.law.harvard.edu/tech/rss |
| image | Specifies an image that can be displayed with the channel. | Myimage.gif |
| cloud | Allows processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds. | <cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/> |
| ttl | Stands for time to live. It is a number of minutes that indicates how long a channel can be cached before it is refreshed from the source. | <ttl>40</ttl> |
| rating | The PICS rating for the channel. | |
| textInput | Specifies a text input box that can be displayed with the channel. It contains four required sub-elements: <title> (the label of the Submit button in the text input area), <description> (explains the text input area), <name> (the name of the text object in the text input area) and <link> (the URL of the CGI script that processes text input requests). | |
| skipHours | A hint telling aggregators which hours they can skip. It contains up to 24 <hour> sub-elements, each a number between 0 and 23 representing an hour in GMT during which aggregators that support the feature may skip reading the channel. The hour beginning at midnight is hour zero. | |

For further info on optional elements please visit: http://blogs.law.harvard.edu/tech/rss
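To make the ttl and skipHours rows of Table 1 concrete, here is a minimal Python sketch of how an aggregator might honour them. The function name should_fetch and the sample channel are assumptions made for illustration; real aggregators may apply these hints differently or ignore them altogether.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

# Hypothetical channel fragment carrying only the two caching hints.
CHANNEL_XML = """<channel>
  <ttl>40</ttl>
  <skipHours><hour>0</hour><hour>1</hour><hour>2</hour></skipHours>
</channel>"""


def should_fetch(channel, last_fetch, now):
    """Return True if an aggregator honouring ttl and skipHours would refetch."""
    # skipHours lists GMT hours (0-23) during which the channel may be skipped.
    skip = {int(hour.text) for hour in channel.findall("skipHours/hour")}
    if now.astimezone(timezone.utc).hour in skip:
        return False
    # ttl is the number of minutes a cached copy may be reused.
    ttl = channel.findtext("ttl")
    if ttl is None:
        return True
    return now - last_fetch >= timedelta(minutes=int(ttl))


channel = ET.fromstring(CHANNEL_XML)
last = datetime(2006, 4, 5, 12, 0, tzinfo=timezone.utc)
print(should_fetch(channel, last, last + timedelta(minutes=30)))  # False
print(should_fetch(channel, last, last + timedelta(minutes=45)))  # True
```

The first call returns False because only 30 of the 40 cached minutes have passed; the second returns True because the cache has expired and 12:45 GMT is not a skipped hour.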

Let me say a few words about the above optional elements. First of all, these are not all the optional elements that are available; I’ve just picked the ones I thought were most relevant to my article. If you want a list of all of them you should visit one of the many websites devoted to RSS document creation.

Secondly, ALL dates must conform to the RFC 822 specification as in the examples in Table 1.
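If you generate your dates in code, it is safer to let a library format them than to assemble the string by hand. The sketch below uses Python's standard email.utils module, which targets the later RFC 2822 and RFC 5322 mail standards; their date format matches the four-digit-year form shown in Table 1. The timestamp itself is only an example value.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Example timestamp; it must be timezone-aware (UTC) for usegmt=True to work.
published = datetime(2006, 4, 5, 23, 58, 37, tzinfo=timezone.utc)

# usegmt=True emits the literal "GMT" zone instead of "+0000".
pub_date = format_datetime(published, usegmt=True)
print(pub_date)  # Wed, 05 Apr 2006 23:58:37 GMT

# Parsing the string back gives the same moment in time.
parsed = parsedate_to_datetime(pub_date)
print(parsed == published)  # True
```

The resulting string can be placed directly inside a <pubDate> or <lastBuildDate> element.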

All these specifications and rules were created because in the past, developers of RSS readers found that their readers could not read all RSS documents, because everybody created RSS documents as they wished. So a common approach to RSS document formatting was agreed upon, with a minimum standard to enable any RSS reader to read any RSS document.

Required Elements

Table 2.

| Element | Description | Example |
| --- | --- | --- |
| title | The name of the channel. It’s how people refer to your service. If you have an HTML website that contains the same information as your RSS file, the title of your channel should be the same as the title of your website. | mysite.com Tech News |
| link | The URL of the HTML website corresponding to the channel. | http://www.mysite.com/ |
| description | Phrase or sentence describing the channel. | The latest tech news from mysite.com. |

Simple RSS Document Structure

Let’s look at an example of a "simple" RSS document:

<?xml version="1.0" ?>
<rss version="0.91">
  <channel>
    <title>RSS Tutorial </title>
    <link>http://www.linktothestory.com</link>
    <description> The RSS Tutorial</description>

    <item>
      <title>RSS Syndication </title>
      <link>http://www.linktothestory.com</link>
      <description>Blah,Blah…</description>
    </item>

    <item>
      <title>Technology in Crisis??</title>
      <link>http://www.linktothestory.com</link>
      <description>The problem with technology in Schools</description>
    </item>

  </channel>
</rss>

 

As you can see, every RSS document starts with an XML declaration ("<?xml version="1.0" ?>") and ends with a "</rss>" tag. The XML declaration is also where you specify the encoding of the document. Without the <rss> tags, this would be just a bog-standard XML document. Immediately after the opening <rss> tag, the <channel> tag opens the channel, followed by the title, link and description of your channel. All three of these elements are required; if even one of them is missing, your document will not pass a validator’s test and many RSS readers may not be able to parse your content. Table 2 describes these three tags, which come immediately after the opening <channel> tag.
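A quick way to catch a missing required element before you publish is to check for it yourself. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the function name missing_channel_elements is made up for this example, and the check is far less thorough than a real feed validator.

```python
import xml.etree.ElementTree as ET

# The three channel elements described in Table 2.
REQUIRED = ("title", "link", "description")


def missing_channel_elements(rss_text):
    """Return the required channel elements that are absent or empty."""
    root = ET.fromstring(rss_text)
    if root.tag != "rss":
        raise ValueError("root element is not <rss>")
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element")
    return [name for name in REQUIRED
            if not (channel.findtext(name) or "").strip()]


good = """<rss version="0.91"><channel>
<title>RSS Tutorial</title>
<link>http://www.linktothestory.com</link>
<description>The RSS Tutorial</description>
</channel></rss>"""

bad = """<rss version="0.91"><channel>
<title>RSS Tutorial</title>
</channel></rss>"""

print(missing_channel_elements(good))  # []
print(missing_channel_elements(bad))   # ['link', 'description']
```

An empty list means all three required elements are present and non-empty.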

The <item> tags are at the heart of the document. They contain the title, link and description fields that describe the news stories or articles you want the world to know about. This is what every RSS reader looks for when it reads your document. An item is a story or headline (it can be anything you want, really) that contains a title, a link to the story and a description of the story itself.
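The same simple structure can be generated from code rather than typed by hand. This is a minimal Python sketch, assuming the standard xml.etree.ElementTree module; the function name build_feed and the first story's description are invented for illustration. One benefit of building the tree this way is that special characters such as "&" are escaped for you.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, stories):
    """Build a simple RSS 0.91 document and return it as a string."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    # The three required channel elements come first.
    for tag, text in (("title", title), ("link", link),
                      ("description", description)):
        ET.SubElement(channel, tag).text = text
    # Each story becomes one <item> with its own title, link and description.
    for story in stories:
        item = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(item, tag).text = story[tag]
    return '<?xml version="1.0"?>\n' + ET.tostring(rss, encoding="unicode")


stories = [
    {"title": "RSS Syndication",
     "link": "http://www.linktothestory.com",
     "description": "News & views on syndication"},  # hypothetical text
    {"title": "Technology in Crisis??",
     "link": "http://www.linktothestory.com",
     "description": "The problem with technology in Schools"},
]

feed_xml = build_feed("RSS Tutorial", "http://www.linktothestory.com",
                      "The RSS Tutorial", stories)
print(feed_xml)
```

The printed document has the same shape as the hand-written example above: one channel with its three required elements, followed by two items.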

Enhanced RSS Document Structure

Here’s an example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>The Programmers Site</title>
    <link>http://www.thelinker.com</link>
    <description>think big</description>
    <pubDate>Wed, 05 Apr 2006 23:58:37 GMT</pubDate>
    <lastBuildDate>Wed, 05 Apr 2006 23:58:38 GMT</lastBuildDate>
    <language>en-us</language>
    <copyright>Copyright 2006, TheSite</copyright>
    <webMaster>ad@k.com (John Doe)</webMaster>
    <managingEditor>ad@l.com (John Doe)</managingEditor>
    <generator>RSS Builder V1.0 2006(c)</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>

    <image>
      <title>Site Image</title>
      <link>http://www.thelinker.com</link>
      <url>http://k.com/thelink.gif</url>
      <width>20</width>
      <height>20</height>
    </image>

    <item>
      <title>The title of the story here</title>
      <link>the link to the story here</link>
      <description>The description here
      </description>
    </item>
  </channel>
</rss>

The channel-level elements from <pubDate> through <image> are the only difference between the simple version and the enhanced version. The enhanced version makes more information about the creator of the RSS document available to RSS readers and aggregators. None of this really matters as far as an RSS reader is concerned, as long as the required elements are included.

Take a careful look at the image tag. It has three required tags: title, link and url, and two optional tags: width and height. Table 3 below explains how these tags are used and what they are for.

Table 3.

| Element | Description | Example |
| --- | --- | --- |
| title | Describes the image; it’s used in the ALT attribute of the HTML <img> tag when the channel is rendered in HTML. | RSS Tutorial |
| link | The URL of the site; when the channel is rendered, the image is a link to the site. (Note: in practice the image <title> and <link> should have the same value as the channel’s <title> and <link>.) | http://www.mysite.com/ |
| url | The URL of a GIF, JPEG or PNG image that represents the channel. | http://www.mysite.com/theimage.gif |
| width | Width of the image. Maximum value for width is 144, default value is 88. | 44 |
| height | Height of the image. Maximum value for height is 400, default value is 31. | 45 |
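The rules for the image element (three required sub-elements, plus size limits and defaults) are easy to check in code. Below is a minimal Python sketch using the limits from Table 3; the function name image_problems and the second sample image are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Limits and defaults as listed in Table 3.
MAX_WIDTH, DEFAULT_WIDTH = 144, 88
MAX_HEIGHT, DEFAULT_HEIGHT = 400, 31


def image_problems(image):
    """Return a list of problems found in an <image> element."""
    problems = [f"missing <{tag}>" for tag in ("title", "link", "url")
                if not (image.findtext(tag) or "").strip()]
    # width and height are optional, so fall back to the defaults.
    width = int(image.findtext("width") or DEFAULT_WIDTH)
    height = int(image.findtext("height") or DEFAULT_HEIGHT)
    if width > MAX_WIDTH:
        problems.append(f"width {width} exceeds {MAX_WIDTH}")
    if height > MAX_HEIGHT:
        problems.append(f"height {height} exceeds {MAX_HEIGHT}")
    return problems


ok = ET.fromstring(
    "<image><title>Site Image</title><link>http://www.thelinker.com</link>"
    "<url>http://k.com/thelink.gif</url><width>20</width>"
    "<height>20</height></image>")
# Hypothetical broken image: no <url>, and too wide.
too_big = ET.fromstring(
    "<image><title>Site Image</title><link>http://www.thelinker.com</link>"
    "<width>200</width></image>")

print(image_problems(ok))       # []
print(image_problems(too_big))  # ['missing <url>', 'width 200 exceeds 144']
```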

Aggregators

One way to make your content available to more readers is to submit your RSS document to an RSS aggregator. An RSS aggregator acts like a "newspaper": it regularly checks all of the submitted RSS documents for new content and then publishes that content. Some aggregators even let you set the interval at which they check for new content. An aggregator can be an online site or a desktop application.

[Screenshot of an online aggregator]
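The "checking for new content" step that an aggregator performs can be sketched in a few lines of Python. This is only an illustration: the function name new_items, the example.com URLs and the choice of the item link as the identity of a story are all assumptions, and a real aggregator would also download the feeds and store what it has seen between runs.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for items whose link has not been seen yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)  # remember it for the next check
            fresh.append((item.findtext("title"), link))
    return fresh


# Two snapshots of the same hypothetical feed, taken at different times.
FIRST_CHECK = """<rss version="0.91"><channel>
<item><title>RSS Syndication</title>
<link>http://www.example.com/rss-syndication</link></item>
</channel></rss>"""

SECOND_CHECK = """<rss version="0.91"><channel>
<item><title>RSS Syndication</title>
<link>http://www.example.com/rss-syndication</link></item>
<item><title>Technology in Crisis??</title>
<link>http://www.example.com/technology-in-crisis</link></item>
</channel></rss>"""

seen = set()
print(new_items(FIRST_CHECK, seen))
# [('RSS Syndication', 'http://www.example.com/rss-syndication')]
print(new_items(SECOND_CHECK, seen))
# [('Technology in Crisis??', 'http://www.example.com/technology-in-crisis')]
```

On the second check only the story that was not in the first snapshot is reported, which is what lets an aggregator publish just the new content.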

To summarize, an RSS document is a summarized version of your website that you make available on the Internet to attract traffic to your website. You can build an RSS document in two ways: a simple version that includes only the required elements, or an enhanced version that adds optional elements giving more detailed information about the document, such as when it was created and by whom.

Conclusion

In the final installment of the RSS reader article, we will be creating an RSS reader application that will be able to both read and create RSS documents.
