Implementing RSS for your custom CMS

Syndicating your website may seem like a daunting task, but it really isn't. For example, for this website I implemented a fully-validating, dynamic RSS feed in about 10 minutes with no previous experience in the matter.
The first step in building an RSS feed is to find out what an RSS feed looks like in its raw format and to adapt that to your needs. RSS is an XML-based standard that lets people read updated content without actually visiting the website. A bare-bones feed looks like this:
<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title></title>
    <link></link>
    <description></description>
    <language></language>
    <generator></generator>
    <managingEditor></managingEditor>
    <webMaster></webMaster>
    <atom:link href="" />
    <item>
        <title></title>
        <link></link>
        <guid></guid>
        <pubDate></pubDate>
        <description></description>
    </item>
</channel>
</rss>
The structure of the page is relatively simple, though there is special syntax required for RSS readers to know that it is in fact RSS and how to read it.
The first line simply states that the feed complies with the XML 1.0 standard. I won't go in depth into XML, but you can learn more about it at w3schools. Suffice it to say that the XML standard is pretty strict, and if you so much as leave out a slash (/) it can leave your feed unusable to some readers.
The second line states that it is in fact an RSS feed and conforms to the 2.0 standard. Don't worry too much about the rest of the line, but 'atom' is another syndication standard whose namespace allows extra features in RSS feeds.
The general structure of the feed lives inside a channel. The channel has its own information, such as its name (title), a URL to the home page, a general description, the language, what the feed was made with, and the email addresses of the editor and webmaster. The Atom link is the URL to the feed itself.
Inside the channel you find the items, or posts/articles/documents, that form the content of your CMS. Each item has specific information related to it, such as a title, a permanent link, a unique identifier so readers can tell if individual items have been updated, the date that it was published, and the content of the item.
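Because XML is as strict as described above, it can help to run your generated feed through a real XML parser before publishing it. The following is a minimal sketch in Python using only the standard library; the function name is_well_formed and the two sample strings are made up for illustration, and a well-formed document is not necessarily a valid RSS feed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    This only checks well-formedness (matching tags, proper nesting),
    not whether the document follows the RSS 2.0 rules.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the second one is missing the slash in </title>.
good = '<rss version="2.0"><channel><title>Demo</title></channel></rss>'
bad = '<rss version="2.0"><channel><title>Demo<title></channel></rss>'

if __name__ == "__main__":
    print(is_well_formed(good))
    print(is_well_formed(bad))
```

A single missing slash is enough to make the parser reject the whole document, which is why some readers will refuse such a feed outright.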
Here is the same feed but with some dummy information:
<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>Syndicating your site</title>
    <link>http://www.dummy.com/</link>
    <description>A site for learning about RSS feeds.</description>
    <language>en-us</language>
    <generator>Ben's Feedster</generator>
    <managingEditor>me@here.com (Me)</managingEditor>
    <webMaster>you@there.com (You)</webMaster>
    <atom:link href="http://www.dummy.com/rss" />
    <item>
        <title>My first post</title>
        <link>http://www.dummy.com/post.php?id=1</link>
        <guid isPermaLink="false">1</guid>
        <pubDate>Wed, 02 Oct 2002 15:00:00 +1000</pubDate>
        <description><![CDATA[ This is the content of the item ]]></description>
    </item>
</channel>
</rss>
Much of the information speaks for itself, but a couple of the fields need to be formatted a certain way. For language, for example, it's best to use 'en-us' for English; you will need to look up other language codes yourself if required. The email address fields don't need to include your name in parentheses, but it is standard to do so.
As for the GUID, there is no set standard for how it is formatted. In this example I simply used the unique ID assigned to my article, but you can use whatever method you want, so long as it is unique. You can even use the same URL as the link if you want. If your GUID is not a URL, add isPermaLink="false" to the guid tag, since readers otherwise assume the GUID is a permanent link.
The other field that needs to be formatted correctly is the published date, which must follow the RFC 822 date format. As you can see from the example, it's pretty easy to understand. The very last number is the timezone offset from UTC (+1000 in the example means ten hours ahead), which you may need to look up for your area first.
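As an illustration of the date format, here is a small Python sketch that produces a pubDate string from a timezone-aware datetime using the standard library's email.utils.format_datetime. The helper name rss_pub_date is hypothetical, and the +10:00 offset is simply the one from the example feed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss_pub_date(moment: datetime) -> str:
    """Format a timezone-aware datetime the way <pubDate> expects."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so the result would be ambiguous.
        raise ValueError("moment must carry timezone information")
    return format_datetime(moment)


# The publication time from the example feed, ten hours ahead of UTC.
example_zone = timezone(timedelta(hours=10))
published = datetime(2002, 10, 2, 15, 0, 0, tzinfo=example_zone)

if __name__ == "__main__":
    print(rss_pub_date(published))  # Wed, 02 Oct 2002 15:00:00 +1000
```

Requiring an explicit timezone is a design choice in this sketch; it avoids silently publishing dates with the wrong offset.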
And finally, you may notice that I have used a CDATA section around my content. This is an extra precaution that lets the content contain characters (such as < and &) that would otherwise be interpreted as markup by the RSS reader or XML parser. One thing you may want to do, to ensure your feed is readable by all readers that support the RSS 2.0 specification, is to strip any HTML tags out of your content and convert any HTML entities (e.g. &nbsp;) before outputting it.
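One possible way to do that cleanup is sketched below in Python, using the standard library's html.parser. The class and function names are made up for this example, and it is deliberately simple: it keeps every piece of text it finds (including the contents of script or style blocks, if your posts contain any), converts entities to their characters, and collapses runs of whitespace.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags and drops the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; or &nbsp;
        # into the characters they stand for before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(html_fragment: str) -> str:
    """Strip HTML tags, convert entities, and normalise whitespace."""
    parser = _TextOnly()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    # split() treats the non-breaking space from &nbsp; as whitespace too.
    return " ".join(text.split())


if __name__ == "__main__":
    print(plain_text("<p>Hello&nbsp;<b>world</b> &amp; more</p>"))
```

The cleaned text still needs to be escaped or wrapped in CDATA when it is written into the feed, because a character such as & is not allowed to appear bare in XML.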
So, it's pretty easy to see where the information from your own CMS needs to be substituted into this template. Your loop over posts obviously goes around the <item></item> section, and if you're using PHP you can format the date correctly by passing 'r' as the format to date() along with a timestamp.
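To make the substitution concrete, here is a rough Python sketch of the loop around the item section. It is not the code behind this site: the build_feed function, the dictionary keys (id, title, url, published, body) and the sample post are all assumptions for illustration. It builds the document with xml.etree.ElementTree, which escapes special characters in the text for you instead of using CDATA, and it leaves out the optional language, generator and editor fields for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

ATOM_NS = "http://www.w3.org/2005/Atom"
# Ask ElementTree to write the Atom namespace with the "atom" prefix.
ET.register_namespace("atom", ATOM_NS)


def build_feed(site_title, site_url, feed_url, description, posts):
    """Return an RSS 2.0 document as a string, one <item> per post."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, f"{{{ATOM_NS}}}link", {"href": feed_url})

    # The loop over posts goes around the <item> section.
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        # The GUID here is a bare ID rather than a URL, so mark it as such.
        guid = ET.SubElement(item, "guid", {"isPermaLink": "false"})
        guid.text = str(post["id"])
        ET.SubElement(item, "pubDate").text = format_datetime(post["published"])
        ET.SubElement(item, "description").text = post["body"]

    return '<?xml version="1.0"?>\n' + ET.tostring(rss, encoding="unicode")


# Hypothetical data standing in for rows loaded from your CMS.
sample_posts = [
    {
        "id": 1,
        "title": "My first post",
        "url": "http://www.dummy.com/post.php?id=1",
        "published": datetime(2002, 10, 2, 5, 0, 0, tzinfo=timezone.utc),
        "body": "This is the content of the item",
    }
]

feed_xml = build_feed(
    "Syndicating your site",
    "http://www.dummy.com/",
    "http://www.dummy.com/rss",
    "A site for learning about RSS feeds.",
    sample_posts,
)

if __name__ == "__main__":
    print(feed_xml)
```

Building the tree with a library rather than concatenating strings means forgotten slashes and unescaped characters cannot creep in, at the cost of slightly less control over the exact layout of the output.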
That's about it for syndicating your website. I didn't go into too much detail on the programming side of things because that will differ depending on which scripting language you're using and how you intend to deal with the generated XML.
