Sunday, March 19, 2006

Deconstructing RSS

If RSS stands for "Really Simple Syndication", how come the explanations of it on the web are so complicated and so useless? There are people out there pulling their hair out trying to understand RSS, and they don't get it because they don't see how simple it really is.

An RSS 'feed' is nothing more than a web page. The only difference between RSS 'feeds' and other web pages is that web pages are coded in HTML (or XHTML) and RSS 'feeds' are coded in another dialect of XML. The content of an RSS 'feed' page is a set of items, where each item contains text and links just like any other web page.
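To make the point concrete, here is a minimal sketch in Python that reads a feed as the plain XML document it is. The sample feed, its URLs, and the function name read_items are all made up for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Newer text.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Older text.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return each <item> as a dict of its title, link and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Nothing here is specific to syndication: it is the same parse-and-walk you would do on any other XML page.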

A problem is that the promoters of RSS engage in obfuscation, trying to make out that RSS is something more than it really is. They talk about the feed as if it were a push technology. The Wikipedia page on RSS even has a link to push technology. However, like everything else on the web, RSS 'feeds' are fetched by a 'feed' aggregator using HTTP GET. Thus RSS 'feeds' are in reality pull technology, just like any other type of web browsing.
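A rough sketch of what that fetch looks like, using Python's standard urllib. The feed URL, the User-Agent string, and the function names are hypothetical; a real aggregator would add caching and error handling on top.

```python
import urllib.request


def build_feed_request(url):
    """Build the plain HTTP GET request an aggregator would send for a feed."""
    # No request body is attached, so urllib treats this as a GET.
    return urllib.request.Request(
        url, headers={"User-Agent": "example-aggregator/0.1"}
    )


def fetch_feed(url, timeout=10):
    """Pull the feed: the client asks, the server answers, nothing is pushed."""
    request = build_feed_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Inspect the request without touching the network.
request = build_feed_request("http://example.com/feed.xml")
print(request.get_method(), request.full_url)
```

The client has to call fetch_feed again every time it wants to know whether anything changed, which is exactly what a browser does when you reload a page.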

This brings us to the feed aggregator. The normal way to render XML is to provide a style sheet that turns it into XHTML. Unfortunately, there seem to be problems with RSS that prevent this, so you need to have this special thing called an aggregator before you can read a 'feed'. From what I can tell, the first problem is that there are many different dialects of RSS, which vary just enough that a single style sheet does not work nicely for them all. A second problem is that one RSS dialect wraps the 'feed' in a Resource Description Framework (RDF) tag. The RDF tag indicates that this is metadata, and in general metadata is not rendered, so you need the aggregator to strip off the RDF tags before the content can be rendered.
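As a sketch of how little the dialect and RDF wrapper problems amount to in code, the following Python ignores the root element and the namespaces and just looks for items wherever they sit. Both sample feeds are made up, and the helper names local_name and read_items_any_dialect are mine; real feeds have more variation than this handles.

```python
import xml.etree.ElementTree as ET

# Made-up RDF-wrapped feed: the root is rdf:RDF and items are namespaced.
RDF_FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/">
    <title>Example Blog</title>
    <link>http://example.com/</link>
  </channel>
  <item rdf:about="http://example.com/1">
    <title>First post</title>
    <link>http://example.com/1</link>
  </item>
</rdf:RDF>"""

# Made-up RSS 2.0 style feed: no namespace, items nested inside channel.
PLAIN_FEED = """<rss version="2.0">
  <channel>
    <title>Other Blog</title>
    <item>
      <title>Hello</title>
      <link>http://example.org/hello</link>
    </item>
  </channel>
</rss>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def read_items_any_dialect(feed_xml):
    """Collect (title, link) pairs from items wherever they sit in the tree."""
    root = ET.fromstring(feed_xml)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        # Index the item's children by local name so namespaces do not matter.
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        items.append((fields.get("title", ""), fields.get("link", "")))
    return items


print(read_items_any_dialect(RDF_FEED))
print(read_items_any_dialect(PLAIN_FEED))
```

Stripping the wrapper is a matter of not caring what the outer tags are called, which is roughly all the "special" work the aggregator does here.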

Another thing that an aggregator can do is display several RSS feeds on the same page. When you leave the Bizarro world of RSS pull 'feeds', this feature is known as a portal, and each display on the page is known as a portlet. Portals are usually done on the server side, while aggregators more often work on the client side, but apart from this small difference they are pretty much the same thing.
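Here is a small sketch of the portal idea in Python: each feed becomes one portlet, and the portlets are stacked on a single HTML page. The feeds, their names, and the functions render_portlet and render_page are invented for the example, and the feeds are assumed to have been fetched already.

```python
import xml.etree.ElementTree as ET
from html import escape

# Two made-up feeds standing in for documents already fetched over HTTP.
FEEDS = {
    "Example Blog": """<rss version="2.0"><channel>
        <item><title>Post A</title><link>http://example.com/a</link></item>
        <item><title>Post B</title><link>http://example.com/b</link></item>
    </channel></rss>""",
    "Other Blog": """<rss version="2.0"><channel>
        <item><title>Tom &amp; Jerry</title><link>http://example.org/tj</link></item>
    </channel></rss>""",
}


def render_portlet(name, feed_xml, max_items=5):
    """Render one feed as a titled HTML list, one 'portlet' on the page."""
    rows = []
    for item in list(ET.fromstring(feed_xml).iter("item"))[:max_items]:
        title = escape(item.findtext("title", default=""))
        link = escape(item.findtext("link", default=""), quote=True)
        rows.append(f'    <li><a href="{link}">{title}</a></li>')
    return "\n".join([f"  <h2>{escape(name)}</h2>", "  <ul>", *rows, "  </ul>"])


def render_page(feeds):
    """Put every feed's portlet on a single page, like a small portal."""
    portlets = [render_portlet(name, xml) for name, xml in feeds.items()]
    return "<html><body>\n" + "\n".join(portlets) + "\n</body></html>"


print(render_page(FEEDS))
```

Whether this runs on a server or on your desktop makes no difference to the code, which is the point of the comparison.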

The final thing that an aggregator can do is keep track of which items you have seen in a 'feed' and only show you new and updated items. Exactly how this should work is not stated, and when someone asks, the question is ignored. Doing it properly would require a discipline in generating the feed that is not required by the spec and thus cannot be relied on by the aggregator. In practice there is a convention (unstated as far as I can tell) that items are ordered in the feed page with the most recent first. The aggregator renders the page normally and you see as many of the recent items as your portlet for the feed has room to show.
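One plausible way to track seen items, sketched in Python under a loud assumption: each item is identified by its guid when it has one and by its link otherwise. That rule is my guess at a workable convention, not something the feed is obliged to honour, and this sketch does not detect items that were updated in place. The sample feeds and function names are made up.

```python
import xml.etree.ElementTree as ET

# Two made-up snapshots of the same feed, newest item first.
FIRST_POLL = """<rss version="2.0"><channel>
  <item><title>Post 2</title><link>http://example.com/2</link><guid>tag:2</guid></item>
  <item><title>Post 1</title><link>http://example.com/1</link></item>
</channel></rss>"""

SECOND_POLL = """<rss version="2.0"><channel>
  <item><title>Post 3</title><link>http://example.com/3</link></item>
  <item><title>Post 2</title><link>http://example.com/2</link><guid>tag:2</guid></item>
  <item><title>Post 1</title><link>http://example.com/1</link></item>
</channel></rss>"""


def item_key(item):
    """Assumed identity rule: guid if present, else link, else title."""
    return (
        item.findtext("guid")
        or item.findtext("link")
        or item.findtext("title", default="")
    )


def new_items(feed_xml, seen):
    """Return titles of items not seen before, and record them in `seen`."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title", default=""))
    return fresh


seen = set()
print(new_items(FIRST_POLL, seen))
print(new_items(SECOND_POLL, seen))
```

If a publisher changes an item's link or reuses a guid, this breaks, which is the missing discipline complained about above.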

Summarizing, an RSS 'feed' is a web page in a slightly different dialect of XML, and an aggregator is a client-side portal. Is there anything that I have left out?
