Monday, May 26, 2008

Pull Dressed as Push

We have been watching with some degree of schadenfreude the problems that Twitter, the incredibly popular microblogging service, has had with scaling, or even with providing a reliable service. Yesterday Steve Gillmor suggested in his TechCrunch post that the problem has been caused by FriendFeed. FriendFeed is a new social service aggregator that either enhances or engulfs Twitter, depending on your point of view. Here is my take on what the problem is.

First some background. Publish and Subscribe (pub/sub) is the underlying goal of all these services. I, as a client, subscribe to something, and when the publisher has something that matches my subscription, they Push it to me. This is efficient because stuff is only sent to me when it exists. The problem is that the publisher may not know where I am when they want to do the Push. So many pub/sub systems work on the Pull by Polling model. That is, every so often I ask the publisher if they have anything new for me. The Polling part is that I repeatedly ask for new stuff, and the Pull is that when new stuff exists, I Pull it from the publisher. This works reasonably well as long as I do not poll the publisher too often.
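To make the difference concrete, here is a minimal sketch of both models against a toy in-memory publisher. The names Publisher, subscribe, publish and fetch_since are made up for illustration and do not come from any real service. One item is published during ten polling rounds, so Push delivers it with a single callback while Pull by Polling costs the publisher ten requests.

```python
class Publisher:
    """Toy in-memory publisher that supports both delivery models."""

    def __init__(self):
        self.items = []            # everything published so far
        self.subscribers = []      # callbacks registered for Push
        self.requests_served = 0   # number of Pull requests answered

    def subscribe(self, callback):
        # Push model: the publisher remembers how to reach the client.
        self.subscribers.append(callback)

    def publish(self, item):
        self.items.append(item)
        for callback in self.subscribers:
            callback(item)         # sent only when something exists

    def fetch_since(self, cursor):
        # Pull model: the client asks, whether or not anything is new.
        self.requests_served += 1
        return self.items[cursor:]


def demo():
    publisher = Publisher()

    pushed = []
    publisher.subscribe(pushed.append)

    pulled = []
    cursor = 0
    for tick in range(10):         # ten polling rounds
        if tick == 3:
            publisher.publish("hello")
        new_items = publisher.fetch_since(cursor)
        pulled.extend(new_items)
        cursor += len(new_items)

    return pushed, pulled, publisher.requests_served
```

Both clients end up with the same single item. The difference is that nine of the ten polls came back empty, and that wasted work grows with the number of clients and the poll rate.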

For example, RSS works this way (as I discussed some time ago). To keep the original publisher from being overwhelmed by requests, part of the RSS protocol describes how often someone may poll the publisher for new information.
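As a sketch of what a well-behaved reader might do, I am assuming here that the relevant piece is the optional ttl element of an RSS 2.0 channel, which gives the number of minutes a feed may be cached before it is fetched again. The sample feed, the function names and the 30 minute fallback are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <ttl>60</ttl>
  </channel>
</rss>"""


def min_poll_interval_seconds(feed_xml, default_minutes=30):
    """Return how long to wait between polls, based on the channel's ttl."""
    channel = ET.fromstring(feed_xml).find("channel")
    ttl_text = channel.findtext("ttl") if channel is not None else None
    try:
        minutes = int(ttl_text)
    except (TypeError, ValueError):
        # ttl is optional, so fall back to an assumed default.
        minutes = default_minutes
    return minutes * 60


def should_poll(last_poll, now, interval):
    """True once at least `interval` seconds have passed since the last poll."""
    return now - last_poll >= interval
```

With the sample feed, a reader that last polled at time 0 would wait until 3600 seconds have passed before asking again.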

Now back to Twitter and FriendFeed. Twitter provides an API so that other services can be built upon it. FriendFeed is an aggregator of social networking services that uses the Twitter API to aggregate information for its users. The Twitter API is based on XMPP, which is a high-performance protocol for instant messaging that supports, for example, instant messaging between large service providers such as AIM and Yahoo Instant Messaging. However, XMPP also has a low-performance option based on HTTP for polling XMPP servers. This turns XMPP into Pull dressed as Push, which strains the servers when the poll rate is too high.

It turns out that the Twitter XMPP API is based on the low-performance HTTP option. Thus FriendFeed is polling Twitter for each of its users, and polling frequently to give the appearance of instant response, which may be the reason that the Twitter servers are overloaded. Twitter has a feature in their API to throttle polling to no more than once a minute; however, this could also be a problem if it is badly implemented.
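I do not know how Twitter actually implements its throttle, so what follows is only a minimal sketch of the general idea: the server remembers when each client last polled and refuses anything sooner than a minute later. PollThrottle and its allow method are hypothetical names, not part of any real API.

```python
class PollThrottle:
    """Allow each client at most one poll per `min_interval` seconds."""

    def __init__(self, min_interval=60.0):
        self.min_interval = min_interval
        self.last_seen = {}   # client id -> time of last accepted poll

    def allow(self, client_id, now):
        last = self.last_seen.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False      # too soon, reject without doing any work
        self.last_seen[client_id] = now
        return True
```

Note what the throttle is keyed on. If an aggregator polls separately on behalf of each of its users and the limit applies per user, then 1,000 users can still produce up to 1,000 requests a minute, so a throttle like this limits the rate per client but not the total load.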

By way of disclosure, I do not use Twitter or many of these other toys. I get quite enough information overload from RSS.
