Wednesday, April 08, 2009

Ruby versus Scala

An interesting spat has recently emerged in the long-running story of the programming language wars. The Twitter team, who had long been exclusively a Ruby on Rails house, came out with the "shock" revelation that they were converting part of their back-end code to use the Scala programming language. Several Ruby zealots immediately jumped in, saying that the Twitter crew obviously did not know what they were doing because they had decided to turn their backs on Ruby.

I must confess to being amused by the frothing at the mouth from the Ruby defenders, but rather than laughing, let's take a calm look at the arguments. The Twitter developers are still using Ruby on Rails for its intended purpose of running a web site. However, they are also developing back-end server software and have chosen the Scala programming language for that effort.

The Twitter crew offer a number of reasons for choosing a statically typed language. Firstly, dynamic languages are not very good for implementing the kind of long-running processes that you find in a server. I have experience with writing servers in C, C++ and Java. In all these languages there are problems with memory leaks that cause the memory footprint to grow over the days, weeks or months that the server is running. Getting rid of memory leaks is tedious and painful, but absolutely necessary. Even the smallest memory leak will be a problem under heavy usage, and if you have a memory leak, the only cure is stopping and restarting the server. Note that garbage collection does not do away with memory leaks; it just changes the nature of the problem. Dynamic languages are designed for rapid implementation and hide the boring details. One detail that goes missing is control over memory usage, and memory usage left to its own devices tends to leak.
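To make the point about garbage collection concrete, here is a minimal sketch of how a long-running process can leak even with a collector. The SessionRegistry class and its methods are hypothetical, invented for illustration, and have nothing to do with Twitter's actual code. The collector can only reclaim objects that nothing refers to, so a table that is never pruned grows for as long as the server runs.

```ruby
# Hypothetical per-connection registry in a long-running server.
# All names are illustrative assumptions, not taken from any real code base.
class SessionRegistry
  def initialize
    @sessions = {} # strong references: entries stay alive while stored here
  end

  def connect(id)
    @sessions[id] = "x" * 1024 # stand-in for per-connection state
  end

  # The fix: drop the reference when the connection ends.
  def disconnect(id)
    @sessions.delete(id)
  end

  def size
    @sessions.size
  end
end

leaky = SessionRegistry.new
1_000.times { |i| leaky.connect(i) } # clients leave, but nothing is removed
GC.start                             # reachable entries cannot be reclaimed
puts leaky.size                      # prints 1000

tidy = SessionRegistry.new
1_000.times do |i|
  tidy.connect(i)
  tidy.disconnect(i)
end
puts tidy.size                       # prints 0
```

Nothing in the first case is a bug from the collector's point of view. The memory is still reachable, so it is kept, and that is the sense in which garbage collection changes the nature of the problem rather than removing it.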

Another issue is concurrency. Server software needs to exploit concurrency, particularly now in the era of multi-core hardware. Dynamic languages have problems with concurrency. There are a bunch of issues, too many to discuss here. Suffice it to say that in the past Guido van Rossum has prominently argued against removing the global interpreter lock from Python, another dynamic language, and the standard implementations of both Python and Ruby suffer from poor thread performance.

A third issue is type safety. As the Twitter crew say, they found themselves building their own type manager into their server code. In a statically typed language, the type management is done at compile time, making the code more efficient and automatically eliminating the potential for a large class of bugs.

Related to this, many people commented on the revelation that the Twitter Ruby server code was full of calls to the Ruby kind_of? method. It is normally considered bad form to have to use kind_of? or its equivalent in other languages, like the Java instanceof operator. After a few moments' thought I understood what the kind_of? code is for. If you look at the code of any real server, such as a database server, it is full of assert statements. The idea is that if you are going to fail, you should fail as fast as you can and let the error management and recovery system get you out of trouble. Failing fast reduces the likelihood that the error will propagate and cause real damage like corrupting persistent data. Also, with a fast fail it is easier to figure out why the error occurred. In a language with dynamic typing, checking parameters with kind_of? is the first type of assert to put in any server code.
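As a rough illustration of that kind of fail-fast parameter check, here is a small sketch using Ruby's kind_of? method. The store_status method and its arguments are made up for the example and are not drawn from Twitter's code. The check rejects a bad argument at the entry point instead of letting it travel deeper into the server.

```ruby
# Hypothetical request handler; the method name and parameters are
# assumptions made for this example only.
def store_status(user_id, text)
  # Fail fast: reject wrongly typed arguments before doing any work.
  unless user_id.kind_of?(Integer)
    raise ArgumentError, "user_id must be an Integer, got #{user_id.class}"
  end
  unless text.kind_of?(String)
    raise ArgumentError, "text must be a String, got #{text.class}"
  end
  "#{user_id}: #{text}" # stand-in for writing to persistent storage
end

puts store_status(42, "hello")

begin
  store_status("42", "hello") # a String where an Integer is expected
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

In a statically typed language the second call would simply not compile, which is the sense in which the compiler does this work for you.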

So the Twitter developers have opted to use Ruby on Rails for their web server and Scala for their server code. In the old days we would have said "horses for courses" and everyone would have nodded their heads in understanding. Nowadays, nobody goes racing, so nobody knows what the phrase means. Can anyone suggest a more up-to-date expression?


Paul O'Rorke said...

Thanks for this article. I'm not sure I follow the third paragraph, though. It seems to imply that dynamic languages are more likely to leak memory and that statically typed languages provide more control over memory. I think dynamic versus static typing is an orthogonal issue. I think the absence or presence of automatic garbage collection is more important. But there are different kinds of memory leakage and, as you mentioned, even with a garbage collector (as in Java) memory can still leak. The issues there have more to do with whether objects end up getting collected when they aren't really being used anymore, I think. Does static typing give you more control over leakage (e.g., in Java)?

Richard Taylor said...

I am only giving a summary of the argument. The Twitter crew said "Ruby, like many scripting languages, has trouble being an environment for long lived processes". One thread, I think from Slashdot, was that CRuby is well known to leak memory, and that they should have used JRuby instead. Others countered that JRuby lacked needed features.

My experience with server code is that memory is always a problem. We have just spent a lot more time than it should have taken with what appears to be a memory leak in JNI code.