Wednesday, August 03, 2005

Pauseless Garbage Collection

Programs generate a surprising amount of garbage, little pieces of memory that are used and then discarded. Low level programming languages like C require that the programmer manage the storage themselves, which is a surprisingly painful, time-consuming and error prone task. So these days application programs are written in languages like Java (or C#) where the system manages the storage and does the garbage collection. The result is much higher programmer productivity and much better and far more reliable programs.

The overhead of doing automatic garbage collection has always been a concern. However, another problem with automatic garbage collection is that up to now it has required that the system pauses, sometimes for a considerable length of time while parts of the garbage collector runs. A pause in a web application server stops customers from doing whatever they are trying to do. This can range from absolutely unacceptable in online stock trading to just very bad for customer satisfaction in a typical e-commerce application.

At the August meeting of the SDForum Java SIG, Cliff Click spoke on pauseless garbage collection. Cliff is part of Azul Systems, a startup that has developed an attached processor to run Java applications. As Azul Systems sells to large enterprises that run Java web applications to support their business, being able to do automatic garbage collection without pausing is an important feature.

Cliff is an engaging speaker who has spoken to the Java SIG before. Previously Cliff gave an overview of what Azul is doing. At this meeting, Cliff described the pauseless garbage collection algorithm in detail, and then went on to give us some indication of its performance. He had taken a part of the standard SPEC JBB Enterprise Java warehouse benchmark, modified it by adding a large slow-moving object cache and a much longer runtime that makes the benchmark more realistic and garbage collection more of an issue.

When the benchmark is run on an Azul system, the longest "Stop the world" pause is 25 milliseconds, whereas running the benchmark on other Java systems exhibited pauses of up to 5 seconds (yes seconds). On any platform almost all of the benchmark transactions run in under a millisecond. On the Azul system, no transaction took more than 26 milliseconds, which is very close to their maximum pause time, and well over 99% of the transactions ran in under 2 milliseconds. On the other Java systems, over half of the total transaction time could be taken up by transactions that took more than two milliseconds to complete.

While Cliff and Azul are proud of what they have done so far, they are not satisfied. So they are working on removing the last few vestiges of a pause from their system. We can expect even better performance in the future.

1 comment:

James McGovern said...

Thoughts on