Sunday, October 21, 2007

OLAP with Mondrian

I got three things out of Julian Hyde's talk to the SDForum SIG last week. Julian is the lead developer on the Open Source Mondrian project, and his talk was entitled "Building scalable OLAP applications with Mondrian and Pentaho".

The first thing is the nature of Mondrian. While most OLAP servers are database-like servers that persistently store the OLAP cubes, Mondrian is a cache. At the back end, Mondrian uses JDBC to fetch persistent data from a database. At the front end, Mondrian accepts queries in the MDX language and replies to these queries with data cubes. In the middle, Mondrian caches data cubes in memory so that it can respond to queries faster. Mondrian is also a component rather than a complete product. It is packaged as a single Java .jar file, and its schema is defined by an XML file. Mondrian needs both a back end database and a front end: either something like JPivot that can display OLAP cubes, or an XML/A driver that can exchange MDX queries and results with a remote front end.
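To make the "Mondrian is a cache" idea concrete, here is a minimal sketch of a read-through cache in Java. It is not Mondrian code: the class ReadThroughCache and its methods are hypothetical names of my own, and the backing store is a plain function standing in for an aggregate query sent over JDBC. It only illustrates the pattern of answering from memory when possible and going to the database on a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Mondrian.
class ReadThroughCache {
    // Query key -> value already fetched from the backing store.
    private final Map<String, Long> cells = new HashMap<>();
    // Stands in for an aggregate query issued to the database over JDBC.
    private final Function<String, Long> backingStore;
    // Counts how often we had to go to the backing store.
    private int backingStoreHits = 0;

    ReadThroughCache(Function<String, Long> backingStore) {
        this.backingStore = backingStore;
    }

    // Return the cached value for a key, loading it from the store on a miss.
    long get(String queryKey) {
        Long cached = cells.get(queryKey);
        if (cached == null) {
            backingStoreHits++;
            cached = backingStore.apply(queryKey);
            cells.put(queryKey, cached);
        }
        return cached;
    }

    int backingStoreHits() {
        return backingStoreHits;
    }
}
```

Asking the same question twice touches the backing store only once, which is the whole point of putting a cache between the front end and the database.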

This brings us to the second thing. While there have been several attempts to create a standard interface to OLAP servers, none has really succeeded. Julian has spearheaded a new Open Source initiative called olap4j. The concept is that olap4j will be the JDBC of OLAP. It is based on JDBC, and uses the MDX language to express OLAP queries. MDX is the universal language of OLAP, just as SQL is the universal language of relational databases. The one curiosity is that MDX is owned by Microsoft, so other implementations of MDX, like Mondrian, have to scramble to match each Microsoft release. I think it is about time that MDX was managed by an independent standards committee.

Finally, Julian talked about a "real-time" feature that he has recently added to Mondrian. This is a cache control API that allows you to invalidate parts of the Mondrian cache when new data is loaded into the underlying database. Once part of the cache has been invalidated, new queries fetch the new data and display up-to-date results. Initially the example looked complicated, and the audience reacted with some skepticism. Looking back at the example in the presentation, it seems less forbidding. However, the long-term goal should be to manage the OLAP cache automatically from an ETL tool. When pressed, Julian told us that the API is already being used, for example, in applications for securities analysis.
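The idea of invalidating only part of a cache can be sketched in a few lines of Java. This is not the Mondrian cache control API and does not reproduce the example from the talk. RegionCache, get and invalidate are hypothetical names, a "region" here is just a string such as a time period, and the backing store is again a function standing in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not Mondrian's cache control API.
class RegionCache {
    // Region name (for example a time period) -> cached values in that region.
    private final Map<String, Map<String, Long>> regions = new HashMap<>();
    // Stands in for a query against the underlying database.
    private final Function<String, Long> backingStore;

    RegionCache(Function<String, Long> backingStore) {
        this.backingStore = backingStore;
    }

    // Read through the cache; a miss loads "region/measure" from the store.
    long get(String region, String measure) {
        return regions
            .computeIfAbsent(region, r -> new HashMap<>())
            .computeIfAbsent(measure, m -> backingStore.apply(region + "/" + measure));
    }

    // Drop one region only; every other region stays cached.
    // Returns true if there was anything cached for that region.
    boolean invalidate(String region) {
        return regions.remove(region) != null;
    }
}
```

In this sketch, a load job that has just written new rows for one period would call invalidate for that period and nothing else. Queries against that period then go back to the database, while queries against older periods are still answered from memory.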

In all, it was a satisfyingly informative evening.
