Sunday, February 22, 2009

Open Source Business Intelligence

We had a great February meeting of the SDForum Business Intelligence SIG where Co-Chair Paul O'Rorke spoke on "BI on a Budget - Open Source BI". Paul's talk broke down into two parts. In the first half he talked about Open Source licenses and, to a lesser extent, business models. In the second half he surveyed Open Source Business Intelligence tools and platforms.

My discussion of licenses must include the "I Am Not A Lawyer" (IANAL) disclaimer. Paul is not a lawyer either, as he told us before launching into his license discussion. Open Source licenses generally fall into three categories. From a business point of view, the most restrictive is the GNU General Public License (GPL). Any code that links with GPL-licensed code must be released as Open Source. This is called Copyleft, a play on Copyright. The least restrictive licenses are those like the BSD License and the Apache Software License, which require little more than an acknowledgment that you are using their software. In the middle sit the so-called "weak Copyleft" licenses like the Mozilla Public License and the Eclipse Public License.

A good example of what these licenses mean in practice is found by looking at the two leading Open Source database systems, PostgreSQL and MySQL. PostgreSQL was originally developed at the University of California, Berkeley under the leadership of Michael Stonebraker and is released under the permissive BSD License. Because of this, it is the basis of many recent commercial database systems including Netezza, Greenplum and ParAccel.

On the other hand, MySQL is released by a company that makes money by selling software licenses. There is a "community" version of MySQL that is licensed under the GPL. However, you can also buy a license for MySQL, in which case you are not required to release any of your code as Open Source. A cynic might say that the community edition is there for promotion. You get to try the product for free, but when you go to use it, you find that there are good reasons to buy a license.

Open Source license issues are complex. The above discussion is just the tip of the iceberg when it comes to understanding the implications of an Open Source license. Paul brought up plenty of other issues that need to be considered. One example is patent protection. This covers whether you could be sued for using Open Source code, as well as how to protect your own intellectual property if your code is linked to Open Source code.

One of the good things about the meeting was its interactive nature. Sandeep Giri of the OpenI project gave us some insights as to why he chose the Mozilla Public License for his project. When Paul got to discussing the BIRT Open Source Reporting project that is sponsored by Actuate, Suzanne Hoffman told us that the decision to Open Source some of Actuate's code caused dissent within the company, to the extent that some people left. Jim Porzak had some input on the R programming language and helped us understand the difference between using a statistical programming language and a Data Mining system like Weka. Many other audience members also joined in.
