Wednesday, February 25, 2009

Open Source Business Models

The previous post on Open Source reminds me that we have looked at Open Source business models several times in this blog. Here are a couple of case studies of why software became Open Source.

Apache Derby is an Open Source Java database with an interesting history. Cloudscape was an early Bay Area Java startup. Founded in 1996, Cloudscape came out with its first database product in 1997. In 1999 Informix bought Cloudscape, and in 2001 IBM bought the database side of Informix, including Cloudscape. Under IBM, Cloudscape development continued, and it was used as an embedded database within several IBM Java products and middleware. In 2004 IBM contributed the code to the Apache Software Foundation as Derby, where it has found acceptance.

IBM had acquired Cloudscape by accident when it acquired Informix, and some time after that acquisition IBM had to decide what to do with Cloudscape. On the one hand, Cloudscape did not fit into IBM's product hierarchy. IBM already had a mobile version of DB2, its database brand, and did not need another one. Also, Cloudscape did not generate enough revenue to support continued development or to be an interesting business. On the other hand, Cloudscape had been adopted in many IBM middle-tier Java projects. By many I mean that at least 70 internal projects were using Cloudscape.

In the end, IBM decided to Open Source Cloudscape under the Apache Software Foundation, with which it already had a significant relationship. This solved several problems at once. Cloudscape no longer competed with the DB2 brand. A well-established Open Source project would provide long-term support for Cloudscape at less expense than doing it in house. A donation of valuable software would also enhance IBM's standing in the Open Source community and with the Apache Software Foundation.

Eclipse BIRT (Business Intelligence and Reporting Tool) is an Open Source reporting application that is based on the Eclipse Java Open Source framework. I wrote about BIRT in 2006, about 18 months after the project was launched. In that post, I speculated on why Actuate had Open Sourced BIRT. Apart from the reasons I gave then, all of which still stand, a couple more come to mind.

On the positive side, an Open Source project could establish the BIRT and Actuate methodology for defining reports as a de facto standard. Standards create a defense in depth against competition and can provide great profitability for those who control them. On the other side, a strong Open Source project with a pliable license is the best possible way to suck the air out of competitors' lungs and prevent new competitors from springing up.

Finally, releasing BIRT as Open Source software has not harmed Actuate. Since the release, Actuate has not grown revenue significantly, but it has become profitable with strong cash flows. In its most recent quarter, Actuate reported that BIRT contributed a respectable 12% of revenue.

In summary, there are many reasons that software becomes Open Source. It can be a new home for old software that does not fit, or it can be a weapon in the fight to the top of the business totem pole. Whatever the reason, be sure you understand why the software is Open Source before committing to it.

Sunday, February 22, 2009

Open Source Business Intelligence

We had a great February meeting of the SDForum Business Intelligence SIG, where Co-Chair Paul O'Rorke spoke on "BI on a Budget - Open Source BI". Paul's talk had two parts. In the first half he talked about Open Source licenses and, to a lesser extent, business models. In the second half he surveyed Open Source Business Intelligence tools and platforms.

My discussion of licenses must include the "I Am Not A Lawyer" (IANAL) disclaimer. Paul is not a lawyer either, as he told us before launching into his license discussion. Open Source licenses generally fall into three categories. From a business point of view, the most restrictive is the GNU General Public License (GPL). Any code that links with GPL-licensed code must be released as Open Source. This is called Copyleft, a play on Copyright. The least restrictive licenses, like the BSD License and the Apache Software License, require little more than that you acknowledge you are using their software. In the middle sit the so-called "weak Copyleft" licenses like the Mozilla Public License and the Eclipse Public License.

A good example of what these licenses mean in practice comes from the two leading Open Source database systems, PostgreSQL and MySQL. PostgreSQL was originally developed at the University of California, Berkeley under the leadership of Michael Stonebraker and is released under the unrestrictive BSD License. Because of this, it is the basis of many recent commercial database systems, including Netezza, Greenplum and ParAccel.

On the other hand, MySQL is released by a company that makes money by selling software licenses. There is a "community" version of MySQL that is licensed under the GPL. However, you can also buy a license for MySQL, in which case you are not required to release any of your code as Open Source. A cynic might say that the community edition is there for promotion. You get to try the product for free, but when you go to use it, you find that there are good reasons to buy a license.

Open Source license issues are complex. The above discussion is just the tip of the iceberg when it comes to understanding the implications of an open source license. Paul brought up plenty of other issues that need to be considered. One example is patent protection. This covers whether you are liable to be sued for using Open Source code, as well as issues of protecting your own intellectual property if your code is linked to Open Source code.

One of the good things about the meeting was its interactive nature. Sandeep Giri of the OpenI project gave us some insight into why he chose the Mozilla Public License for his project. When Paul got to discussing the BIRT Open Source Reporting project that is sponsored by Actuate, Suzanne Hoffman told us that the decision to Open Source some of Actuate's code caused dissent within the company, to the extent that some people left. Jim Porzak had some input on the R programming language and helped us understand the difference between using a statistical programming language and a Data Mining system like Weka. Many other audience members also joined in.

Sunday, February 15, 2009

Do We Need a New Programming Language?

There is a plethora of new and emerging programming languages. Recently I have written about two: Scala and Groovy.

In many ways this era feels like the 1980s, when we had a similar profusion of new programming languages. Back then, Niklaus Wirth seemed to come out with another new language every other year. Apart from those, there were plenty of other new languages to argue about. At the time, the language wars got so intense that most people retreated behind the shelter of a simple acronym. Their response was "JAFPL" (Just Another F-word Programming Language), where the F-word could be "Friendly" if you just wanted to dodge an excruciating, time-wasting controversy.

In practice, Wirth's languages from the '80s, along with many others, have not survived. The reason is quite simple: these languages did not do anything more than other, stronger languages of the time. Now we have the same situation. The new languages address the same problem space as better-established languages. Why should we move to a new dynamic language like Groovy when we already have Perl, Python and Ruby? Likewise, is there anything of real value that Scala does that cannot already be done by Java?

In my opinion, there are two important issues in programming that current programming languages do not address well. They are:
  1. Concurrency
  2. Persistence
The popular programming languages handle concurrency with Threads, which are an awful solution. I will rant another time about why Threads are so awful, but trust me, they are awful, awful, awful and completely disastrously awful. Persistence is an area where there is a complete impedance mismatch between programming languages and the relational database persistence mechanism. There are all sorts of magic systems that try to bridge the gap, but they all have their own problems.
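For readers who have not been bitten by Threads yet, here is a minimal Java sketch of the classic lost-update problem, one small example of why shared-memory threading is so easy to get wrong. Two threads increment the same counters. The class and method names (`LostUpdate`, `race`) are hypothetical, made up for this illustration. The plain `int` counter is updated with `count++`, which is a read-modify-write rather than a single atomic step, so increments can be silently lost; the `AtomicInteger` counter always reaches the expected total.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdate {
    // Shared plain counter: unsafeCount++ is read, add, write; not atomic.
    private static int unsafeCount;
    // AtomicInteger performs each increment as one atomic operation.
    private static final AtomicInteger atomicCount = new AtomicInteger();

    /** Runs two threads that each increment both counters perThread times.
        Returns {plainTotal, atomicTotal}. */
    static int[] race(int perThread) {
        unsafeCount = 0;
        atomicCount.set(0);
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                unsafeCount++;                 // racy: concurrent increments may be lost
                atomicCount.incrementAndGet(); // never loses an increment
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join(); // join() makes the threads' writes visible to this thread
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {unsafeCount, atomicCount.get()};
    }

    public static void main(String[] args) {
        int perThread = 1_000_000;
        int[] totals = race(perThread);
        System.out.println("expected: " + 2 * perThread);
        System.out.println("plain:    " + totals[0] + " (often lower; varies run to run)");
        System.out.println("atomic:   " + totals[1]);
    }
}
```

The fix here is a special-purpose atomic class; for anything more complex than a counter, the programmer is back to reasoning about locks by hand, which is exactly the kind of burden the post argues a new language should lift.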

To break out, the next big programming language needs to do something new, different and more than existing programming languages do. There are plenty of new problems to solve, but concurrency and persistence seem like the most likely problem areas to address.

Thursday, February 12, 2009

Musician versus Recording Industry

I just happened to notice this quote in an interview with The Reverend Horton Heat in the San Jose Mercury News:

"To me, being a recording artist is barely a valid art form," Heath says by telephone from a tour stop in Fort Collins, Colo. "It's almost like being in the advertising business, because in the long, storied history of music, only a small percentage of that history involves recording technology.

"Music was always a live event. It's a linear art form. You play some notes, they go out and they're gone forever. To try to reduce that to a static art form is wrong."

I agree with this entirely for many reasons. Firstly, it fits in well with the new reality of zero cost replication of recorded music. As many people have said, myself included, the new business of music is to give away recorded music to promote live performance.

Secondly, while listening to recorded music is OK, nothing beats a good live musical performance. I could give many examples of memorable musical performances that I have witnessed in person. A good recording of a good live performance beats a sterile studio performance any day. For example, many, many years ago I happened to tape Ian Dury and the Blockheads performing live for the BBC. That performance was so much more alive than any of their studio albums. When I buy music these days, I prefer to buy a live performance.

Finally, as the good "Reverend" says, in the history of music, the recording is but a blip. The recording industry came to power in just the last 50 years by controlling the means of production and now that they no longer have that control they will surely fade.

Sunday, February 01, 2009

Groovy and the MetaObject Protocol

Bill Grosso spoke to the January meeting of the SDForum Java SIG on the Groovy programming language. An important part of the discussion was the Metaobject Protocol. I will get to that after explaining what Groovy is. Also at the meeting, Chris Richardson spoke on GORM (Grails Object Relational Mapping), an object relational mapping layer for Groovy programs. I will discuss this another time.

A couple of months ago I posted on the Scala programming language. Like Scala, Groovy has a Java-like syntax and is compiled for the Java virtual machine. That means it meets the Java promise of "write once, run anywhere". It also makes the full set of Java libraries available to the programmer. Unlike Scala, Groovy is a Dynamic language. While there is some discussion as to exactly what a Dynamic programming language is, a core principle is that in a Dynamic language the data type is a property of a data item, whereas in a Strongly Typed language like Scala, the data type is a property of the variable that holds the data item.
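The distinction between a type belonging to a variable and a type belonging to a data item can be sketched in Java, which is statically typed in the same sense as Scala. This is an illustration under assumptions, not Groovy code: the `TypesDemo` class and its `runtimeType` helper are hypothetical names. A `String` variable can only ever hold Strings, and the compiler enforces that. A variable declared as `Object` accepts any value, and each value still carries its own runtime type, which is roughly how every variable behaves in a dynamic language.

```java
public class TypesDemo {
    /** Returns the runtime type name carried by the value itself. */
    static String runtimeType(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // Static typing: the type belongs to the variable s.
        String s = "hello";
        // s = 42;  // would not compile: s is declared to hold only Strings
        System.out.println(s.length()); // 5, checked against String at compile time

        // Dynamic style: an Object variable accepts any value, and the
        // type travels with the data item, discovered at run time.
        Object x = "hello";
        System.out.println(runtimeType(x)); // String
        x = 42;                             // autoboxed to an Integer
        System.out.println(runtimeType(x)); // Integer
    }
}
```

In Java, working through `Object` variables means giving up the compiler's help and casting by hand; a dynamic language makes that style the default.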

In general, it is easier to start writing a program in a dynamic language: you do not need to make so many decisions up front, the program gets written more quickly, you can test out ideas and bits of the program interactively, and the program tends to be smaller. On the other hand, in large programming projects, particularly those with many programmers, maintainability and refactoring suffer when a dynamic language is used. I will have more to say about dynamic languages another time.

In his talk, Bill argued that another advantage of a Dynamic language is that it allows for the Metaobject Protocol. Like anything else with meta in its name, the Metaobject Protocol concept is a little hard to pin down. I think of it as the mechanisms in a programming language that allow you to build frameworks, generalized algorithms and the like. Bill's argument is that the ability to build these things within the language is necessary and a good thing, and that the mutability of a dynamic language's type system is required to do it. After all, the name Metaobject Protocol comes from Lisp, the most mutable of programming languages (apart from assembler).

I disagree with the last part of the argument. As the old joke goes, Q: what happens when you put more than one Java programmer in a room? A: you get a framework! Java is a strongly typed language and the basis of more frameworks than all other programming languages put together. In fact, I am always amazed at how Java, which is a relatively small language as well as being strongly typed, can be used to build such extensible systems.
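One reason Java supports so many frameworks is that its reflection library offers framework-building hooks without giving up static types. As a rough analogue of the method interception a Metaobject Protocol provides, here is a minimal sketch using `java.lang.reflect.Proxy`, a real part of the standard library. The `Greeter` interface and the `traced` helper are hypothetical, invented for this illustration. `traced` wraps any implementation of any interface so that every call is recorded before being forwarded, yet callers still see an ordinary, statically typed `Greeter`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyDemo {
    // Hypothetical interface used only for this illustration.
    public interface Greeter {
        String greet(String name);
    }

    /** Wraps target so each call through iface is logged, then forwarded. */
    @SuppressWarnings("unchecked")
    static <T> T traced(Class<T> iface, T target, List<String> callLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            callLog.add(method.getName()); // generic behavior added to every method
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unwrapped
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter plain = name -> "Hello, " + name;
        Greeter g = traced(Greeter.class, plain, log);
        System.out.println(g.greet("Groovy")); // Hello, Groovy
        System.out.println(log);               // [greet]
    }
}
```

The interception is generic, since `traced` works for any interface, but it is bounded: it cannot change what a `Greeter` is or rewire inheritance at run time, which is the kind of boundary argued for below.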

There is more to my objection than that. Sometimes boundaries on what you can do are a good thing, and a well-structured programming language creates boundaries. During the talk, Bill gave changing the inheritance rules for an object as an example of the Metaobject Protocol. Inheritance, particularly multiple inheritance, is a slippery and potentially ambiguous problem at the best of times, and the ability to change inheritance on the fly is frankly scary. My first reaction to any project where inheritance rules are even an issue is that the class structure is far too complicated. Programmers should put complexity into deep and complicated data structures rather than into deep and complicated class structures.

To summarize, I believe that strongly typed languages are just as capable as dynamic languages of building the kinds of tools and frameworks that the Metaobject Protocol supports. Moreover, strong, clear types and rules keep us from getting lost in the semantic swamp that can arise when our language lets us change things on the fly. Apart from that, Groovy seems like a nice enough language, and it is probably as good as the many other dynamic languages that are around now. Bill did mention that performance is noticeably slower than Java, as might be expected for a Dynamic language running on a platform optimized for statically typed languages.