Thursday, October 12, 2006

Spamhaus Case Threat

We have this vision of the Net as the great level playing field where all the world can come on an equal basis. For some time, people outside the US have been concerned that the US has too much ownership and influence in the Net. Now the concerns are coming home to roost with the Spamhaus case.

The case is quite simple. Spamhaus.org is a volunteer-run organization in London that issues lists of email spammers to ISPs and other organizations around the world. A US business, e360Insight LLC, sued Spamhaus in a Chicago court to have its name removed from the Spamhaus list of email spammers. Spamhaus has no money and considers that the Chicago court did not have jurisdiction over it, so it did not appear in court to defend its position. As it did not defend its position, it lost the case.

First the Judge issued a judgment of $11M against Spamhaus that has been ignored. Now the Judge has proposed an order that ICANN, a US-based organization, should take the Spamhaus.org domain name away from Spamhaus. Without the domain name, people who use Spamhaus would not know where to go to get their email spammer lists.

The problem is that ICANN is a US organization and therefore subject to the vagaries of US justice, which is somewhat variable in the lower courts. There has been good coverage of the Spamhaus case, but only today have people started to realize its long-term consequences. I think we would all feel that the net would be much more neutral if its governing bodies were based in a neutral country like Switzerland.

Monday, October 09, 2006

Understanding Metadata

Some time ago I made the remark that "metadata is the cure for all problems digital". Well it is true, metadata IS the cure, although this statement is a bit like that old Computer Science saw that any problem in Computer Science can be solved by adding an extra level of indirection. The problem with Metadata is that it is one of these slippery concepts that seems so straightforward but then when you go to look into the details is difficult to pin down.

Ask a number of people what metadata is and you get a variety of answers. There are experts who will tell you that metadata is data about data and then go starry eyed as their mind gets lost in an infinite regression, while you wait unrequited wanting to hear something more useful and concrete. On the other hand there are people who will tell you that metadata is the ID3 tags found in MP3 files. Of course metadata is data about data, but that definition does not capture the essence of it, and while ID3 tags are metadata, there is a lot more useful metadata in an MP3 file than just the ID3 tags, let alone all the metadata in all the other data stores, data sources and file types that are available.

To get an understanding of metadata, a good starting point is to look at a diverse set of examples and then look at some of the important attributes and characteristics that are common to these examples. So, to kick this off, here is a description of metadata in a database, an XML file and an MP3 file. We will look at attributes and characteristics in later posts.

In a SQL database, the metadata is called the Catalog and this is presented as a set of tables like any other database tables. The catalog contains tables that define the tables, columns, views, permissions and so on. In practice the catalog is an external representation of the information that the database system needs to access its data. Internally a Catalog can be stored as a set of database tables or just some data structures; I have seen it done both ways. The Catalog is always presented as a set of tables so that the user can query the Catalog just like any other data. For example, I have fixed a bug by writing a mind-bendingly complicated query on a catalog rather than updating the definition of the catalog table to get the required information easily.
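As a small illustration of querying a catalog like any other data, here is a minimal sketch using SQLite from Python. SQLite is my choice for the example, not the database discussed above; it exposes its catalog as the sqlite_master table plus the table_info pragma. The posts and tags tables and the describe_catalog helper are made up for the example.

```python
import sqlite3


def describe_catalog(conn):
    """Return {table_name: [column names]} using only catalog queries."""
    # sqlite_master is SQLite's catalog table: one row per table, index, view...
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = {}
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [col[1] for col in columns]
    return catalog


# Hypothetical schema, created only so that the catalog has something in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, posted DATE)")
conn.execute("CREATE TABLE tags (post_id INTEGER, tag TEXT)")

print(describe_catalog(conn))
```

No row of user data is read here; everything printed comes from the catalog.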

In an XML document, the tags are the metadata. Well, except for the fact that tags can contain attributes and the value part of an attribute is data. Next we have to quench the argument about whether processing instructions are metadata by saying that some types of processing instruction are metadata and other types are not. Then there are DTDs and Schemas that are metadata and also metadata about the metadata (which is allowed by the definition that metadata is data about data). Some of the metadata can be in other XML documents that are referenced by URLs.
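To make the split concrete, here is a rough sketch in Python that walks a tiny XML document and sorts what it finds into two piles: names (tags and attribute names, the metadata) and values (attribute values and text, the data). The sample document and the split_metadata helper are invented for the example, and the sketch ignores processing instructions, DTDs and Schemas altogether.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, loosely modelled on a blog post.
DOC = ('<post date="2006-10-09"><title>Understanding Metadata</title>'
       '<label>metadata</label></post>')


def split_metadata(xml_text):
    """Return (sorted names, values in document order) for an XML string."""
    root = ET.fromstring(xml_text)
    names, values = set(), []
    for elem in root.iter():
        names.add(elem.tag)                      # tag name: metadata
        for attr_name, attr_value in elem.attrib.items():
            names.add("@" + attr_name)           # attribute name: metadata
            values.append(attr_value)            # attribute value: data
        if elem.text and elem.text.strip():
            values.append(elem.text.strip())     # element text: data
    return sorted(names), values


names, values = split_metadata(DOC)
print("metadata:", names)
print("data:    ", values)
```

The "@" prefix is only a labelling convention borrowed from XPath so that attribute names and tag names can share one list.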

An MP3 file consists of a sequence of frames where each frame contains a header and data representing the sound. The header is metadata, containing useful information like the bit rate, frequency and whether the data is stereo. An ID3 tag is a frame with a header that indicates that the data is not an MP3 sound frame. The ID3 tag contains information about the artist, recording and album. There are several different versions of ID3 tags that are not upwardly compatible with one another.
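For the curious, the frame header metadata can be pulled out with a few bit shifts. The following is a minimal sketch in Python that decodes the 4-byte header of an MPEG-1 Layer III frame only; other MPEG versions and layers use different lookup tables and are rejected. The function name and the sample header bytes are my own, and the sketch does not attempt to find frames in a real file or to read ID3 tags.

```python
# Lookup tables for MPEG-1 Layer III only; None marks free/reserved entries.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header):
    """Decode bit rate, sample rate and channel mode from 4 header bytes."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if (word >> 21) != 0x7FF:                 # top 11 bits are the frame sync
        raise ValueError("no frame sync found")
    version = (word >> 19) & 0b11             # 0b11 means MPEG-1
    layer = (word >> 17) & 0b11               # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    return {
        "bitrate_kbps": BITRATES_KBPS[(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0b11],
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
    }


# Hand-built sample header, not taken from a real file.
print(parse_frame_header(b"\xff\xfb\x90\x00"))
```

A real reader would also have to scan for the sync bits and skip any tag data before the first sound frame, which is left out here.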

Sunday, October 08, 2006

Glassbox

Being a Business Intelligence type, if I were given the job of devising a tool for analyzing Java applications, I would build a component to collect performance data and then present the user with a bunch of fancy reports, charts and data cubes where they could drill down to work out what their problem is. Glassbox takes a different approach as we heard at the SDForum Java SIG last Tuesday.

Glassbox collects the data from Java Apps running in an application server, analyses it for a set of common problems and produces a report that tells you exactly what the problem is. No complicated analysis, no slicing and dicing, just a simple report on the facts. Of course it can only tell you about problems that it has been programmed to analyze; however, the common problems are well known. Things like too many database queries to answer a web request, slow database response, or slow complicated Java that seems to call the same function too many times. It is the kind of solution that gets 90% of the problems with 10% of the effort of a full-blown performance analysis tool. Another advantage of this approach is that the tool can be used by anyone without a lot of training or time spent getting experienced in its use.

While Glassbox has not wasted time building fancy displays, they have taken the trouble to collect their data in a straightforward and unobtrusive way. As we were shown, you just add a .war file to your application server's directory, update the application server's configuration, restart it and you are on your way. Supposedly data collection only adds 1% or so to program execution times.

All in all, Glassbox looks like a good place to start identifying problems with web apps. As it is Open Source, and easy to use, the cost of trying it out is low.

Wednesday, October 04, 2006

Definitive Source

When George Smoot was called at 3 am by someone with a Swedish accent telling him that he had won a Nobel Prize for Physics, he thought this might be a prank. So what did he do? He went to the Nobel Prize web site to check whether it was true. We have come to the point where a Nobel laureate trusts the World Wide Web as the definitive source of information rather than a phone call.

Thursday, September 28, 2006

Dashboard Design

We had a lot of fun at the SDForum Business Intelligence SIG September meeting where Stephen Few spoke on "Why Most Dashboards Don't Work". Here we are talking about Information Dashboards that let an executive pilot their enterprise to new levels of performance. Stephen is an expert on the visual presentation of information. He has just published a book on Dashboard design and he has spoken previously to the BI SIG.

The fun came from looking at examples of dashboards that had been culled from the web and picking holes in what they presented. In practice it was surprisingly easy for audience members to find problems in the dashboards shown. From these examples and some critical thinking, Stephen pulled out a list of 13 things to avoid in dashboard design and a shorter list of things to do to get a dashboard design right.

However, the thing that I found most compelling about the presentation came right at the end. Stephen had judged a dashboard design competition for DMReview and he showed us some of the entries. Then he showed us a dashboard that he would have entered had he not been a judge. Of all the dashboards presented, this was the one that showed us a great deal of information in a small space and discreetly guided us to the information that most required our attention.

If you want to judge for yourself, you can download a version of the presentation from the Business Intelligence SIG's Yahoo web site. We had to cut the size of the file down to make it fit. You can also buy Stephen's book on dashboard design. I highly recommend it.

Monday, September 11, 2006

The Latest HP Mess

There is a lot of talk in the valley about the latest HP boardroom brouhaha. It seems like not a year goes by without some new HP management upset. These upsets seem all the worse for the high regard in which the company was held. If you are upset by what seems to have become of such a great company, let me set the record straight.

Firstly, remember that the great company founded by Bill Hewlett and Dave Packard is now called Agilent. While Agilent seems to have lost some of the Hewlett-Packard way, it is not as bad as what has happened to HP, the fat child spun out of the original company several years ago. HP, the computer company, started out life as a couple of divisions out of 20 that lost their way.

Part of the Hewlett-Packard way is that divisions grow organically and then split when they reach a certain size. This way no division dominates, they operate as a set of peers. The Computer and Printer divisions eschewed this tradition by just growing until they were big enough to swallow the rest of the company. Worse, the Computer division gave up on organic growth and for much of the last 20 years has been growing by acquisition.

All these acquisitions, particularly the large ones, have diluted the blood to the point where we can no longer see a trace of the founding principals (pun intended). So do not feel sorry for HP. It is not the company you thought it was; it is just another big dinosaur well on its way to extinction.

Thursday, August 31, 2006

Free as in Peer

Lawrence Lessig writes an interesting column in Wired magazine. His latest entry is titled "Free as in Beer". The column starts off talking about free, or more accurately Open Source, beer. It is a good read; however, towards the end there is the following discontinuous comment:
Although peer production is profitable for business, writes Benkler, "we are in the midst of a quite basic transformation in how we perceive the world around us and how we act, alone and in concert with others." What he calls nonmarket peer production is a critical part of this transformation. (sic)
Beer Peer? We all know that to appreciate beer you need to open the source, and that after appreciating beer you become a pee'r, but it is not the kind of thing that needs to be talked about.

Thursday, August 17, 2006

The BIRT Strategy

Software is fascinating stuff. Compared to any other engineered product it is completely ephemeral, yet at the same time it is becoming the thing that makes almost every engineered product work. Software also has a meme-like quality where certain software systems become the standard that everyone gravitates to use, however good or bad they eventually turn out to be. It seems that the trick to creating successful software is to make it really, really successful.

I got to thinking about this after listening to Paul Clenahan, VP of Product Management at Actuate Corporation and member of the Eclipse BIRT Project Management Committee talk on "Eclipse BIRT: The Open Source Reporting Framework" at the SDForum Business Intelligence SIG. BIRT is a component of the Open Source Eclipse project that provides a Business Intelligence Reporting Tool (hence BIRT).

BIRT consists of an Eclipse plug in that allows you to design sophisticated reports, a standards based XML definition of the report and delivery mechanisms that allow you to deliver reports as either HTML or PDF documents. As Paul mentioned several times, it is also very extensible, so if it does not have the capabilities that you need, you can easily add them. BIRT is Open Source software that is available under the relatively unencumbered Eclipse public license that allows commercial exploitation of the code.

From the presentation and demo, BIRT seems to be a well designed, easy to use and fully capable reporting system that is free. In fact, as the presentation wore on, the one question in my mind was why Actuate has devoted 8 developers to developing this wonderful new Open Source reporting system. What is in it for Actuate? I think that it has to do with broadening the marketplace.

While reporting tools are widely used, many more developers roll their own reports rather than use a reporting tool. Paul mentioned in his presentation that he asked a large group of developers at a conference whether they used reporting tools and the vast majority did not. Providing an easy to use Open Source tool that fits into the popular Eclipse development environment brings developers into the reporting tool fold.

Reporting tools are not rocket science. Low cost reporting tools have been around for a long time. While Actuate has excellent reporting tools, their core differentiating competence is a scalable platform for delivering reports, something that other reporting tools do not have. So broadening the market for reporting tools also widens the market for report delivery tools. If they are successful and BIRT catches on in that meme-like way, Actuate will have a much larger market to sell their products into. Open Source and an Eclipse plug-in dramatically lower the barriers to using these tools.

Wednesday, August 16, 2006

Blogger Upgrade

Blogger, home of this blog, is going to get an upgrade. I have used Blogger for the last couple of years as it suits my mostly-text blogging style. So far my only complaint has been about Google's segregated indexing (and a spell checker that seems to work against both Blogger and Google). Many others have been less patient.

The bad news is that the new Blogger will be integrated with Google Accounts. Recently, I wrote about how I had been forced to give up my identity to Yahoo. Now, it looks like Larry and Sergey are going to get a bit of my identity as well. Barry Diller has a piece of me and Rupert has all my kids nailed down in their own little spaces. Whatever happened to freedom?

Tuesday, August 08, 2006

A Short Post

Everyone, including the media, is talking about the idea of the long tail. To me it seems like last year's idea. On the other hand, this is the silly season so maybe that is all they have to write about.