Thursday, January 25, 2007

Visualization for All

If you are into playing with data, these are good times. A number of web sites have sprung up recently that allow you to explore data visually. Data360 launched in October. Swivel got a mention from the influential TechCrunch blog. Many Eyes comes from IBM Research, although you may not think so from looking at the site.

Each of these web sites allows you to upload data sets and play with how they are presented, looking for insights into the data. Of the three, Many Eyes is the most approachable. Without having to register, you can play with data sets that others have uploaded. Many Eyes has a great collection of visualization tools, including scatterplots, stacked graphs and treemaps, as well as the more mundane bar and pie charts.

For example, when I first visited Many Eyes, someone had uploaded a data set of restaurant reviews from the San Francisco Chronicle that scored the restaurants by food, atmosphere, service, price and noise, and also gave an overall score. I looked at scatterplots and determined that there was no significant correlation between atmosphere and noise, and that the only factor that seemed to show some correlation with the overall score was the score for food. I also used stacked graphs to explore US government spending over the last 45 years. The takeaway is that spending on health, particularly Medicare and drugs, accounts for the largest increase in spending.

The only problem with Many Eyes is that, as befits an open research site, anyone can upload their data set, so by the time you read this the restaurant review data may have scrolled off to be replaced by other compelling data. Go and play with whatever data is there anyway. It will be a learning experience.

Thanks to Stephen Few and his great Visual Business Intelligence blog at PerceptualEdge for pointing out these sites.

Tuesday, January 23, 2007

DRM Wishes

The old saying goes "be careful what you wish for, lest it come true". The music industry wished for DRM to protect their content. They found their "white knight" in Steve Jobs who built the iTunes music store to deliver their content safely to iPod users everywhere. The problem is that the music industry now finds itself completely beholden to Apple as their only viable channel for digital music sales.

Apple controls the channel and dictates the terms for music sales, particularly the $.99 price, which record executives want to vary. Also, the DRM is now seen to do more good for Apple than for the music industry because it locks the music purchaser into Apple products. The more music bought, the more locked in the purchaser becomes. No wonder the music industry is now talking about selling music without DRM. Funnily enough, Apple is against selling music on iTunes without DRM!

The only cloud on the horizon is that several European countries are trying to force Apple to open up their DRM for others to use. If these countries succeed, they take away the pressure on the music industry to sell music unencumbered. I view these countries' efforts as totally misguided and I wish that they would just stop meddling.

On another front, the jury is still out on whether the Microsoft Vista operating system is going to be so wrapped up in DRM that it is unusable. (I posted on this a couple of years ago.) There is a great discussion of Vista DRM on the Security Now podcast (episodes 73, 74 and 75).

Many people are surprised that Microsoft has yielded without a whimper to the content industry. If Microsoft had been willing to take a stand they could have negotiated a much better position for themselves and their products. It seems like Ballmer has been too willing to BOGU for the content providers. We will just have to stand back and see if he gets shafted.

Sunday, January 21, 2007

Complex Event Processing

Complex Event Processing (CEP) was the topic for the SDForum Business Intelligence SIG January meeting. Mark Tsimelzon, President, CTO and Founder of Coral8 spoke on "Drinking from a Fire Hose: the Why's and How's of Complex Event Processing".

Mark started out by showing us a long list of applications, such as RFID, financial securities, e-commerce, telecom and computer network security, that share the same characteristics. Each of these applications can generate hundreds of thousands of events per second that need to be processed and filtered, with critical events identified and responded to in a millisecond or second timeframe.

The first response to building a system for one of these complex event processing applications is to load the data into a database and continuously run queries against the data. Unfortunately this introduces a number of delays that interfere with response time. Firstly there is the delay in loading the data into the database, as efficient database loading works best in batches. Next there is a delay in waiting for the query to be run as it is run periodically. Finally there is a delay caused by interference between the load process that is writing data and the query process that is trying to read the same data.

Given the problems of using a database, the next response to building a CEP system is to write a custom program in Java or C to do the job. This can be coded to meet the response time and data rate requirements; however, it is inflexible. Any change to the requirements or data streams requires recoding and testing, which take time and money. Coral8 and other vendors in the CEP space provide a system like a database that is programmable in a high-level SQL-like language and that can process event streams at a rate similar to the hand-coded system.

In a conventional database system, the data is at rest in the database and the queries act on the data. In a CEP system, the queries are static and the event data streams past the queries. When an event triggers a query, the query typically generates new event data. This structure allows event data processing to be parallelized by having several event processors that run different queries in parallel on the same data stream. Processing can be pipelined by having the output streams of one event processor feed into the inputs of another event processor.
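To make the "static queries, moving data" idea concrete, here is a minimal sketch in plain Python. It is not Coral8's SQL-like language, and the Event type, the two query functions and the threshold are all hypothetical names invented for illustration. Each query is a generator that stays in place while events flow through it, and the output stream of the first query feeds the input of the second, which emits new events.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    kind: str
    source: str
    value: float


def filter_query(events: Iterable[Event],
                 predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Standing query: pass along only the events that match the predicate."""
    for event in events:
        if predicate(event):
            yield event


def alert_query(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Standing query: emit a NEW 'alert' event for each large input event."""
    for event in events:
        if event.value >= threshold:
            yield Event("alert", event.source, event.value)


def run_pipeline(events: Iterable[Event], threshold: float = 100.0) -> List[Event]:
    """Pipeline two queries: the output of the filter feeds the alert query."""
    trades = filter_query(events, lambda e: e.kind == "trade")
    return list(alert_query(trades, threshold))


if __name__ == "__main__":
    stream = [
        Event("trade", "A", 50.0),
        Event("quote", "A", 500.0),
        Event("trade", "B", 150.0),
    ]
    for alert in run_pipeline(stream):
        print(alert)
```

Because generators are lazy, each event passes through both stages as it arrives and nothing is loaded or stored first. A real CEP engine would add the parallelism described above by running several such queries on the same stream at once.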

It is important to understand that the purpose of a CEP system is not to store data. While events can linger, they eventually pass out of the system and are gone. A database complements a CEP system. For example, Coral8 can read data from database systems and even caches the data for improved efficiency. Also, output streams from Coral8 can be, and usually are, fed into database systems.
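One common way for events to "linger" and then disappear is a sliding time window. The sketch below is a plain Python illustration under stated assumptions, not a description of how Coral8 implements it: the SlidingWindow class and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindow:
    """Keep only the events from the last `span_seconds`; older ones are dropped.

    Hypothetical helper for illustration. Assumes timestamps never go backwards.
    """

    def __init__(self, span_seconds: float) -> None:
        self.span = span_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, value), oldest first

    def add(self, timestamp: float, value: float) -> None:
        """Record a new event, then expire everything that has aged out."""
        self._events.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Events at or before (now - span) have left the window for good.
        cutoff = now - self.span
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def count(self) -> int:
        return len(self._events)

    def total(self) -> float:
        return sum(value for _, value in self._events)


if __name__ == "__main__":
    window = SlidingWindow(span_seconds=10)
    window.add(0, 1.0)
    window.add(5, 2.0)
    window.add(12, 4.0)   # the event at t=0 has now aged out
    print(window.count(), window.total())
```

Memory use is bounded by the window span rather than by the total number of events seen, which is why long-term storage is left to a database fed from the output stream.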

If you want to try out CEP, visit the Coral8 web-site. There you can download documentation and a trial version of the software.

Sunday, January 14, 2007

Tableau Software

Business intelligence is about taking business data and turning it into actionable information, and there is a visualization problem at the heart of this process. Business data can be complicated and the user needs help in presenting the information in the best possible way. Unfortunately, many leading Business Intelligence tools seem to be deliberately designed to lead the user into making the worst possible presentation choices.

At previous meetings of the SDForum Business Intelligence SIG we have had great fun looking at bad visualizations, such as garishly colored 3-D pie charts and 3-D bar graphs that do more to obscure the information than to show it off. At the November meeting of the SIG we heard from a company that is doing something positive about data visualization when Kelly Wright, Director of Sales for Tableau Software and a Bay Area local, presented "Visual Analysis Using Tableau Software".

Tableau Software (www.tableausoftware.com) is a startup that emerged from a research project at Stanford University. There, under the leadership of Dr. Pat Hanrahan, a team of researchers worked on the difficult problem of enabling people to easily see and understand the information in their databases. As Kelly explained, Tableau was formed in 2000 and took 5 years to develop their product, coming out with their first version in 2005. They are now on version 2.1.

Kelly gave us a whirlwind tour of Tableau's capabilities. Firstly Tableau is designed to understand the data that it is presenting, at least to the extent that it can make sensible choices about how to present the data in a useful way, for example, by giving line graphs of continuous data against time. While it is always possible to override the default, Tableau seems to do a good job with its choices. The next issue is being able to present large amounts of data and compare different aspects of the data against one another, and again the Tableau drag and drop interface seems intuitive and easy to use. When you can see all the data the next requirement is to drill down into the interesting data and remove the noise, and again Tableau has a set of tools for selecting the most interesting data points and looking into them further.

In retrospect, it seems obvious to take the knowledge that has developed around how to present information and package it into a data visualization product. However, this is not as simple as it seems, and the fact that Tableau took 5 years to develop their product shows the amount of work involved in doing this properly. Also, theirs is a lonely path. The other BI vendors prefer to provide flash and features over carefully integrated substance.

Tableau's product is not expensive for a data-head, and if you ask, you can get a 10-day free trial to find out exactly what it can do. Go ahead and try it!