Tuesday, December 11, 2007

Viva Fake Steve

I have just finished reading "Option$: The Secret Life of Steve Jobs" by Fake Steve Jobs. The good news is that it is an extremely funny book. I laughed out loud every few pages. The last book that had me laughing out loud in the same way was "Thank You for Smoking" by Christopher Buckley, and that was well over a decade ago. Not quite so good is that while the story arc is firmly in place, the ending is a bit weak. However, a weak ending does not diminish the joy leading up to it. Well recommended.

Wednesday, December 05, 2007

Aggregate Data, Make Money

In the old days, the aggregate business was about selling large numbers of small stones for building roads and such. Now the aggregate business is about collecting large numbers of data points and aggregating them into information valuable enough to support a business. We heard about one such business at the November/December meeting of the SDForum Business Intelligence SIG, where Faisal Mushtaq, CTO of Biz360, spoke on "Do you know what customers are saying about you and your products?".

Biz360 starts out as a web version of a clippings service. They gather media articles, blog entries and other documents that reference their clients from the web and other information services. As the information is in digital format, there are opportunities for interesting analyses that cannot be done with dead tree clippings. For example, Biz360 automatically ascribes a sentiment, from negative through various grades of positive, to each document so that the business can track what people think about it. They also ascribe a reach to each document that measures its importance. This captures the fact that an article in the Wall Street Journal will be seen by and influence more people than an article in, say, the San Jose Mercury News.
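To make the sentiment and reach idea concrete, here is a minimal sketch (my own illustration in Python, not Biz360's actual method) of how per-document scores might be rolled up into one number:
# Each document is a (sentiment, reach) pair: sentiment runs from -1.0 (very
# negative) to +1.0 (very positive), reach estimates the audience size.
docs = [(-0.8, 2000000),   # a negative article in a major newspaper
        (+0.4, 5000)]      # a mildly positive blog post

def overall_sentiment(docs):
    # Weight each sentiment by reach so that widely read outlets dominate.
    total_reach = sum(reach for _, reach in docs)
    return sum(sentiment * reach for sentiment, reach in docs) / total_reach

print(overall_sentiment(docs))   # about -0.8, the newspaper article dominates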

Once the data is gathered, a client can look at it in many different ways. The starting point is usually a dashboard that shows reach and sentiment over a period of time. The client can look at changes in reach and sentiment and drill down to find out what lies behind them. Newspaper articles cite sources and blogs are linked, so the software can derive and show the network of influence between related items.

Faisal showed us a specific example of how attention to sentiment could have helped avert a PR problem. In 2006, Dell had a problem with exploding batteries in their laptop computers. Although the story did not break in the newspapers until August, there had been a lot of discussion of the problem in the blogosphere for months beforehand. Faisal showed us a convincing graph demonstrating that by the time the mainstream news media picked up on the problem, it was already old hat in the blogosphere.

Doing this kind of analysis requires some serious computing power. Biz360 has a 150 CPU processor farm with 25 Terabytes of disk storage arranged as a computing GRID. They download about 1 million media articles a day and 4 million blog posts. From this they derive roughly 200 thousand articles of interest to their clients and apply 7 million automated analyses to these articles.

Friday, November 23, 2007

OLAP Vendors Capitulate


I did some research on where the Business Intelligence market was heading and came to the conclusion that there has been complete capitulation in the marketplace. The table to the right shows the market share of the top 10 OLAP vendors in 2006 (courtesy of The OLAP Report).

In 2007 the following events took place. Oracle bought Hyperion. Cognos bought Applix. Business Objects bought Cartesis. SAP announced that it is buying Business Objects. IBM announced that it is buying Cognos.

In essence, the top 10 OLAP vendors, which included several important Independent Software Vendors (ISVs), are being replaced by the top 4 enterprise software vendors: IBM, Microsoft, Oracle and SAP.

There are two independent vendors left in the list and neither is going to break out. Microstrategy is in the ROLAP niche, suitable for big gnarly problems that need a lot of money. Microstrategy could make a good fit with the newly independent Teradata, another company in a similar niche. Infor is a private company that has been rolling up second tier software vendors in the same way that Computer Associates did in the 80's and 90's. It is the new "home for old software".

Wednesday, November 07, 2007

Fake Steve Jobs, Tech Journalist

Someone in the audience put it best when he said that he uses "The Secret Diary of Steve Jobs" as his primary source of tech news. This was at the Computer History Museum last Tuesday evening when Dan Lyons, AKA Fake Steve Jobs (FSJ), appeared on stage with former Apple Evangelist Guy Kawasaki and Brad Stone, the journalist who outed him. In the meeting, Dan Lyons seemed to be a little confused that anyone would think that his column is a useful commentary on tech events, but I know what that audience member means.

I too use FSJ as a primary source of my tech news and commentary. While the blog is intended as a satire, Dan Lyons is a knowledgeable tech writer and he gets the details right. Moreover, the format means that he can be perfectly frank and even rudely negative, as he often is. I get more information out of a 200 word rant from FSJ on a subject such as the Facebook valuation than I get out of a perfectly balanced 1500 word article in the NYT or WSJ. What is more, a spoonful of humor helps the medicine go down.

There is the question of bias. Truth be told, I find it much easier to deal with information when the bias is clear and on the surface rather than cleverly buried as it is in the 'balanced' articles written by journalists. Moreover, journalists need to make nice with their subjects so that they will continue to get access in the future, so they tend to backpedal on the negative, or at least bury it in the n'teenth paragraph. On the other hand, FSJ in character says what he thinks, and the truth comes straight out.

The event itself was well worth attending. Dan is a very funny guy and he had us laughing for nearly two hours, mocking everyone from the Appletards in the front row to Megan McCarthy of Valleywag at the back. There were at least 200 in the audience, including several notables such as well known tech writer John Markoff and iJustine.

Monday, November 05, 2007

A Developer Notebook

For some time I have wanted to write technical posts on specific programming and programming language topics. While I could post them in this blog, they would not really fit with the lighter hearted and wider ranging posts here. So I started a new blog called "A Developer Notebook" for the technical posts. When you look at the content, there should be no mistaking the goals and purposes of these two blogs. Enjoy!

Thursday, October 25, 2007

The Facebook Platform Internals

Ari Steinberg and Charlie Cheever of Facebook gave a fascinating talk about the design of the Facebook platform when they spoke on Tuesday to the SDForum Web Services SIG. As might be expected, the presentation drew a large audience, including renowned Facebook fanboy Dave McClure.

The Facebook platform allows outside developers to build applications and offer them to Facebook users. These applications can reach into Facebook data to do their work so security and privacy concerns are paramount. Also with Facebook approaching 50 million users and reportedly 85,000 applications, performance and scaling are big concerns.

Ari spoke first on the Facebook API. After looking at other Web 2.0 APIs and finding them wanting, they decided to use a query language approach, which gives applications much greater flexibility and at the same time the possibility of being more efficient by fetching only the data that is needed. The language, FBQL, is basically a simplified and restricted version of SQL.
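For illustration, a query in this style looks roughly like the following (my paraphrase of the flavor of the language, not necessarily the exact syntax Facebook shipped):
SELECT name, pic
FROM user
WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = 12345)
The server only has to fetch the names and pictures asked for, rather than a whole canned profile record.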

The linguistic approach is also used in Facebook Markup Language (FBML), the language that applications and users use to define and customize their pages. As Charlie told us, it is much easier to validate and sanitize markup and style sheets from a parse tree than with regular expressions or any other technique. Also, the result is much less likely to have the security loopholes that seem to plague so many Web 2.0 sites.
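As a toy illustration of the parse tree approach (my own Python sketch of the general technique, nothing to do with Facebook's actual implementation), a whitelist sanitizer walks the tree and keeps only the tags and attributes it knows to be safe:
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {'div', 'p', 'b', 'i', 'a'}
ALLOWED_ATTRS = {'href'}

def sanitize(element):
    # Walk the parse tree, dropping any tag or attribute not on the whitelist.
    element.attrib = {k: v for k, v in element.attrib.items() if k in ALLOWED_ATTRS}
    for child in list(element):
        if child.tag not in ALLOWED_TAGS:
            element.remove(child)    # unknown tag: drop it and its whole subtree
        else:
            sanitize(child)
    return element

markup = '<div><p onclick="steal()">hi <b>there</b></p><script>evil()</script></div>'
print(ET.tostring(sanitize(ET.fromstring(markup))))
# prints the markup with the onclick attribute and the script tag stripped out
Doing the same job reliably with regular expressions is notoriously error prone, which is exactly Charlie's point.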


Sunday, October 21, 2007

OLAP with Mondrian

I got three things out of Julian Hyde's talk to the SDForum SIG last week. Julian is the lead developer on the Open Source Mondrian project and his talk was entitled "Building scalable OLAP applications with Mondrian and Pentaho".

The first thing is the nature of Mondrian. While most OLAP servers are database-like servers that persistently store the OLAP cubes, Mondrian is a cache. At the back end, Mondrian uses JDBC to fetch persistent data from a database. At the front end, Mondrian accepts queries in the MDX language and replies to these queries with data cubes. In the middle, Mondrian caches data cubes in memory so that it can respond to queries faster. Mondrian is a component. It is packaged as a single Java .jar file, and its schema is defined by an XML file. Mondrian needs both a back end database and a front end, either something like JPivot that can display OLAP cubes or an XML/A driver that can communicate MDX queries with a remote front end.

This brings us to the second thing. While there have been several attempts to create a standard interface to OLAP servers, none have really succeeded. Julian has spearheaded a new Open Source initiative called olap4j. The concept is that olap4j will be the JDBC of OLAP. It is based on JDBC, and uses the MDX language to express OLAP queries. MDX is the universal language of OLAP, just as SQL is the universal language of relational databases. The only curiosity is that MDX is owned by Microsoft and that other implementations of MDX, like Mondrian, have to scramble to match each Microsoft release. I think that it is about time that MDX was managed by an independent standards committee.
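For anyone who has not seen MDX, a query against Mondrian's sample FoodMart schema looks roughly like this (an illustrative sketch from memory, so check the Mondrian documentation for the exact names):
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
       {[Store].[USA].Children} ON ROWS
FROM [Sales]
WHERE [Time].[1997].[Q1]
The answer comes back as a small cube: stores down the side, measures across the top, restricted to the first quarter of 1997.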

Finally, Julian talked about a "real-time" feature that he has recently added to Mondrian. This is a cache control API that allows you to invalidate parts of the Mondrian cache when new data is loaded into the underlying database. Once part of the cache has been invalidated, new queries fetch the new data and display up to date results. Initially the example looked complicated, and the audience reacted with some skepticism. On looking back at the example in the presentation, it seems less forbidding; however, the long term goal should be to automatically manage the OLAP cache from an ETL tool. When pressed, Julian told us that the API is being used, for example, in applications for securities analysis.

In all, it was a satisfyingly informative evening.

Thursday, October 04, 2007

The Future of Music Revisited

Michael Arrington has a great post in TechCrunch on "The Inevitable March of Recorded Music Towards Free". Most interesting are both the large response and the highly negative nature of some of the responses. I wrote about this subject under the title of "The Future of Music" a couple of years ago, partly inspired by an article in Wired about MySpace.

I completely agree with Michael Arrington, and I am surprised at the reaction of some performers. I expect that a performer would want to escape the clutches of their record company, which seems to feel that it is its right to keep 90% of its artists' earnings. The fact that groups like Radiohead are escaping from their contracts and planning to go free shows the way forward.

It is worth remembering that the Recording business is relatively recent. Big recording companies have only existed for 60 to 70 years. They came into their position of power by controlling the means of production, first records and then CDs. Now that they are no longer needed to reproduce music, they are irrelevant and should just fade away.

Sunday, September 30, 2007

Software Architecture: Profession and Skills

Paul Preiss, president of the International Association of Software Architects (IASA) spoke on "Software Architecture: Profession and Skills" at the SDForum SAM SIG September meeting. Paul founded the IASA as a professional association for Information Technology (IT) Architects 4 years ago and since then it has grown to almost 7000 members in 50 countries.

Paul started off by talking about his career and how, through a series of jobs at different companies, he came to understand that the role of IT Architect is often unacknowledged and unappreciated. With that came the realization that IT Architect is an emerging profession and that the profession needs a professional body. One pressing reason for having a professional body is that he believes that the profession will be regulated within the next 5 years and that practitioners should have a say in writing the legislation.

The kernel of the talk was about defining what an IT Architect is and does. We all understand the profession of being a Lawyer or Doctor or Accountant, so we need a similarly simple understanding of the IT Architect. Paul attacked the problem from several different directions, eventually coming up with two words: Technology Strategy. Just as the Chief Financial Officer is in charge of directing the financial strategy of a business, the Chief Architect is in charge of directing the technology strategy. In particular, the role of the architect is to deploy just enough software to enable a competitive advantage for the business. I have been rolling this around in my mind for a couple of days and it seems to make sense.

The IT Architect has to at least understand a business problem and know the technologies that are available to solve it. The choice of the right technologies is not always easy. Technologies are changing fast, resources are limited and implementations do not always succeed. Moreover, other players may have their own agendas. For example, developers may want to use some new technology so that they can put it on their resume. Paul gave us a specific example. He came into a situation where the business was already using several object relational mapping tools and the developers wanted to adopt yet another one for a new project.

Paul ended his talk by plugging the IT Regional Conference in San Diego in October. Everyone who is involved in IT Architecture should check out the conference and the IASA.

Sunday, September 23, 2007

The Evolution of Web Analytics

We have not had a talk on Web Analytics for many years at the SDForum Business Intelligence SIG, so it was great to hear Stephen Oachs, founder and CTO of VisiStat, speak on "The Evolution of Web Analytics" at our September meeting. VisiStat is a two year old start-up that provides a web site performance measurement and analytics service (Software as a Service model) in the Small and Medium Business market.

As Stephen told us, first generation web analytics was about collecting data from web logs, integrating that with data from other sources and presenting historical results to IT specialists. The current generation, which he called web site performance management, collects data by page tagging, which entails adding a small snippet of JavaScript to each page. In practice the code snippet is added to a common page header or footer so it only needs to be added once to cover all pages in a site.

Page tagging collects more information than can be extracted from web server logs and it does not require difficult integration to make sense of the data. With better analysis software, the results of page tagging are ready to show directly to end users like the marketing and sales people who are responsible for the contents of the web site. Also, with page tagging we can see the data in real time, which allows us to follow a user as they browse around the web site.

Real time access to information opens new doors. Stephen told us about a specific case where a bank became aware that it was subject to a phishing attack on its customers when the bank noticed an unusual change in the patterns of access to their web site. Similarly, click fraud can be detected by unusually high bounce rates from a specific keyword. If detected in time, the click fraud may be subverted by changing the price for the specific keyword. Finally, real time data provides web site availability monitoring, an additional service that, for example, VisiStat offers for free.
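As a back of the envelope illustration of the click fraud case (my own sketch in Python, not how VisiStat actually does it), flagging keywords whose paid visits bounce far more often than normal might look like this:
# Each visit is a (keyword, pages_viewed) pair; a bounce is a one-page visit.
visits = [('cheap flights', 1), ('cheap flights', 1), ('cheap flights', 1),
          ('hotel deals', 4), ('hotel deals', 1), ('hotel deals', 3)]

def bounce_rates(visits):
    stats = {}
    for keyword, pages in visits:
        total, bounces = stats.get(keyword, (0, 0))
        stats[keyword] = (total + 1, bounces + (1 if pages == 1 else 0))
    return dict((k, float(b) / t) for k, (t, b) in stats.items())

for keyword, rate in bounce_rates(visits).items():
    if rate > 0.9:   # threshold picked purely for this example
        print('possible click fraud on keyword: %s (bounce rate %.0f%%)' % (keyword, rate * 100))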

The evening ended with a demo of the VisiStat product by Tina Bean, VisiStat Director of Sales and Marketing. The demo involved logging into real live customer web sites. You can see much of the same thing by visiting the VisiStat web site and looking at their live demo. VisiStat is a powerful tool for understanding how a web site is being used. At the same time we could see that it is designed from the end user perspective, so that a typical small business user can use it effectively without needing support from an IT department or consultant. Most impressive is the fact that all the power of VisiStat is available to a small web site for as little as $20 a month.

Tuesday, September 11, 2007

iPhones in the Bargain Bin

The big technology news of last week was the iPhone price reduction from $600 to $400 (plus tax). Pundits were crawling out of the woodwork to voice their opinion. Negative sentiment drove Apple's stock price down a bit. Here is my view of what it means.

The iPhone is worth $600. We know that because almost a million of America's brightest and best were willing to pay that price for it. Now you can own it for $400. What a bargain!

Sunday, September 02, 2007

Google Flight Simulator Earth

Last year I wrote about Microsoft re-purposing their Flight Simulator program for their Live search engine. Now comes word that Google is fighting back by adding a flight simulator to Google Earth. While playing with a flight simulator can be fun, I still do not see what it has to do with search.

Sunday, August 26, 2007

Vista DRM and Performance

Six months ago there was a fuss when Peter Gutmann published his article on Microsoft's new operating system Vista and how its DRM implementation could disrupt computer system performance when DRMed media is being played. Now the crows are coming home to roost.

In the last week there have been reports that Vista networking performance drops to 5% of maximum when the music player is active and 10% of maximum when any visual media player is active. This appears to happen even though the media player is paused, and even on multi-core systems where you might expect other cores to pick up the load. For example:
I have a Q6600, and while experiencing this crappy network performance, all 4 cores are practically idle.
So what will be the conclusion of all this? Microsoft has had many products that have knocked it out of the ballpark, but every company produces a duff product now and again. It looks like Windows Vista could become the Windows ME of the 21st century.

Wednesday, August 22, 2007

Search and Analytics

"Search and Analytics" was the topic at the SDForum Business Intelligence SIG yesterday evening where Sajjad Jaffer, co-founder of Mercatrix spoke. The presentation covered the search space describing the latest trends in search with the intent of convincing us that Search is not over. Although one company dominates internet search now this was not always the case and it need not be the case in the future.

There are hundreds of startups with available products that are working on improving search. Sajjad talked us through the usual suspects like media search, semantic search and the use of crowdsourcing to improve relevancy. He also pointed out some interesting analytic resources on the web like Google Trends and Yahoo! Buzz.

The meat of the talk came at the end, where Sajjad argued that traditional Business Intelligence spends too much time looking inwards at operational data from the business. For example, if the analytics looked out to what others are saying about its products, the business may be able to react more quickly than if it waited until product returns and slow sales told it that there was a problem with the product.

While I agree that the web can be used for analytics, I am skeptical because there are also some huge hurdles. One problem is that large search companies can show off aggregated results of their most popular queries, as in Yahoo! Buzz, but any attempt to provide narrower results that would be of use to a particular business will run into privacy and other problems.

I may be proved wrong. From what I can make out of Mercatrix, its business is to provide internet search based analytics to companies.

Thursday, August 09, 2007

Disruptive Technology?

The SDForum Emerging Tech SIG meeting last night on "eBay on Disruptive Technologies" did not turn out to be quite what I expected; however it did contain a couple of interesting insights. Firstly, Adam Trachtenberg, senior manager of Platform Evangelism, gave an overview of some of the things that eBay is doing to explore and protect itself from disruptive innovation. He described what they were doing with the wonderful word "explorimenting".

An enthusiastic audience asked Adam about various companies like Skype, StubHub and Rent.com that eBay had recently bought and what the plans for them were. Unfortunately these groups are not in Adam's bag and he could not say anything about them. However, he did talk about a couple of initiatives that were in his bailiwick.

One of them is a social commerce effort that will allow people to share their eBay watch lists with their friends. The most interesting thing about this project is that it is built on Facebook. Last year Facebook turned their social networking site into a platform that allowed third party applications. As Adam explained, the Facebook platform makes it possible, even easy, for eBay to create social networking experiments like this. Recently there has been a lot of noise about the Facebook platform, and this experiment is another signpost as to how important it will become. The eBay Facebook application will launch in a couple of weeks.

Alan Lewis, a Technical Evangelist with the eBay Developers Program, showed us the other eBay initiative, a rich internet client application for eBay buyers called San Dimas. The idea is to explore giving eBay buyers a better user experience and also to showcase the eBay API. The most interesting thing about this application is that it is written using the new Adobe Apollo/AIR/Flex platform. Adobe AIR is the next generation of the Flash player, acquired when Adobe bought Macromedia in 2005. While AIR is still in beta, we have been hearing a lot about it recently, and I expect it to have a big impact when it launches. San Dimas has been in a private beta. We were the first people to see a public demonstration of it.

Monday, August 06, 2007

So Busted

In May I wrote about Fake Steve Jobs and the Valleywag campaign to unmask him. Then I remarked that, given the Valleywag reputation for bulldog journalism, they would not get very far. Now I stand corrected.

On Sunday August 5th at 5:56 PST, Valleywag broke the news about how Forbes editor Daniel Lyons is the Fake Steve Jobs, having read about it in the New York Times, like all the rest of us. Since then they have published 10 further posts on the subject in 24 hours, as if quantity could make up for quality. In the meantime, FSJ's put-down of Vwag as Dr. Evil and Bigglesworth is much funnier than anything they have published in the last 6 months.

Monday, July 23, 2007

BI Software as a Service

Like many others, I went to the July meeting of the SDForum Business Intelligence SIG expecting to hear about Salesforce.com and its new AppExchange. In practice we heard about a lot more. The speaker, Darren Cunningham, had just left Salesforce.com and joined LucidEra, a new start-up that specializes in Business Intelligence delivered as Software as a Service (SaaS). Darren also brought along Ken Rudin, the CEO of LucidEra, who had some interesting things of his own to say. There are several write-ups of the meeting on the web that capture the big picture. I want to note a couple of the smaller moments from the meeting.

Firstly, both Ken and Darren described building your own business intelligence system as being similar to a business building and running its own Nuclear Reactor. What they mean is that while a full blown data warehouse can be very powerful, it also requires a lot of maintenance and attention to detail to keep it running. In practice many businesses do not have the resources and energy to keep their business intelligence system running in the kind of peak condition that is required to get value from it.

Secondly, someone asked about LucidEra's pricing and business model. Darren explained that they are taking the same approach as Salesforce.com. That is, start with an offering for small and medium businesses and then build up to the larger businesses. After 8 years in business Salesforce.com is just now starting to sign up large businesses like major brokerage houses.

Ken had some interesting things to say about pricing. Unlike sales, where every salesperson is a user, a typical business has only a few business analysts who would be the major users of a BI system, so pricing on a per seat model is difficult. Also, they do not want a tax on reports, so the pricing model should allow a business to run as many reports as they want. Therefore LucidEra uses a flat rate per application per customer. Currently they are using a single flat rate of $2900 for their starter application.

In my opinion, a flat rate encourages the customer to use as much of the service as they can which is a good thing, however it is optimal for a particular size of business. Smaller customers are shut out by a cost that is too high for them. Large customers could overwhelm the service without fully covering their costs. One way to expand the model is to provide tiered pricing based on the size of the customer. Tiers could be based on either customer total revenue or number of employees.

At the moment LucidEra is just starting to build their portfolio of customers and applications. They will have to suffer growing pains before their pricing model is put to its full test.

Wednesday, July 04, 2007

Design Patterns in Python

Alex Martelli is a leading light of the Python programming language community. He is a leader in the development of the language, author of Python in a Nutshell and has written extensively on Python in other books and articles. Last week he spoke to the SDForum Software Architecture and Modeling SIG on "Design Patterns in Python". Alex has posted the presentation slides here.

Design patterns are a useful concept in programming. A programming language consists of a set of basic constructs that are used in various ways to build a program. Design patterns capture higher level constructs that commonly appear in programs. As Alex explained, an important part of each design pattern is its name. Having a good and well known name for each pattern makes it easy to explain a program and discuss how it works at a higher level than the programming language constructs.

Although design patterns are supposed to be programming language agnostic, many of the books on design patterns are based on statically typed languages such as C++ and Java. There are some classic design patterns that do not translate well into dynamically typed languages like Python. One example is the Singleton design pattern. This uses the static type system to ensure that there can be only one instance of a particular type of object.

Python is a dynamically typed language, and in these languages the type system is used to make sure that the right thing happens at run time rather than to do any enforcement. The consequence is that you cannot really do a singleton in Python or any other dynamically typed language. If I were in the dynamic language camp, I would argue that if you only want one instance of an object, make sure that you only create one instance of the object. This is clearly in the spirit of dynamic typing.

However, Alex is clearly concerned by the absence of these design patterns from Python. First he claimed that these design patterns do not exist in Python because it is a higher level language and that the design patterns are somehow subsumed by the language system. Then he went on to show us his Python answer to the Singleton design pattern, which he called the Borg. In the Borg, all instances of the class share the same state, and so they all appear to be the same instance.
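For reference, the Borg idea is only a few lines of Python. This is my own minimal sketch of the recipe as I understood it, not Alex's exact code:
class Borg(object):
    _shared_state = {}   # one dictionary shared by every instance of the class

    def __init__(self):
        # Point this instance's attribute dictionary at the shared one, so
        # state set through any instance is visible through all of them.
        self.__dict__ = self._shared_state

a = Borg()
b = Borg()
a.answer = 42
print(b.answer)   # 42
print(a is b)     # False: distinct objects that behave as if they were one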

While the Borg pattern works in Python, I do not see it as either necessary or interesting. As Alex told us, Guido van Rossum, the father of Python, does not like it either.

Thursday, June 28, 2007

Relational Messaging

The SDForum Business Intelligence SIG was honored to have a new company launch at the meeting last week. Julian Hyde, well known for leading Mondrian, the Open Source OLAP project, gave a talk titled "A new solution to an old problem: How Relational Messaging solves Data and Application Integration". He told us that he had been working with a small group for the last 3 years to develop their streaming database product in a new company called SQLstream (no web site yet).

Streaming databases are nothing new to the Business Intelligence SIG. In 2004 we hosted Celequest, which was recently bought by Cognos. In 2005 we hosted iSpheres, which has subsequently disbanded. In January 2007 we hosted Coral8. The difference between SQLstream and these other companies is less one of technology and more one of vision.

Although there was interesting technology buried inside, Celequest presented their system as a way of creating real time dashboards. Unlike Celequest, iSpheres was concerned with performance, although they were weighed down by a proprietary language. Coral8 is concerned with handling complexity and being able to scale when handling real time events that happen in the millisecond to second range.

SQLstream, on the other hand, sees their streaming database as a system that can be used to do any and all enterprise data and application integration functions. The bulk of the presentation was about how SQLstream could supplant ETL, EAI, EII, SOA and some uses of database systems, amongst others. Anyone who has any experience in this area will know that this is a huge claim. I am not going to say whether they are right or wrong, only time will tell us that, however I am going to admire the vision.

There is one other admirable thing in SQLstream and that is a commitment to the SQL standard. Streaming databases require some extensions to SQL. SQLstream has some extensions to the SQL standard; however, they are few and necessary. Apart from these extensions, their language is pure SQL.
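To give a flavor of the kind of extension involved (this is my own illustration of the general streaming SQL style, not a quote of SQLstream's actual dialect), a continuous query over a stream of orders might look something like this:
SELECT STREAM ROWTIME, productId,
       SUM(quantity) OVER (PARTITION BY productId
                           ORDER BY ROWTIME
                           RANGE INTERVAL '1' HOUR PRECEDING) AS unitsLastHour
FROM Orders
The windowing over time is the main departure from standard SQL: the query never finishes, it just keeps emitting rows as new orders arrive.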

(The presentation is currently available as a file on our Yahoo web site.)

UPDATE: 12/11/07 SQLStream now have their web site up.

Sunday, June 24, 2007

Virtual Goods

I did not have the time to attend the recent conference on Virtual Goods at Stanford, however the write ups did pique my interest and raise a couple of questions.

Firstly, where do virtual goods end? For example, is a downloaded music file a virtual good? I can see that there is a distinction between an MP3 file I have downloaded to my home system and a virtual good that only exists on a server for a virtual world, for example land in Second Life or gold in World of Warcraft. However, there are DRM schemes for music where I cannot use a music file that I have downloaded unless my system is in contact with, or has recently contacted, the server through the internet.

It is fine to define virtual goods as digital goods that are under the control of some other entity, but we need a definition. (I put in the word digital because there are slightly more tangible goods that I own but that are kept under the control of others. A good example is in the stock market, where when I buy shares in a company, the share certificate is held by my broker.)

Secondly, how should we value virtual goods? As Susan Wu says in her introduction to the conference, in general we can value virtual goods like any other goods, based on their utility. However there are some special considerations. A base value for any good is its marginal cost of production. The marginal cost of producing virtual goods is zero, so virtual goods can become worthless. For example, if I spent dollars buying gold in World of Warcraft and Blizzard Entertainment ceases business, my gold has zero value.

Now I do not expect Blizzard to go out of business any time soon. Anyway, if I did buy gold, I would not spend more than I would on a meal in an expensive restaurant. Also, I would immediately spend the gold on something useful like a Vorpal Sword of Bunny-Smashing and go out and kill some bunnies or Orcs or whatever to get value from the money I had spent.

A much more interesting case is presented by Second Life. The company is a private startup without the transparency of Blizzard Entertainment and the controversial business model is less proven. The things that you buy are virtual land and virtual adornments, things that you might expect to keep and cherish in the real world, yet which have a more fleeting existence in the virtual world. While one person has made a huge business success out of Second Life, others have tested it and found it wanting.

We will just have to wait and see what happens to Second Life. In the meantime, I put virtual goods in the same category as fancy food, flowers and 'gifts'. That is, something to consume and savor as it is being consumed.

Thursday, June 14, 2007

Programming Language Wars

I have been doing a lot of programming recently, using a wide variety of programming languages. In the last year I have written programs or modules in the following programming languages: C, C++, Java, Perl, PHP, Python, Shell and various dialects of SQL. Tiobe Software maintains an interesting Community Index that tracks programming language popularity. From their data I calculate that I am using 65% of current programming languages (by volume).

For anyone new to the game, Programming Language Wars have a long and storied history. As soon as the first programming language was implemented, there were wars about whether to use it or assembler. Shortly thereafter there were two programming languages and the battle commenced in earnest. For example, this article remembers some of the more ridiculous wars from the 70's and 80's. Newbies, get over it, we are on at least the 423rd "Great Programming Language War".

I am going to write a series of posts on programming languages that I hope will provide a more balanced view than the usual blast.

Saturday, June 09, 2007

Video 2.0

The best way to find out about something new is to try and do it. Robert X. Cringely illustrated this in an unusual post this week. The post is unusual because he normally comments in his acerbic way on what others in the tech industry are doing. This post is on his own technology adventure.

Cringely has put together a series of video tech interviews and he wanted to provide a way for video feedback. I have done some video, but until I read the post, I had never considered the possibility of video as a response. Cringely provides an example in the comments of a YouTube video that absolutely demands video feedback and gets it. (That Pachelbel sure knew how to write a bass line, even in the 17th century.)

In other video sightings from around the web, the normally staid TechCrunch blog has a wonderful video mashup on the Jesus phone.

Wednesday, May 30, 2007

FSJ Rules

Not sure quite why, but "The Secret Diary of Steve Jobs" is my favorite read on the internet. Every day there is at least one entry that makes me laugh out loud. Today, "Goatberg" asked RSJ (Real Steve Jobs) whether he reads FSJ (Fake Steve Jobs). RSJ replied that he had read a few of the FSJ things recently and thought them pretty funny.

Currently Valleywag has a hot campaign to unmask the identity of FSJ. In the past, others have tried and failed. Given Valleywag's record for bulldog journalism I do not expect them to get anywhere either. Their latest candidate misses the mark on several fronts. From reading between the lines, I think that FSJ is of Steve's generation and has a strong British expat connection, as several posts contain cultural references to Private Eye, the British satirical magazine.

Anyhoo, if you want a good laugh, hear Bono on how he is going to get the World Bank job.

Monday, May 28, 2007

Distributed Analytics

Ever since the meeting, I have been trying to get my mind around the presentation by John Mark Agosta to the SDForum Business Intelligence SIG on "Distributed Bayesian Worm Detection". The topic was about how a network of computers working together could be more effective at detecting a computer worm by exchanging information, described as gossiping amongst their neighbors. It is part of Intel's research on Autonomic Enterprise Security.

The idea is that after a worm has taken over one system in a network, it will try to take over other systems in that network by sending out messages to those systems. Basic worm detection is done by detecting unusual activity in the outgoing messages from a host system. In a distributed worm detection system, each host in the network makes a decision as to whether it is infected by a worm and then exchanges that information with other systems in the network. Overall, the network determines whether it has been attacked by a worm through a distributed algorithm.

The most interesting claim that John made was that the detectors on each host could be set to have a low threshold. That is, they could be set to report a lot of false positives about being infected by a worm, and yet in the distributed algorithm these would cancel out, so that overall the system would only report a worm attack when one was actually under way.
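A toy simulation shows why the false positives can cancel out (my own illustration of the general idea, not Intel's actual algorithm). Even if each host's detector fires falsely 20% of the time, requiring a decent fraction of the hosts to agree makes a network-wide false alarm vanishingly rare, while a real worm that trips most detectors is still caught:
import random

def network_alarm(num_hosts, p_fire, threshold=0.5):
    # Each host independently reports "infected" with probability p_fire;
    # the network raises an alarm only if enough hosts agree.
    reports = sum(1 for _ in range(num_hosts) if random.random() < p_fire)
    return reports >= threshold * num_hosts

trials = 10000
false_alarms = sum(network_alarm(50, 0.2) for _ in range(trials))  # no worm, 20% false positives
detections = sum(network_alarm(50, 0.9) for _ in range(trials))    # worm, most detectors fire
print('false alarm rate: %.4f' % (false_alarms / float(trials)))   # essentially zero
print('detection rate:   %.4f' % (detections / float(trials)))     # essentially one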

I have considerable experience with distributed algorithms, and one thing that I know is that a distributed algorithm can be viewed as a sequential algorithm that is just being executed in parallel. Thus, the worm detection system can be viewed as a collection of noisy detectors that are then filtered by the next level of the algorithm to give a reasonable result in the aggregate. As such this could be a useful analytic technique. On the other hand, a worm attack is something that can only happen in a network, so perhaps the technique is only applicable to distributed systems.

Like a good proportion of the audience, I was interested in whether the analytic techniques described could have a broader application in other areas of Business Intelligence like fraud detection. So far I have not managed to come up with any convincing applications. Any suggestions?

Sunday, May 13, 2007

People Search Redux

Reading through my last post on People Search, I realize that I did not quite join up the dots. Here is how people search works. It is something I have written about before, and the Search SIG meeting did reveal some new angles.

Firstly, the search engine spiders the web collecting people related information. Next comes the difficult part, arranging the information into profiles where there is one profile per person. This is most difficult for common names like Richard Taylor. Then there are other little variations like nicknames (for example, Dick, Rich or Ricky for Richard), spelling variations (Shakespear) and middle names or initials that may or may not be present.
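A crude sketch of the name matching step (my own illustration in Python, not what any of these companies actually do) is a table of nickname and spelling variants folded back to a canonical form:
CANONICAL = {
    'dick': 'richard', 'rich': 'richard', 'ricky': 'richard',
    'shakespear': 'shakespeare',
}

def normalize(name):
    # Lowercase each part of the name and fold known variants to a canonical form.
    parts = [CANONICAL.get(part, part) for part in name.lower().split()]
    return ' '.join(parts)

print(normalize('Dick Taylor') == normalize('Richard Taylor'))   # True: same candidate profile
The real problem is of course much harder, since the same canonical name can still belong to many different people.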

A good profile linked to an identified user is a valuable thing. For example, it can be used to direct advertising to the desired demographic, making the advertising more valuable. As I have noted before, this kind of information is most valuable to large internet companies like Yahoo and Google, who effectively direct a large part of online advertising.

A profile is much more valuable if the person has taken control of their profile and effectively verified it. So the final step for the people search companies is to create enough awareness that people feel compelled to take control of their own profile. I have run across ZoomInfo profiles that have been verified, so they have started to do this for their specialized audience. Wink and Spock will have to try much harder. I looked at my Wink profile and I was not impressed. I have seen a scarily accurate profile of myself online, and Wink did not come close.

At the meeting, DJ Cline opined that people might be willing to pay money to have their profiles taken down. The panel of search company CEOs disagreed that this was a good model and told us that they would try to talk someone out of demanding that their profile is removed. I think that what this means is that a good profile is actually of more value to others than it is to the target of the profile. Quite apart from that is the thought that blackmail is a "difficult" business model.

Michael Arrington several times expressed the opinion that Spock would be sued for what they were doing, particularly as one of the example profiles they showed was Bill Clinton with the tag "sex-scandal". I was concerned with the possibility that a profile could be hijacked, as the hijacker could then play tricks to embarrass the target of the profile. A high profile lawsuit or profile hijacking with a lot of attendant publicity could be the catalyst that brings people search to public attention, given how much the people search companies gain from an event that makes everyone go out and claim their profile.

However this gets us back to blackmail as a business model. It is one thing for Joe Average to create his own MySpace page. It is quite another thing if Joe Average feels that he has to go out and claim a profile page that someone else has put together without even asking him, just so that he can defend his own good name. People search has had a long history of privacy concerns and it will continue to do so.

Tuesday, May 08, 2007

Making Money from People Search

The first question that moderator Michael Arrington asked was how are you going to make money? This was at tonight's SDForum Search SIG meeting on "People Search", and he was asking the panel consisting of Michael Tanne, CEO of Wink, Jaideep Singh, CEO of Spock, and Bryan Burdick, COO of ZoomInfo.

ZoomInfo, which has been around for a few years and is aimed at corporate and executive search, claimed to be profitable and generating cash through subscription revenue. Wink and Spock are aimed at the bigger and more general consumer markets and need to be supported to some extent by advertising. These two companies are not at the money making stage yet; Wink is live while Spock is in an invitation-only beta. Also, they are not going to generate a lot of money from direct search advertising, judging by the numbers that were bandied around.

I think that a general people search function is most useful as an adjunct to a large internet company like Google or Yahoo that generates much of its revenue from advertising. People search is used to get demographic information about the user, which is then used to target the advertisement. At a Business Intelligence SIG meeting last summer we heard about how Yahoo is targeting banner ads based on demographics, and it is no coincidence that Yahoo already has its own people search feature.

The CEOs of Wink and Spock both protested that they were pure people search companies that could survive and grow as pure people search businesses. However there was a general murmur in the room that everyone has their price. We will just have to see how it plays out.

Wednesday, May 02, 2007

Save Time, Generate Code

I mentioned the after party in the previous post, now it is time to talk about the SDForum SAM SIG meeting "Writing Code Generators For Quality, Productivity, and Fun" by Bill Venners of Artima. In short, Bill's talk was about designing domain specific languages and using code generation to solve programming problems. Code generation can build better, more reliable code faster, particularly in situations where you find yourself doing a lot of cut and paste style programming.

As befits the technique, Bill got the audience to do a lot of the lifting by describing how they had solved various programming problems through code generation. I have used code generation a couple of times, as well as doing some real language design, and I shared one of my experiences along with several other audience members. It was definitely encouraging to hear that many audience members had successfully used code generation.

Bill characterized a number of situations where code generation is useful and showed us a specific example of a domain specific language that he had designed. From my experience, language design is comparable to API design, except that there are more degrees of freedom so there are more ways to go wrong. Designing a real programming language is hard. There are issues at the lexical level, problems with keywords and extensibility, the need to make the language regular and unsurprising while dealing with special cases, and a thousand other little details to get right.

On the other hand, if a domain specific language can be kept simple enough, it can skirt these problems. Bill's example language was simple. Each statement consisted of a set of keywords followed by a name. In my opinion, domain specific languages are a great technique provided that you firstly design a clean simple language and secondly remember the principles of API design.
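As a trivial example of the technique (mine, not Bill's), here is a generator that turns a small declarative spec into the kind of repetitive accessor code that would otherwise be produced by cut and paste:
FIELDS = [('name', 'str'), ('price', 'float'), ('quantity', 'int')]

def generate_class(class_name, fields):
    # Emit a Python class with one accessor per field in the spec.
    lines = ['class %s(object):' % class_name]
    for field, type_name in fields:
        lines.append('    def get_%s(self):' % field)
        lines.append('        return self._%s  # declared as %s' % (field, type_name))
        lines.append('')
    return '\n'.join(lines)

print(generate_class('Order', FIELDS))   # paste or exec the generated source
Change the spec and regenerate, and every accessor stays consistent, which is the whole point.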

Monday, April 30, 2007

Duh Typing

The meeting of the SDForum SAM SIG last week was interesting; however, the discussion afterwards over a beer was just as interesting. I will write about the SIG meeting another time; here I want to capture an idea from the after session.

There is a lot going on in the world of programming languages, dynamic versus static languages, strong versus weak typing. Proponents of dynamic languages like Python and Perl claim that they are superior because you do not need to continually specify the type of an object, particularly when the type of the object is obvious.

Over a beer someone proposed that this continual restating of the type of an object should be called Duh Typing (a play on Duck Typing). I ran into an example of Duh Typing today when I found myself writing the following Java statement:
// Convert a List of tasks into a typed array: the element type has to be
// spelled out three times in one statement.
ManagedTasks[] managedTasks =
    (ManagedTasks[]) tasks.toArray(new ManagedTasks[tasks.size()]);
The dynamic language guys would laugh at the fact that I have to specify the type 3 times in a single statement. Well, it is not quite as easy as all that, as I will discuss in future posts. In the meantime, here is a neat name for the concept.
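For contrast, the equivalent in a dynamic language (a sketch, assuming tasks is any list or iterable of task objects) does not mention the element type at all:
managed_tasks = list(tasks)   # Python: no element type stated anywhere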

Friday, April 27, 2007

The Times, They Are a-Changing

"Apple’s Steve Jobs, perhaps the most important person in the music industry today, ..."
Well, it is a quote from Michael Arrington, who is after all a Valley booster, however it does show how far the music business has come on its strange journey. Gone are the simple days of finding and ripping off some gullible kids; now we are at "the medium is the message".

Sunday, April 22, 2007

The 60 Hour Data Warehouse Implementation

There was a lot of interesting stuff in the presentation by Stephen Bay to the SDForum Business Intelligence SIG on "Large Scale Detection of Irregularities in Accounting Data". However the one thing that really struck me was their 60 hour data warehouse implementation.

Stephen and his colleagues at the PricewaterhouseCoopers Center for Advanced Research have built a system called Sherlock for detecting fraud in accounting data by applying several analytic techniques. Sherlock works by looking at the general ledger of the business. A general ledger is typically several gigabytes of data and may be fed by sub-ledgers that can run into the hundreds of gigabytes. Before Sherlock can do its analytics, they have to get the accounting data into a standard form in a data warehouse. Sherlock is used during an accounting audit which typically lasts a month, so there is great pressure to get the data warehouse implemented in as short a time as possible.

So how do they do it? Firstly, the schema of the data warehouse is fixed. The PricewaterhouseCoopers team have developed a standard data warehouse design for a general ledger that is applicable to all non-financial businesses. The data warehouse design is open source and is available from IPHIX. Secondly, the general ledger data usually comes from an SAP, Oracle or PeopleSoft ERP system, so some of the connections can be prebuilt. The problem with ERP systems is that they are heavily customized for each user, so the Sherlock team have implemented a GUI tool for building a mapping between ERP content and the data warehouse. The tool is designed for business people and accountants to use, so that the data warehouse can be built by people with domain knowledge but no technical knowledge.
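I imagine the output of such a mapping tool as little more than a declarative table like this (my own sketch of the general idea in Python, with made-up column names; it is not the actual Sherlock or IPHIX format):
# Map customized ERP general ledger columns onto the standard warehouse schema.
LEDGER_MAPPING = {
    'Z_BELEG_DATUM': 'posting_date',   # hypothetical customized ERP column names
    'Z_FIRMA': 'company_code',
    'Z_KONTO': 'gl_account',
    'Z_BETRAG': 'amount',
}

def map_row(erp_row):
    # Translate one ERP row into the standard general ledger layout.
    return dict((warehouse_col, erp_row.get(erp_col))
                for erp_col, warehouse_col in LEDGER_MAPPING.items())
With the target schema fixed, filling in a table like this is the kind of work a domain expert can do without writing any code.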

With all this they claim that they can build a data warehouse in 60 hours for a new implementation and 20 hours for a repeat implementation. Contrast 60 hours with a typical data warehouse project that takes many months. A large data warehouse project can easily take a year or two to implement. So a 60 hour implementation is an astonishing achievement.

Sunday, April 08, 2007

Music and the Limits to Consumption

Time and money are limits on the consumption of media. If I am playing a video game, or watching a video, I will not be listening to music. If I listen to less music, I will want to buy less music. Anyway, my budget is not unlimited, so if I spend money on a movie, TV show or video games, I will not spend that money on buying music. I got to thinking about this after reading a piece in the New York Times where a pair of record store owners complain that it is the actions of the recording industry and the RIAA that have led to the recent decline in the sales of music.

There was a time when music was the medium of choice to purchase, but that was when the main alternative was books or magazines. Now we live in a much richer environment where there are many more media choices to entertain us. Music has lost its monopoly on our attention, however the music industry still behaves like it has that monopoly. Rather than market music to us, the RIAA goes out and sues the most ardent music collectors.

There are plenty of things that the recording industry could do to improve their position. For example, they should be marketing music to us as the more entertaining and lasting of the media alternatives. After all, we listen to the same music over and over again, while we usually watch a movie or TV show only once. I have never once heard a compelling argument for buying music being made by the music industry, whereas I have heard interesting arguments being made on forums like Slashdot.

Thursday, April 05, 2007

Cut and Paste

It is the little details in a User Interface that make the difference between something that is straightforward and something that is frustrating. For example, let's take the simplest editing function, cut and paste. If text appears on my screen, I want to be able to select the text and paste it into a document, message or whatever. However, there is a lot of text that appears on my screen that I cannot select. There are title bars, menu items and, worst of all, all the interesting text in dialog boxes.

Imagine this scenario: a modal dialog box appears on my screen with an important error message. What do I do? Well I cannot select and copy the text, and with a truly modal dialog box, I cannot do anything else until I have dismissed the box, so if I want to preserve the message in the box, I have to find a piece of PAPER to write down what it says, and later transcribe my written notes into the email sent to support. I have been on the receiving end of software errors and so I know that the first question is "What EXACTLY does the message say?". So this is why you need to keep a piece of paper and pencil handy when you use a computer!

I was reminded of this problem today by another frustrating problem with cut and paste. A colleague IMed me to ask for the email address of a third party. As I had to send the third party an email as well, I brought up Outlook (ugh), typed the first three letters of the person's name and auto-completed. Feeling happy that everything could be done in a few keystrokes, I selected the email address in the To: part of the message, copied it and pasted it into the IM window. Imagine my surprise when the name of the person appeared in the IM window but not their email address. I had copied the text "name <email>" and when I pasted, all that appeared was "name". After 3 goes, I gave up and had to type the email address that appeared in one window into another. Who invented this feature, and what were they thinking?

On reflection, I suppose that I could have gone through the options and disabled all the ones that call themselves intelligent or smart. I did not and I will not. As I have said before in this blog, software should do the right thing out of the box. Anyway, life is too short to customize all the software tools that I use, especially as I am expected to upgrade to new and more complicated versions every few years.

Thursday, March 29, 2007

Watch This Space


Is there anyone else who did not recognise Pam Beesly on the front cover of Wired this month?

Sunday, March 25, 2007

CRM is Back

There were several interesting things that came out of Jacob Taylor's "CRM is Back" presentation to the SDForum Business Intelligence SIG last week. CRM stands for Customer Relationship Management, an umbrella term for the software and systems for managing sales, marketing and service, the customer facing aspects of a business. Jacob is CTO and co-founder of SugarCRM, the leading Open Source CRM system. During the talk, he touched on the incredible success of SugarCRM, its relationship with Open Source and the use of the PHP language. This post is about the incredible success of SugarCRM. I will discuss SugarCRM's use of Open Source and PHP in future posts.

SugarCRM was formed in April 2004 and by July had come out with its first product, SugarCRM Version 1, a sales CRM module. The company also received Venture Capital funding, the first Open Source software application vendor to do so. From the start SugarCRM generated a huge amount of interest, with, as Jacob described, many enthusiastic users who offered input, feedback and code to improve the product. By that October, SugarCRM had risen to become Project of the Month on SourceForge, the leading repository of Open Source projects.

I recall from the time that SugarCRM generated so much interest that there were rumblings of a backlash amongst experienced CRM professionals. The old guard did not understand how this new product that was being made available in a new and low cost way could generate such excitement. However, the excitement is there. Since Sugar launched, there have been more than 3 million downloads of SugarCRM and related projects.

Jacob attributed their rapid success to a number of factors. The first advantage is in the three founders themselves. They have complementary skills in sales, service and engineering and they work well together. Secondly, the founders had a lot of experience with building CRM systems. Before forming SugarCRM they had all been at E.piphany which had been a major CRM system vendor in its day, and prior to that they all had experience with other CRM systems. Given their experience, they all knew exactly what was needed and this allowed them to put together an extremely capable CRM system in a very short period of time.

A third advantage is Open Source and the community that SugarCRM has built around their project. From the earliest days they had an enthusiastic user community who both used and contributed to the project. Sugar soon set up its own SugarForge to accommodate what are now more than 350 Open Source extensions to the project. The Sugar Forums with more than 38,000 members allow users to help each other use SugarCRM. The recently created SugarExchange allows people to sell and exchange products and services based on SugarCRM.

As Jacob put it, Open Source is "Passionware". People want to be involved in the tools they use every day and Open Source offers them new ways in which they can be involved. SugarCRM has been a leader at involving their users and building a community. The reward is that they have created a large number of passionate advocates of the product.

Sunday, March 04, 2007

The DRM Battle

The other day, someone on Slashdot commented that there had been a lot of posts on Digital Rights Management (DRM) recently. Well, there have been and there will continue to be DRM posts, because Intellectual Property Rights have become a huge issue, driven by technology advances in computing and the Internet.

I have written about DRM frequently in the past and will continue to do so, as there is a lot to say. In the meantime, here is a great quote that precisely expresses some of my feelings. It comes from "Life-Line", the first published work by science fiction writer Robert Heinlein. It was written in 1939, some time before the RIAA even existed:
There has grown up in the minds of certain groups in this country the notion that because a man or a corporation has made a profit out of the public for a number of years, the government and the courts are charged with the duty of guaranteeing such profit in the future, even in the face of changing circumstances and contrary public interest. This strange doctrine is not supported by statute nor common law. Neither individuals nor corporations have any right to come into court and ask that the clock of history be stopped, or turned back.

Sunday, February 25, 2007

Two Disruptive Trends

Barry Klawans gave an excellent talk to the SDForum Business Intelligence SIG when he spoke on "Two Disruptive Trends, Open Source and SaaS Meet Business Intelligence" at the February meeting. Open Source is something that I have written about before. SaaS stands for Software as a Service, the idea that information technology can be delivered as a service over the Internet. The best example of a successful SaaS business is Salesforce.com. Barry knows the territory well as he is CTO of JasperSoft, an Open Source BI reporting company.

Barry started with "The Innovator's Dilemma", a book from the 90s that describes how established technologies and product markets can be overturned by innovators who use new disruptive technologies or business models. Existing BI software vendors tend to target the high end of the market, and they are vulnerable to disruption from new vendors that start by targeting the underserved lower end. Barry believes that Open Source and SaaS are the forces that will overthrow the old guard of established BI vendors.

Next, Barry took us through the BI stack and the Open Source projects that address it. Successful Open Source projects concentrate on doing one thing and doing it well. One of the problems with using Open Source is that you have to integrate several Open Source packages to build a system. Integration is made more difficult because active Open Source projects tend to have a very short release cycle; the most active projects have a new release every six weeks or so. (This certainly struck a chord with me.)

This brought us to the second part of Barry's talk, on Software as a Service (SaaS). The point of a SaaS system is to relieve the user of the responsibility of building and maintaining an IT system. The job of building a SaaS system is integrating a lot of software packages to provide the service. SaaS also has to deal with transparently upgrading the service for its users as new features are implemented and bugs are fixed. As such, it complements Open Source and its rapid development cycle.

SaaS is a newer market and there are only a few emerging SaaS BI services available now. Barry touched on three, LucidEra, SeaTab and Oco, all early-stage startups. There are some real architectural challenges to providing BI as a service. For example, one issue is security. For a number of reasons, many Open Source projects put security on the back burner. On the other hand, a SaaS customer needs solid assurances that their data is secure and safe from the other customers of the SaaS service.
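To make that isolation concern concrete, here is a minimal Python sketch of one common approach a multi-tenant BI service might take: force every report query through a helper that scopes it to a single tenant's rows. The table, column names and tenant names are my own invention for illustration, not anything from LucidEra, SeaTab or Oco.

    # Row-level tenant isolation, sketched with an in-memory SQLite database.
    # Every query goes through tenant_report(), which always adds the tenant filter,
    # so one customer's rows can never appear in another customer's report.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tenant_id TEXT, region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("acme", "west", 120.0),
        ("acme", "east", 80.0),
        ("globex", "west", 500.0),
    ])

    def tenant_report(tenant_id):
        """Revenue by region, scoped to a single tenant's rows."""
        sql = ("SELECT region, SUM(revenue) FROM sales "
               "WHERE tenant_id = ? GROUP BY region")
        return conn.execute(sql, (tenant_id,)).fetchall()

    print(tenant_report("acme"))    # only Acme's rows
    print(tenant_report("globex"))  # only Globex's rows

In a real service the same idea usually sits deeper in the stack, in views or row-level security policies rather than application code, but the principle is the same.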

We will have to see how these trends play out. Barry quoted a research report suggesting that software innovation goes in a roughly 15-year cycle, and that in Business Intelligence we are just entering a new cycle that can be expected to run to 2020 or so.

Sunday, February 18, 2007

Like OMG

I overhear my daughter speaking on her new cellphone in the next room, and I get to thinking: OMG, does it mean "Object Management Group" or "Oh My God"? Well, there is only one way to settle it: Google Fight. The results, when they come in, lean in the expected direction, but they are not as overwhelming as anticipated.

So, like any good data analyst, I do a little exploration to check the results. I do not think that there is any problem with the Object Management Group side of the equation. On the other hand, there are several potential spellings of Oh My God that could make a difference. A quick test of "Oh Mi God" shows that it is not a problem, and "Oh My God" is much more popular than the "O My God" spelling.

The next thing to test is whether the quotes are necessary. Removing the quotes quickly shows wobbly results, particularly a big difference between O My God and Oh My God. Curiously, though, the first page of O My God results from Google all show Oh My God as the search term. Google Fight shows Object Management Group winning over Oh My God but losing to O My God. My conclusion is that the quotes are necessary, even though they make for a less interesting Google Fight.

The other notable thing is that the numbers returned by Google Fight are not the same as the numbers returned by Google. Why? It is probably because they use a different setting of the advanced search options. You can spend hours on this stuff, as my little example has shown. In the end the first Google Fight results seem to stand up, so perhaps the conclusion is that the world is more serious than I first thought!

Wednesday, February 07, 2007

The Vista DRM Morass

After listening to the series of Security Now podcasts on Vista DRM (episodes 73, 74, 75 and 77), I got to thinking about the difference in position between Microsoft and Apple.

Microsoft is a software company whose software runs on what has, up to now, been a very open hardware platform provided by a vast array of vendors. Thus Microsoft has chosen, and perhaps been forced, to implement security features in its Vista operating system that allow protected High Definition content to be displayed safely on HD displays attached to the PC. When I say safely, it is not about protecting you from the content; it is about protecting the content from you and making sure that you do nothing unauthorized with content that you have paid good money for.

All this security comes at a price. It sucks up processing power, and the complexity and the requirement for constant vigilance cut into software reliability. The problems are already evident: Vista had only just been released when the first Service Pack was announced. Moreover, while the Vista software itself is available, the truth is that the video drivers are not up to scratch, and it will take some time before they work properly.

Apple, on the other hand, is a hardware and systems company. Their solution to protected video content is the Apple TV box, a little piece of hardware that connects to your TV and handles decoding and display of protected High Definition content. An Apple computer does not need the complexity of the Vista protected video path because it is all wrapped up in a little box. The Apple TV box is not perfect: it is not quite here yet, and from the specs it seems to lack codecs. However, it does seem to be a better systems solution than Microsoft Vista.

Thursday, January 25, 2007

Visualization for All

If you are into playing with data, these are good times. A number of web sites have sprung up recently that allow you to explore data visually. Data360 launched in October. Swivel got a mention from the influential TechCrunch blog. Many Eyes comes from IBM Research, although you may not think so from looking at the site.

Each of these web sites allows you to upload data sets and play with how they are presented, looking for insights into the data. Of the three, Many Eyes is the most approachable. Without having to register, you can play with data sets that others have uploaded. Many Eyes has a great collection of visualization tools, including scatterplots, stacked graphs and treemaps as well as the more mundane bar and pie charts.

For example, when I first visited Many Eyes, someone had uploaded a data set of restaurant reviews from the San Francisco Chronicle that scored the restaurants on food, atmosphere, service, price and noise, and also gave an overall score. I looked at scatterplots and determined that there was no significant correlation between atmosphere and noise, and that the only factor that seemed to show some correlation with the overall score was the score for food. I also used stacked graphs to explore US government spending over the last 45 years. The takeaway is that spending on health, particularly Medicare and drugs, accounts for the largest increase in spending.
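If you prefer to check this kind of thing by hand, here is a small Python sketch of the same exercise. The scores below are made up for illustration; the Chronicle data set is not reproduced here. It uses the Pearson correlation coefficient as one simple way to quantify what the scatterplots show.

    # Compare how strongly pairs of restaurant scores move together.
    import math

    def pearson(xs, ys):
        """Pearson correlation coefficient between two equal-length lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Hypothetical restaurant scores on a 1-4 scale.
    food       = [3.5, 2.0, 3.0, 4.0, 2.5, 3.0]
    atmosphere = [2.0, 3.5, 2.5, 3.0, 2.0, 3.5]
    noise      = [3.0, 1.5, 3.5, 2.0, 2.5, 3.0]
    overall    = [3.5, 2.0, 3.0, 3.5, 2.5, 3.0]

    print("atmosphere vs noise:", round(pearson(atmosphere, noise), 2))
    print("food vs overall:    ", round(pearson(food, overall), 2))

The point of a site like Many Eyes, of course, is that you get the same insight from a scatterplot without writing any code at all.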

The only problem with Many Eyes is that, as befits an open research site, anyone can upload their data set, so by the time you read this the restaurant review data may have scrolled off, replaced by other compelling data. Go and play with whatever data is there anyway. It will be a learning experience.

Thanks to Stephen Few and his great Visual Business Intelligence blog at PerceptualEdge for pointing out these sites.

Tuesday, January 23, 2007

DRM Wishes

The old saying goes "be careful what you wish for, lest it come true". The music industry wished for DRM to protect their content. They found their "white knight" in Steve Jobs who built the iTunes music store to deliver their content safely to iPod users everywhere. The problem is that the music industry now finds itself completely beholden to Apple as their only viable channel for digital music sales.

Apple controls the channel and dictates the terms for music sales, particularly the $.99 price, which record executives want to vary. Also, the DRM is now seen to do more good for Apple than for the music industry because it locks the music purchaser into Apple products. The more music bought, the more locked in the purchaser becomes. No wonder the music industry is now talking about selling music without DRM. Funnily enough, Apple is against selling music on iTunes without DRM!

The only cloud on the horizon is that several European countries are trying to force Apple to open up its DRM for others to use. If these countries succeed, they take away the pressure on the music industry to sell music unencumbered. I view these countries' efforts as totally misguided and I wish that they would just stop meddling.

On another front, the jury is still out on whether the Microsoft Vista operating system is going to be so wrapped up in DRM that it is unusable. (I posted on this a couple of years ago.) There is a great discussion of Vista DRM on the Security Now podcast (episodes 73, 74 and 75).

Many people are surprised that Microsoft has yielded without a whimper to the content industry. If Microsoft had been willing to take a stand they could have negotiated a much better position for themselves and their products. It seems like Ballmer has been too willing to BOGU for the content providers. We will just have to stand back and see if he gets shafted.

Sunday, January 21, 2007

Complex Event Processing

Complex Event Processing (CEP) was the topic for the SDForum Business Intelligence SIG January meeting. Mark Tsimelzon, President, CTO and Founder of Coral8 spoke on "Drinking from a Fire Hose: the Why's and How's of Complex Event Processing".

Mark started out by showing us a long list of applications, such as RFID, financial securities, e-commerce, telecom and computer network security, that share the same characteristics. Each of these applications can generate hundreds of thousands of events per second that need to be processed and filtered, with critical events identified and responded to in a millisecond or second timeframe.

The first response to building a system for one of these complex event processing applications is to load the data into a database and continuously run queries against it. Unfortunately this introduces a number of delays that hurt response time. Firstly, there is the delay in loading the data into the database, as efficient database loading works best in batches. Next, there is a delay waiting for the query, which only runs periodically. Finally, there is a delay caused by interference between the load process that is writing data and the query process that is trying to read the same data.

Given the problems with using a database, the next response to building a CEP system is to write a custom program in Java or C to do the job. This can be coded to meet the response time and data rate requirements; however, it is inflexible. Any change to the requirements or data streams requires recoding and testing, which take time and money. Coral8 and other vendors in the CEP space provide a system that is like a database, is programmable in a high-level SQL-like language, and can process event streams at a rate similar to the hand-coded system.

In a conventional database system, the data is at rest in the database and the queries act on the data. In a CEP system, the queries are static and the event data streams past them. When an event triggers a query, the query typically generates new event data. This structure allows event processing to be parallelized by having several event processors run different queries in parallel on the same data stream. Processing can be pipelined by having the output streams of one event processor feed into the inputs of another, as the sketch below illustrates.
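Here is a toy illustration of that inverted model, written with plain Python generators rather than Coral8's SQL-like language. The event feed, thresholds and field names are invented for the example; the point is only that the queries stand still while the events flow past them, with one processor's output becoming the next processor's input.

    # Standing queries over a flowing event stream, chained into a pipeline.
    import random

    def event_stream(n=1000):
        """Pretend feed of trade events: (symbol, price)."""
        for _ in range(n):
            yield ("ACME", random.gauss(100.0, 5.0))

    def filter_spikes(events, threshold=110.0):
        """Standing query 1: pass through only unusually high prices."""
        for symbol, price in events:
            if price > threshold:
                yield (symbol, price)

    def tag_alerts(events):
        """Standing query 2: derive a new alert event from each spike."""
        for symbol, price in events:
            yield {"alert": "price_spike", "symbol": symbol, "price": round(price, 2)}

    # Pipeline the processors: the data moves, the queries do not.
    for alert in tag_alerts(filter_spikes(event_stream())):
        print(alert)

In a real CEP engine the same pipeline would run continuously over a live feed and could be spread across several processors, but the shape of the computation is the same.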

It is important to understand that the purpose of a CEP system is not to store data. While events can linger, they eventually pass out of the system and are gone. A database complements a CEP system: for example, Coral8 can read data from database systems and even cache that data for improved efficiency, and output streams from Coral8 can be, and usually are, fed into database systems.

If you want to try out CEP, visit the Coral8 web-site. There you can download documentation and a trial version of the software.

Sunday, January 14, 2007

Tableau Software

Business intelligence is about taking business data and turning it into actionable information, and there is a visualization problem at the heart of this process. Business data can be complicated and the user needs help in presenting the information in the best possible way. Unfortunately, many leading Business Intelligence tools seem to be deliberately designed to lead the user into making the worst possible presentation choices.

At previous meetings of the SDForum Business Intelligence SIG we have had great fun looking at bad visualizations such as garishly colored 3-D pie charts and 3-D bar graphs that do more to obscure the information than to show it off. At the November meeting of the SIG we heard from a company that is doing something positive about data visualization when Kelly Wright, Director of Sales for Tableau Software, and a Bay Area local, presented "Visual Analysis Using Tableau Software".

Tableau Software (www.tableausoftware.com) is a startup that emerged from a research project at Stanford University, where, under the leadership of Dr. Pat Hanrahan, a team of researchers worked on the difficult problem of enabling people to easily see and understand the information in their databases. As Kelly explained, Tableau was formed in 2000 and took five years to develop its product, coming out with the first version in 2005. They are now on version 2.1.

Kelly gave us a whirlwind tour of Tableau's capabilities. Firstly, Tableau is designed to understand the data that it is presenting, at least to the extent that it can make sensible choices about how to present the data in a useful way, for example by giving line graphs of continuous data against time. While it is always possible to override the defaults, Tableau seems to do a good job with its choices. The next issue is being able to present large amounts of data and compare different aspects of the data against one another, and again the Tableau drag-and-drop interface seems intuitive and easy to use. Once you can see all the data, the next requirement is to drill down into the interesting data and remove the noise, and again Tableau has a set of tools for selecting the most interesting data points and looking into them further.
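To show the flavor of the idea that sensible chart defaults can be derived from the types of the fields being plotted, here is a deliberately simple Python heuristic of my own; it is not Tableau's actual logic, just an illustration of the kind of choice being made for you.

    # Pick a default chart type from the kinds of fields the user has chosen.
    def default_chart(field_types):
        """field_types: list of 'date', 'measure' or 'category' for the chosen fields."""
        kinds = sorted(field_types)
        if kinds == ["date", "measure"]:
            return "line graph over time"
        if kinds == ["measure", "measure"]:
            return "scatterplot"
        if kinds == ["category", "measure"]:
            return "bar chart"
        return "table"  # fall back to a plain crosstab

    print(default_chart(["date", "measure"]))      # line graph over time
    print(default_chart(["measure", "category"]))  # bar chart

The real product clearly goes far beyond a lookup table like this, which is part of why it took years to build, but even the toy version shows why good defaults beat asking the user to pick from a gallery of 3-D pie charts.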

In retrospect it seems obvious to take the knowledge that has developed around how to present information and package it into a data visualization product. However, this is not as simple as it sounds, and the fact that Tableau took five years to develop their product shows the amount of work involved in doing it properly. Also, theirs is a lonely path; the other BI vendors prefer to provide flash and features over carefully integrated substance.

Tableau's product is not expensive for a data-head, and if you ask, you can get a 10-day free trial to find out exactly what it can do. Go ahead and try it!