Wednesday, December 05, 2007

Aggregate Data, Make Money

In the old days, the aggregate business was about selling large numbers of small stones for building roads and such; now the aggregate business is about collecting large numbers of data points and aggregating them into information valuable enough to support a business. We heard about one such business at the November/December meeting of the SDForum Business Intelligence SIG, where Faisal Mushtaq, CTO of Biz360, spoke on "Do you know what customers are saying about you and your products?"

Biz360 starts out as a Web version of a clippings service: it gathers media articles, blog entries and other documents that reference its clients from the Web and from other information services. Because the information is in digital form, it supports interesting analyses that cannot be done with dead-tree clippings. For example, Biz360 automatically ascribes a sentiment to each document, either negative or one of several grades of positive, so that a client business can track what people think about it. It also ascribes to each document a reach that measures its importance. This captures the fact that an article in the Wall Street Journal will be seen by, and influence, more people than an article in, say, the San Jose Mercury News.
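The talk did not describe how Biz360 actually combines sentiment and reach, so what follows is only a minimal sketch under stated assumptions. It assumes a hypothetical integer sentiment scale (-1 for negative, 1 to 3 for increasing grades of positive) and treats reach as an estimated audience size. The `Document` class and `reach_weighted_sentiment` function are illustrative names, not Biz360's. Weighting each document's sentiment by its reach is one simple way to let a widely read article count for more than a small one.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical grading scale (an assumption, not Biz360's):
# -1 = negative; 1, 2, 3 = increasingly positive.
SENTIMENT_GRADES = {-1, 1, 2, 3}


@dataclass(frozen=True)
class Document:
    """One clipping: a media article or blog entry that mentions a client."""
    source: str       # publication or blog name (illustrative)
    published: date
    sentiment: int    # one of SENTIMENT_GRADES
    reach: float      # estimated audience size (assumed unit: readers)

    def __post_init__(self):
        # Reject values outside the assumed scale.
        if self.sentiment not in SENTIMENT_GRADES:
            raise ValueError(f"unknown sentiment grade: {self.sentiment}")
        if self.reach < 0:
            raise ValueError("reach must be non-negative")


def reach_weighted_sentiment(docs):
    """Mean sentiment, with each document weighted by its reach."""
    total_reach = sum(d.reach for d in docs)
    if total_reach == 0:
        return 0.0
    return sum(d.sentiment * d.reach for d in docs) / total_reach


# Illustrative numbers only: one negative story in a large national daily
# and one glowing story in a smaller regional paper.
docs = [
    Document("national daily", date(2007, 11, 1), sentiment=-1, reach=2_000_000),
    Document("regional daily", date(2007, 11, 1), sentiment=3, reach=250_000),
]
print(round(reach_weighted_sentiment(docs), 2))  # -0.56
```

An unweighted mean of the same two grades would be 1.0, so in this made-up case the reach weighting flips the overall picture from positive to negative.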

Once the data is gathered, a client can look at it in many different ways. The starting point is usually a dashboard that shows reach and sentiment over a period of time. The client can look for changes in reach and sentiment and drill down to find out what lies behind them. Because newspaper articles cite their sources and blog posts link to one another, the software can derive and display the network of influence between related items.
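How Biz360 builds its influence network was not covered in detail. As a rough sketch, assuming each citation or link is available as a pair of hypothetical item ids, one could store the links as a directed graph and use a breadth-first search to find everything that builds on a given item, directly or indirectly. The function names here are illustrative only; ranking items by the size of their downstream set is just one simple measure of influence.

```python
from collections import defaultdict, deque


def build_influence_graph(citations):
    """citations: iterable of (citing_id, cited_id) pairs, e.g. a blog post
    linking to an earlier post, or a news article citing a source.
    Returns {cited_id: set of ids that cite it directly}."""
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[cited].add(citing)
    return graph


def downstream(graph, root):
    """Every item that builds on root directly or indirectly,
    found by breadth-first search along citation edges."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            # Skip the root itself so citation cycles do not count it.
            if nxt != root and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rank_by_influence(graph, items):
    """Order items by how many others they reach, most influential first."""
    return sorted(items, key=lambda i: len(downstream(graph, i)), reverse=True)


# Hypothetical item ids: two blogs pick up blog_a, a news story cites
# blog_b, and a second news story cites the first.
links = [
    ("blog_b", "blog_a"),
    ("blog_c", "blog_a"),
    ("news_x", "blog_b"),
    ("news_y", "news_x"),
]
graph = build_influence_graph(links)
print(rank_by_influence(graph, ["news_x", "blog_b", "blog_a"]))
# ['blog_a', 'blog_b', 'news_x']
```

The `seen` set keeps the search from looping forever when two items link to each other, which is common among blogs.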

Faisal showed us a specific example of how attention to sentiment could have helped avert a PR problem. In 2006, Dell had a problem with exploding batteries in its laptop computers. Although the story did not break in the newspapers until August, the problem had been widely discussed in the blogosphere for months beforehand. Faisal's graph made a convincing case that by the time the mainstream news media picked up on the problem, it was already old hat in the blogosphere.
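The graph itself was not shared in numeric form, so the following is only a sketch of how such a lead could be measured, assuming mention counts per period for blogs and for mainstream news are available. The threshold that counts as "discussion has started" is a hypothetical parameter, and the counts in the example are made up; they are not Dell's data.

```python
from typing import Optional, Sequence


def first_crossing(counts: Sequence[int], threshold: int) -> Optional[int]:
    """Index of the first period whose mention count reaches threshold."""
    for period, count in enumerate(counts):
        if count >= threshold:
            return period
    return None


def lead_time(blog_counts: Sequence[int], news_counts: Sequence[int],
              threshold: int) -> Optional[int]:
    """Periods by which blog discussion preceded mainstream coverage.
    Positive means the blogs got there first; None if either channel
    never reaches the threshold."""
    blog_start = first_crossing(blog_counts, threshold)
    news_start = first_crossing(news_counts, threshold)
    if blog_start is None or news_start is None:
        return None
    return news_start - blog_start


# Made-up monthly mention counts, not Dell's actual data.
blogs = [0, 3, 8, 15, 22, 40]
news = [0, 0, 0, 0, 1, 30]
print(lead_time(blogs, news, threshold=5))  # 3
```

A fixed threshold is the simplest choice; a real system would more likely compare each series against its own baseline, since blog and news volumes differ by orders of magnitude.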

Doing this kind of analysis requires serious computing power. Biz360 has a 150-CPU processor farm with 25 terabytes of disk storage, arranged as a computing grid. Every day it downloads about 1 million media articles and 4 million blog posts. From these it derives roughly 200 thousand articles of interest to its clients and applies 7 million automated analyses to them.
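To put these figures in perspective, here is a back-of-the-envelope calculation using only the numbers quoted in the talk. It assumes the 7 million analyses are a daily figure and that the load is spread evenly over all 150 CPUs and all 24 hours; neither assumption was stated, so the per-CPU rate is only an average.

```python
# Daily figures as reported in the talk.
MEDIA_ARTICLES_PER_DAY = 1_000_000
BLOG_POSTS_PER_DAY = 4_000_000
RELEVANT_ARTICLES_PER_DAY = 200_000
ANALYSES_PER_DAY = 7_000_000  # assumed to be per day
CPUS = 150
SECONDS_PER_DAY = 24 * 60 * 60


def workload_summary():
    """Simple ratios, assuming work is spread evenly over all CPUs
    and all 24 hours (an assumption, not a reported fact)."""
    downloaded = MEDIA_ARTICLES_PER_DAY + BLOG_POSTS_PER_DAY
    return {
        "downloaded_per_day": downloaded,
        "relevant_fraction": RELEVANT_ARTICLES_PER_DAY / downloaded,
        "analyses_per_relevant_article": ANALYSES_PER_DAY / RELEVANT_ARTICLES_PER_DAY,
        "downloads_per_cpu_per_second": downloaded / CPUS / SECONDS_PER_DAY,
    }


for name, value in workload_summary().items():
    print(f"{name}: {value:,.2f}")
```

On these numbers, about 5 million documents come in each day, roughly 4% of them turn out to be relevant, each relevant article receives about 35 analyses, and each CPU handles about 0.39 downloaded documents per second on average.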
