Monday, August 22, 2005

Data Mining Insight

Data mining is a difficult subject. On the one hand, it is presented as something that will tell you all sorts of wonderful facts that you never knew about your data. On the other hand, when you start getting into it, it is daunting and difficult to approach, seems to require a PhD in statistics to use, and ends up telling you things that you already know, such as that people who buy bread at the market are also likely to buy milk (or was that the other way round?).

At the August meeting of the SDForum Business Intelligence SIG, Joerg Rathenberg, VP Marketing and Communications at KXEN, gave a talk, "Shaping the Future", about predictive analytics, which is the latest way of saying data mining. KXEN is a young, privately held company that is devoted to data mining and that is successful, while many other data mining startups have fallen by the wayside.

The kernel of KXEN's success comes from powerful, robust algorithms that do not require a specialist to tweak; high performance, so that you can get results quickly; and finally, and most importantly, ease of use. As part of his presentation, Joerg ran through a couple of data mining exercises, showing us how you could take a reasonably sized data set in, say, comma-separated values (CSV) form and, with a few clicks and a few seconds of processing, generate an interesting data analysis.

For me the key insight of the evening was on how to use data mining. I had always thought of data mining as a tool of last resort: when the data is too large or complicated and nothing else seems to work, you resort to data mining to try to find something that you cannot see with the naked eye. Joerg, on the other hand, suggested that data mining is the first thing that you do when presented with a new business question or a new data set. You use data mining for the initial analysis of the data, to find out which factors in the data really affect the outcome that you are interested in. Once these factors are identified, you can build reports or OLAP cubes using these factors as dimensions to explore in depth what is going on.
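To make the idea of finding the factors that matter concrete, here is a minimal sketch in Python. It is not KXEN's algorithm, which the talk did not describe in detail. It swaps in a simple textbook measure, information gain, to rank the columns of a CSV data set by how much they tell you about an outcome column. The sample data, the column names and the `rank_factors` helper are all made up for illustration, and the sketch assumes every column is categorical.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def rank_factors(rows, outcome):
    """Rank every column except `outcome` by its information gain.

    Hypothetical helper: information gain stands in here for whatever
    measure a real data mining tool would use.
    """
    base = entropy([row[outcome] for row in rows])
    gains = {}
    for column in rows[0]:
        if column == outcome:
            continue
        # Group the outcome values by the value of this column.
        groups = defaultdict(list)
        for row in rows:
            groups[row[column]].append(row[outcome])
        # Weighted entropy of the outcome that remains after the split.
        remainder = sum(len(g) / len(rows) * entropy(g)
                        for g in groups.values())
        gains[column] = base - remainder
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data: in this toy set, `plan` fully determines `churned`
# and `region` tells you nothing about it.
SAMPLE_CSV = """region,plan,churned
west,basic,yes
west,premium,no
east,basic,yes
east,premium,no
west,basic,yes
east,premium,no
west,premium,no
east,basic,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ranking = rank_factors(rows, "churned")
for column, gain in ranking:
    print(f"{column}: {gain:.3f} bits")
```

On this toy data, `plan` comes out at the top of the ranking and `region` at the bottom, so `plan` is the factor you would pick as a report or cube dimension. Real data is messier: information gain favors columns with many distinct values, and numeric columns would need to be binned first.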

Thus data mining is something that you should be doing early and often in your data exploration. Joerg called this "Exploratory Data Mining" and it certainly resonated with audience members who do data analysis for a living. KXEN has designed its software to make exploratory data mining possible and even easy, and hopes that by this means it becomes accessible to the masses.
