Monday, May 28, 2007

Distributed Analytics

Ever since the meeting, I have been trying to get my mind around the presentation by John Mark Agosta to the SDForum Business Intelligence SIG on "Distributed Bayesian Worm Detection". The talk covered how a network of computers working together can be more effective at detecting a computer worm by exchanging information, described as gossiping amongst their neighbors. It is part of Intel's research on Autonomic Enterprise Security.

The idea is that after a worm has taken over one system in a network, it will try to take over other systems in that network by sending out messages to those systems. Basic worm detection is done by detecting unusual activity in the outgoing messages from a host system. In a distributed worm detection system, each host in the network makes its own decision as to whether it is infected by a worm and then exchanges that information with other systems in the network. Overall, the network determines whether it has been attacked by a worm through a distributed algorithm.

The most interesting claim that John made was that the detectors on each host could be set to have a low threshold. That is, they could be set to report a lot of false positives about being infected by a worm, and yet in the distributed algorithm these would cancel out, so that overall the system would only report a worm attack when one was actually under way.
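To convince myself that this is at least plausible, here is a minimal sketch of the arithmetic. It is not the algorithm from the talk. It assumes that hosts raise alarms independently, that the network only declares an attack when a quorum of hosts are alarming at once, and that the rates and the quorum below are numbers I made up for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance that at least k of n
    independent hosts raise an alarm when each does so with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# All values below are illustrative assumptions, not figures from the talk.
N_HOSTS = 50                # hosts taking part in the exchange
FALSE_POSITIVE_RATE = 0.10  # a clean host alarms 10% of the time (low threshold)
TRUE_POSITIVE_RATE = 0.80   # a host alarms 80% of the time during an outbreak
ALARM_QUORUM = 20           # network declares an attack at 20 or more alarms

# Chance the whole network cries wolf when no worm is present.
false_alarm = prob_at_least(ALARM_QUORUM, N_HOSTS, FALSE_POSITIVE_RATE)

# Chance the network declares an attack during a real outbreak.
detection = prob_at_least(ALARM_QUORUM, N_HOSTS, TRUE_POSITIVE_RATE)

print(f"network false alarm probability: {false_alarm:.2e}")
print(f"network detection probability:   {detection:.6f}")
```

Under these assumptions each individual detector is wrong one time in ten, but it is very unlikely that twenty of them are wrong at the same moment, which is the sense in which the false positives "cancel out". The independence assumption is doing a lot of work here; correlated false alarms would weaken the effect.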

I have considerable experience with distributed algorithms, and one thing I know is that a distributed algorithm can be viewed as a sequential algorithm that just happens to be executed in parallel. Thus, the worm detection system can be viewed as a collection of noisy detectors whose outputs are filtered by the next level of the algorithm to give a reasonable result in the aggregate. As such, this could be a useful analytic technique. On the other hand, a worm attack is something that can only happen in a network, so perhaps the technique is only applicable to distributed systems.
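The sequential view can be sketched as a simple Bayesian filter that folds in one host's report at a time. This is my own illustration under a naive assumption that reports are conditionally independent; the function names, the prior and the alarm rates are all hypothetical, and it should not be read as the method presented in the talk.

```python
from math import exp, log

def update_log_odds(log_odds, alarm, p_alarm_if_worm, p_alarm_if_clean):
    """Fold one host's report into the running log-odds of 'worm present'."""
    if alarm:
        return log_odds + log(p_alarm_if_worm / p_alarm_if_clean)
    return log_odds + log((1 - p_alarm_if_worm) / (1 - p_alarm_if_clean))

def posterior_worm(reports, prior=0.01, p_alarm_if_worm=0.8, p_alarm_if_clean=0.1):
    """Posterior probability of a worm given a sequence of True/False reports.

    The prior and the two alarm rates are made-up illustrative values.
    """
    log_odds = log(prior / (1 - prior))
    for alarm in reports:
        log_odds = update_log_odds(log_odds, alarm, p_alarm_if_worm, p_alarm_if_clean)
    return 1 / (1 + exp(-log_odds))

# One noisy host alarming among ten is explained away by the quiet ones.
lone_alarm = posterior_worm([True] + [False] * 9)

# Eight of ten hosts alarming is strong evidence of a real attack.
many_alarms = posterior_worm([True] * 8 + [False] * 2)

print(f"one alarm in ten:    {lone_alarm:.2e}")
print(f"eight alarms in ten: {many_alarms:.4f}")
```

Because each update only adds a term to the running total, the order in which reports arrive does not matter, which is why the same filter could be run one report at a time on a single machine or spread across hosts that gossip their partial results.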

Like a good proportion of the audience, I was interested in whether the analytic techniques described could have a broader application in other areas of Business Intelligence, such as fraud detection. So far I have not managed to come up with any convincing applications. Any suggestions?
