Wednesday, May 30, 2007

FSJ Rules

Not sure quite why, but "The Secret Diary of Steve Jobs" is my favorite read on the internet. Every day there is at least one entry that makes me laugh out loud. Today, "Goatberg" asked RSJ (Real Steve Jobs) whether he reads FSJ (Fake Steve Jobs). RSJ replied that he had read a few of the FSJ things recently and thought them pretty funny.

Currently Valleywag has a hot campaign to unmask the identity of FSJ. In the past, others have tried and failed. Given Valleywag's record for bulldog journalism, I do not expect them to get anywhere either. Their latest candidate misses the mark on several fronts. Reading between the lines, I think that FSJ is of Steve's generation and has a strong British expat connection, as several posts contain cultural references to Private Eye, the British satirical magazine.

Anyhoo, if you want a good laugh, hear Bono on how he is going to get the World Bank job.

Monday, May 28, 2007

Distributed Analytics

Ever since the meeting, I have been trying to get my mind around the presentation by John Mark Agosta to the SDForum Business Intelligence SIG on "Distributed Bayesian Worm Detection". The talk described how a network of computers working together could be more effective at detecting a computer worm by exchanging information, a process described as gossiping amongst neighbors. It is part of Intel's research on Autonomic Enterprise Security.

The idea is that after a worm has taken over one system in a network, it will try to take over other systems in that network by sending out messages to those systems. Basic worm detection is done by detecting unusual activity in the outgoing messages from a host system. In a distributed worm detection system, each host in the network makes a decision as to whether it is infected by a worm and then exchanges that information with other systems in the network. Overall, the network determines whether it has been attacked by a worm through a distributed algorithm.

The most interesting claim that John made was that the detectors on each host could be set to have a low threshold. That is, they could be set to report a lot of false positives about being infected by a worm, and yet in the distributed algorithm these would cancel out, so that overall the system would only report a worm attack when one was actually under way.

I have considerable experience with distributed algorithms, and one thing that I know is that a distributed algorithm can be viewed as a sequential algorithm that is just being executed in parallel. Thus, the worm detection system can be viewed as a collection of noisy detectors that are then filtered by the next level of the algorithm to give a reasonable result in the aggregate. As such this could be a useful analytic technique. On the other hand, a worm attack is something that can only happen in a network, so perhaps the technique is only applicable to distributed systems.
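The claim that noisy detectors cancel out in the aggregate can be illustrated with a toy simulation. This is my own sketch, not Intel's actual algorithm: it replaces the gossip protocol with a simple centralized vote count, and all the rates and thresholds are made-up numbers chosen for illustration.

```python
import random

random.seed(42)

N_HOSTS = 100
FALSE_POSITIVE_RATE = 0.10   # low-threshold detectors are deliberately noisy
DETECTION_RATE = 0.90        # chance an infected host notices unusual traffic
GLOBAL_THRESHOLD = 0.30      # fraction of alarming hosts before the network reacts

def local_alarms(infected_fraction):
    """Each host independently decides whether it thinks it is infected."""
    alarms = 0
    for host in range(N_HOSTS):
        infected = host < infected_fraction * N_HOSTS
        p_alarm = DETECTION_RATE if infected else FALSE_POSITIVE_RATE
        if random.random() < p_alarm:
            alarms += 1
    return alarms

def network_alarm(infected_fraction):
    """Aggregate the noisy local votes into a single global decision."""
    return local_alarms(infected_fraction) / N_HOSTS > GLOBAL_THRESHOLD

print(network_alarm(0.0))   # no worm: scattered false positives stay under threshold
print(network_alarm(0.5))   # half the network infected: the alarm fires
```

With no worm, roughly 10 of 100 hosts raise false alarms, well under the 30% threshold; with a real outbreak, most infected hosts alarm and the global decision flips. In the real system, of course, the count would itself be computed by the hosts gossiping amongst their neighbors rather than by a central tallier.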

Like a good proportion of the audience, I was interested in whether the analytic techniques described could have a broader application in other areas of Business Intelligence like fraud detection. So far I have not managed to come up with any convincing applications. Any suggestions?

Sunday, May 13, 2007

People Search Redux

Reading through my last post on People Search, I realize that I did not quite join up the dots. Here is how people search works. It is something I have written about before, and the Search SIG meeting did reveal some new angles.

Firstly, the search engine spiders the web collecting people-related information. Next comes the difficult part, arranging the information into profiles where there is one profile per person. This is most difficult for common names like Richard Taylor. Then there are other little variations like nicknames (Dick, Rich, Ricky for Richard), spelling variations (Shakespear), and middle names or initials that may or may not be present.
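The name-variation problem can be sketched in a few lines. This is a deliberately naive illustration of my own, not how any of these companies actually work: the nickname table is a stub, and real engines must weigh much more evidence than the name alone.

```python
# Hypothetical nickname table; a real system would use a far larger one.
NICKNAMES = {
    "dick": "richard", "rich": "richard", "ricky": "richard",
    "bill": "william", "bob": "robert",
}

def canonical(name):
    """Reduce a full name to (first, last), folding nicknames and
    dropping middle names or initials, which may or may not be present."""
    parts = name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    return (NICKNAMES.get(first, first), last)

def same_person_candidate(a, b):
    """A weak signal only: common names like Richard Taylor collide,
    so a real engine must also compare the rest of the profile evidence."""
    return canonical(a) == canonical(b)

print(same_person_candidate("Dick Taylor", "Richard J. Taylor"))  # True
print(same_person_candidate("Richard Taylor", "Robert Taylor"))   # False
```

The sketch makes the core difficulty visible: canonicalization merges variants of one person, but it also merges distinct people who share a common name, which is exactly why building one profile per person is the hard part.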

A good profile linked to an identified user is a valuable thing. For example, it can be used to direct advertising to the desired demographic, making the advertising more valuable. As I have noted before, this kind of information is most valuable to large internet companies like Yahoo and Google, who effectively direct a large part of online advertising.

A profile is much more valuable if the person has taken control of their profile and effectively verified it. So the final step for the people search companies is to create enough awareness that people feel compelled to take control of their own profile. I have run across ZoomInfo profiles that have been verified, so they have started to do this for their specialized audience. Wink and Spock will have to try much harder. I looked at my Wink profile and I was not impressed. I have seen a scarily accurate profile of myself online, and Wink did not come close.

At the meeting, DJ Cline opined that people might be willing to pay money to have their profiles taken down. The panel of search company CEOs disagreed that this was a good model and told us that they would try to talk someone out of demanding that their profile is removed. I think that what this means is that a good profile is actually of more value to others than it is to the target of the profile. Quite apart from that is the thought that blackmail is a "difficult" business model.

Michael Arrington several times expressed the opinion that Spock would be sued for what they were doing, particularly as one of the example profiles they showed was Bill Clinton with the tag "sex-scandal". I was concerned with the possibility that a profile could be hijacked, as the hijacker could then play tricks to embarrass the target of the profile. A high-profile lawsuit or profile hijacking with a lot of attendant publicity could be the catalyst that brings people search to public attention, given how much people search gains from an event that makes everyone go out and claim their profile.

However this gets us back to blackmail as a business model. It is one thing for Joe Average to create his own MySpace page. It is quite another thing if Joe Average feels that he has to go out and claim a profile page that someone else has put together without even asking him, just so that he can defend his own good name. People search has had a long history of privacy concerns and it will continue to do so.

Tuesday, May 08, 2007

Making Money from People Search

The first question that moderator Michael Arrington asked was how are you going to make money? This was at tonight's SDForum Search SIG meeting on "People Search", and he was asking the panel consisting of Michael Tanne CEO of Wink, Jaideep Singh CEO of Spock, and Bryan Burdick COO of ZoomInfo.

ZoomInfo, which has been around for a few years and is aimed at corporate and executive search, claimed to be profitable and generating cash through subscription revenue. Wink and Spock are aimed at the bigger and more general consumer markets and need to be supported to some extent by advertising. These two companies are not at the money-making stage yet: Wink is live, while Spock is in invitation-only beta. Also, judging by the numbers that were bandied around, they are not going to generate a lot of money from direct search advertising.

I think that a general people search function is most useful as an adjunct to a large internet company like Google or Yahoo that generates much of its revenue from advertising. People search is used to get demographic information about the user, which is then used to target the advertisement. At a Business Intelligence SIG meeting last summer we heard about how Yahoo is targeting banner ads based on demographics, and it is no coincidence that Yahoo already has its own people search feature.

The CEOs of Wink and Spock both protested that they were pure people search companies that could survive and grow as pure people search businesses. However there was a general murmur in the room that everyone has their price. We will just have to see how it plays out.

Wednesday, May 02, 2007

Save Time, Generate Code

I mentioned the after party in the previous post; now it is time to talk about the SDForum SAM SIG meeting "Writing Code Generators For Quality, Productivity, and Fun" by Bill Venners of Artima. In short, Bill's talk was about designing domain specific languages and using code generation to solve programming problems. Code generation can build better, more reliable code faster, particularly in situations where you find yourself doing a lot of cut-and-paste style coding.

As befits the technique, Bill got the audience to do a lot of the lifting by having them describe how they had solved various programming problems through code generation. I have used code generation a couple of times, as well as doing some real language design, and I shared one of my experiences along with several other audience members. It was definitely encouraging to hear that many audience members had successfully used code generation.

Bill characterized a number of situations where code generation is useful and showed us a specific example of a domain specific language that he had designed. From my experience, language design is comparable to API design, except that there are more degrees of freedom, so there are more ways to go wrong. Designing a real programming language is hard. There are issues at the lexical level, problems with keywords and extensibility, the need to make the language regular and unsurprising while dealing with special cases, and a thousand other little details to get right.

On the other hand, if a domain specific language can be kept simple enough, it can skirt these problems. Bill's example language was simple. Each statement consisted of a set of keywords followed by a name. In my opinion, domain specific languages are a great technique provided that you firstly design a clean simple language and secondly remember the principles of API design.
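A language of that shape is simple enough to sketch here. This is my own invented mini-language, loosely modelled on the "keywords followed by a name" description, not Bill's actual example: each line is a field declaration, and a generator turns the spec into ordinary Python source.

```python
# A made-up spec in the mini-language: keywords, then a name, one per line.
SPEC = """\
required string title
optional int year
required string author
"""

def generate_class(class_name, spec):
    """Emit a Python class whose constructor enforces the spec.
    Only the required/optional keyword matters here; the type
    keyword is carried along but not checked in this sketch."""
    lines = [f"class {class_name}:", "    def __init__(self, **kw):"]
    for stmt in spec.strip().splitlines():
        *keywords, name = stmt.split()
        if "required" in keywords:
            lines.append(f"        self.{name} = kw['{name}']")  # KeyError if missing
        else:
            lines.append(f"        self.{name} = kw.get('{name}')")
    return "\n".join(lines)

code = generate_class("Book", SPEC)
print(code)
exec(code)  # the generated code is ordinary Python

b = Book(title="Emma", author="Austen")
print(b.year)  # None: optional field, not supplied
```

Even this toy shows why a simple statement form pays off: the parser is one `split()` call, so all the effort goes into what gets generated rather than into lexing and grammar headaches.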