Build and Break

Monday, July 23, 2007

BI Software as a Service

Like many others, I went to the July meeting of the SDForum Business Intelligence SIG expecting to hear about Salesforce.com and its new AppExchange. In practice we heard about a lot more. The speaker, Darren Cunningham, had just left Salesforce.com and joined LucidEra, a new start up that specializes in Business Intelligence as Software as a Service (SaaS). Darren also brought along Ken Rudin, the CEO of LucidEra, who had some interesting things of his own to say. There are several write ups of the meeting on the web that capture the big picture. I want to note a couple of the smaller moments from the meeting.

Firstly, both Ken and Darren described building your own business intelligence system as being similar to a business building and running its own Nuclear Reactor. What they mean is that while a full blown data warehouse can be very powerful, it also requires a lot of maintenance and attention to detail to keep it running. In practice many businesses do not have the resources and energy to keep their business intelligence system running in the kind of peak condition that is required to get value from it.

Secondly, someone asked about LucidEra's pricing and business model. Darren explained that they are taking the same approach as Salesforce.com. That is, start with an offering for small and medium businesses and then build up to the larger businesses. After 8 years in business Salesforce.com is just now starting to sign up large businesses like major brokerage houses.

Ken had some interesting things to say about pricing. Unlike sales, a typical business has only a few business analysts who would be the major users of their system, so pricing on a per seat model is difficult. Also they do not want a tax on reports, so the pricing model should allow a business to run as many reports as they want. Therefore LucidEra uses a flat rate per application per customer. Currently they are using a single flat rate of $2900 for their starter application.

In my opinion, a flat rate encourages the customer to use as much of the service as they can, which is a good thing. However, it is optimal only for a particular size of business. Smaller customers are shut out by a cost that is too high for them. Large customers could overwhelm the service without fully covering their costs. One way to expand the model is to provide tiered pricing based on the size of the customer. Tiers could be based on either customer total revenue or number of employees.
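
To make the tiering idea concrete, here is a minimal sketch of a flat rate that depends on customer size. Only the $2900 figure comes from the meeting. The tier boundaries, the other two prices and the function name are made up for illustration and are not LucidEra's pricing.

```python
from bisect import bisect_right

# Hypothetical tier boundaries, in number of employees.
# Below 50 is "small", 50 to 499 is "medium", 500 and up is "large".
TIER_LIMITS = [50, 500]

# Hypothetical flat rate per application for each tier.
# Only 2900 was quoted at the meeting; 1500 and 6000 are invented.
TIER_PRICES = [1500, 2900, 6000]


def flat_rate(employees):
    """Return the flat rate for a customer with the given headcount."""
    # bisect_right counts how many boundaries the headcount has reached,
    # which is exactly the index of the tier the customer falls into.
    return TIER_PRICES[bisect_right(TIER_LIMITS, employees)]
```

Within a tier the customer still pays a flat rate, so there is no tax on reports. The same lookup would work with total revenue in place of headcount.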

At the moment LucidEra is just starting to build their portfolio of customers and applications. They will have to suffer growing pains before their pricing model is put to its full test.

Wednesday, July 04, 2007

Design Patterns in Python

Alex Martelli is a leading light of the Python programming language community. He is a leader in the development of the language, author of Python in a Nutshell and has written extensively on Python in other books and articles. Last week he spoke to the SDForum Software Architecture and Modeling SIG on "Design Patterns in Python". Alex has posted the presentation slides here.

Design patterns are a useful concept in programming. A programming language consists of a set of basic constructs that are used in various ways to build a program. Design patterns capture higher level constructs that commonly appear in programs. As Alex explained, an important part of each design pattern is its name. Having a good and well known name for each pattern makes it easy to explain a program and discuss how it works at a higher level than the programming language constructs.

Although design patterns are supposed to be programming language agnostic, many of the books on design patterns are based on statically typed languages such as C++ and Java. There are some classic design patterns that do not translate well into dynamically typed languages like Python. One example is the Singleton design pattern. This uses strong typing to ensure that there can be only one instance of a particular type of object.

Python is a dynamically typed language, and in these languages the type system is used to make sure that the right thing happens at run time rather than to do any enforcement. The consequence is that you cannot really do a singleton in Python or any other dynamically typed language. If I were in the dynamic language camp, I would argue that if you only want one instance of an object, make sure that you only create one instance of the object. This is clearly in the spirit of dynamic typing.

However Alex is clearly concerned by the absence of these design patterns from Python. First he claimed that these design patterns do not exist in Python because it is a higher level language and that the design patterns are somehow subsumed by the language system. Then he went on to show us his Python answer to the singleton design pattern, which he called the Borg. In the Borg, all instances of the class share the same state, and so they all appear to be the same instance.
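
For readers who have not seen it, the shared-state idea can be sketched in a few lines. This is a minimal version of the commonly published Borg idiom, not necessarily the exact code from Alex's slides, and the attribute name `setting` is only an illustration.

```python
class Borg:
    """Every instance shares one attribute dictionary."""

    # Class-level dictionary that holds the state for all instances.
    _shared_state = {}

    def __init__(self):
        # Rebind this instance's attribute dictionary to the shared one,
        # so an attribute set through any instance is visible through all.
        self.__dict__ = self._shared_state


first = Borg()
second = Borg()
first.setting = "on"  # set through one instance, readable through the other
```

Note that `first` and `second` are still two distinct objects. Only their state is shared, which is the difference between the Borg and a true singleton.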

While the Borg pattern works in Python, I do not see it as either necessary or interesting. Alex told us that Guido van Rossum, the father of Python, does not like it either.

Thursday, June 28, 2007

Relational Messaging

The SDForum Business Intelligence SIG was honored to have a new company launch at the meeting last week. Julian Hyde, well known for leading Mondrian, the Open Source OLAP project, gave a talk titled "A new solution to an old problem: How Relational Messaging solves Data and Application Integration". He told us that he had been working with a small group for the last 3 years to develop their streaming database product in a new company called SQLstream (no web site yet).

Streaming databases are nothing new to the Business Intelligence SIG. In 2004 we hosted Celequest, which was recently bought by Cognos. In 2005 we hosted iSpheres, which has subsequently disbanded. In January 2007 we hosted Coral8. The difference between SQLstream and these other companies is less one of technology and more one of vision.

Although there was interesting technology buried inside, Celequest presented their system as a way of creating real time dashboards. Unlike Celequest, iSpheres was concerned with performance, although they were weighed down by a proprietary language. Coral8 is concerned with handling complexity and being able to scale when handling real time events that happen in the millisecond to second range.

SQLstream, on the other hand, sees their streaming database as a system that can be used to do any and all Enterprise data and application integration functions. The bulk of the presentation was about how SQLstream could supplant ETL, EAI, EII, SOA and some uses of database systems, amongst others. Anyone who has any experience in this area will know that this is a huge claim. I am not going to say whether they are right or wrong; only time will tell us that. However, I am going to admire the vision.

There is one other admirable thing in SQLstream, and that is a commitment to the SQL standard. Streaming databases require some extensions to SQL. SQLstream has some extensions to the SQL standard; however, they are few and necessary. Apart from these extensions their language is pure SQL.

(The presentation is currently available as a file on our Yahoo web site.)

UPDATE: 12/11/07 SQLstream now have their web site up.

Sunday, June 24, 2007

Virtual Goods

I did not have the time to attend the recent conference on Virtual Goods at Stanford, however the write ups did pique my interest and raise a couple of questions.

Firstly, where do Virtual Goods end? For example, is a downloaded music file a virtual good? I can see that there is a distinction between an MP3 file I have downloaded to my home system and a virtual good that only exists on a server for a virtual world, for example land in Second Life or gold in World of Warcraft. However, there are DRM schemes for music where I cannot use a music file that I have downloaded unless my system is in contact with, or has recently contacted, the server through the internet.

It is fine to define virtual goods as digital goods that are under the control of some other entity, but we need a definition. (I put in the word digital because there are slightly more tangible goods that I own but that are kept under the control of others. A good example is in the stock market, where when I buy shares in a company, the share certificate is held by my broker.)

Secondly, how should we value Virtual Goods? As Susan Wu says in her introduction to the conference, in general we can value virtual goods like any other goods based on their utility. However there are some special considerations. A base value for any goods is its marginal cost of production. The marginal cost of producing virtual goods is zero, so virtual goods can become worthless. For example, if I spent dollars buying gold in the World of Warcraft and Blizzard Entertainment ceases business, my gold has zero value.

Now I do not expect Blizzard to go out of business any time soon. Anyway, if I did buy gold, I would not spend more than I would on a meal in an expensive restaurant. Also, I would immediately spend the gold on something useful like a Vorpal Sword of Bunny-Smashing and go out and kill some bunnies or Orcs or whatever to get value from the money I had spent.

A much more interesting case is presented by Second Life. The company is a private startup without the transparency of Blizzard Entertainment and the controversial business model is less proven. The things that you buy are virtual land and virtual adornments, things that you might expect to keep and cherish in the real world, yet which have a more fleeting existence in the virtual world. While one person has made a huge business success out of Second Life, others have tested it and found it wanting.

We will just have to wait and see what happens to Second Life. In the meantime, I put virtual goods in the same category as fancy food, flowers and 'gifts'. That is, something to consume and savor as it is being consumed.

Thursday, June 14, 2007

Programming Language Wars

I have been doing a lot of programming recently, using a wide variety of programming languages. In the last year I have written programs or modules in the following programming languages: C, C++, Java, Perl, PHP, Python, Shell and various dialects of SQL. Tiobe Software maintains an interesting Community Index that tracks programming language popularity. From their data I calculate that I am using 65% of current programming languages (by volume).

For anyone new to the game, Programming Language Wars have a long and storied history. As soon as the first programming language was implemented, there were wars about whether to use it or assembler. Shortly thereafter there were two programming languages and the battle commenced in earnest. For example, this article remembers some of the more ridiculous wars from the 70's and 80's. Newbies, get over it, we are at least on the 423rd "Great Programming Language War".

I am going to write a series of posts on programming languages that I hope will provide a more balanced view than the usual blast.

Saturday, June 09, 2007

Video 2.0

The best way to find out about something new is to try and do it. Robert X. Cringely illustrated this in an unusual post this week. The post is unusual because he normally comments in his acerbic way on what others in the tech industry are doing. This post is on his own technology adventure.

Cringely has put together a series of video tech interviews and he wanted to provide a way for video feedback. I have done some video, but until I read the post, I had never considered the possibility of video as a response. Cringely provides an example in the comments of a YouTube video that absolutely demands video feedback and gets it. (That Pachelbel sure knew how to write a bass line, even in the 17th century.)

In other video sightings from around the web, the normally staid TechCrunch blog has a wonderful video mashup on the Jesus phone.

Wednesday, May 30, 2007

FSJ Rules

Not sure quite why, but "The Secret Diary of Steve Jobs" is my favorite read on the internet. Every day there is at least one entry that makes me laugh out loud. Today, "Goatberg" asked RSJ (Real Steve Jobs) whether he reads FSJ (Fake Steve Jobs). RSJ replied that he had read a few of the FSJ things recently and thought them pretty funny.

Currently Valleywag has a hot campaign to unmask the identity of FSJ. In the past, others have tried and failed. Given Valleywag's record for bulldog journalism I do not expect them to get anywhere either. Their latest candidate misses the mark on several fronts. From reading between the lines, I think that FSJ is of Steve's generation and has a strong British expat connection, as several posts contain cultural references to Private Eye, the British satirical magazine.

Anyhoo, if you want a good laugh, hear Bono on how he is going to get the World Bank job.

Monday, May 28, 2007

Distributed Analytics

Ever since the meeting, I have been trying to get my mind around the presentation by John Mark Agosta to the SDForum Business Intelligence SIG on "Distributed Bayesian Worm Detection". The topic was about how a network of computers working together could be more effective at detecting a computer worm by exchanging information, described as gossiping amongst their neighbors. It is part of Intel's research on Autonomic Enterprise Security.

The idea is that after a worm has taken over one system in a network, it will try to take over other systems in that network by sending out messages to those systems. Basic worm detection is done by detecting unusual activity in the outgoing messages from a host system. In a distributed worm detection system, each host in the network makes a decision as to whether it is infected by a worm and then exchanges that information with other systems in the network. Overall the network determines whether it has been attacked by a worm through a distributed algorithm.

The most interesting claim that John made was that the detectors on each host could be set to have a low threshold. That is they could be set to report a lot of false positives about being infected by a worm and yet in the distributed algorithm these would cancel out so that overall the system would only report a worm attack when one was actually under way.

I have considerable experience with distributed algorithms, and one thing that I know is that a distributed algorithm can be viewed as a sequential algorithm that is just being executed in parallel. Thus, the worm detection system can be viewed as a collection of noisy detectors that are then filtered by the next level of the algorithm to give a reasonable result in the aggregate. As such this could be a useful analytic technique. On the other hand, a worm attack is something that can only happen in a network, so perhaps the technique is only applicable to distributed systems.
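
To convince myself that noisy detectors plus a second-level filter can work, here is a small simulation of that view. It is a deliberately simplified stand-in, not the Bayesian algorithm that John presented: the second level is a plain quorum vote, the gossiping between neighbors is not modeled, and every rate and threshold is an assumption chosen for illustration.

```python
import random


def local_alarms(n_hosts, infected, false_positive_rate, detection_rate, rng):
    """Return one boolean alarm per host from a noisy local detector."""
    alarms = []
    for host in range(n_hosts):
        # An infected host alarms with the detection rate; a clean host
        # still alarms sometimes because its threshold is set low.
        p = detection_rate if host in infected else false_positive_rate
        alarms.append(rng.random() < p)
    return alarms


def network_verdict(alarms, quorum):
    """Report an attack only if at least `quorum` of the hosts alarm."""
    return sum(alarms) / len(alarms) >= quorum


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1000 hosts, each with an assumed 10% false positive rate.
quiet = local_alarms(1000, set(), 0.10, 0.90, rng)

# Same network with half the hosts infected and a 90% detection rate.
attack = local_alarms(1000, set(range(500)), 0.10, 0.90, rng)
```

With a quorum of 0.3, `network_verdict(quiet, 0.3)` stays false even though many individual hosts raise false alarms, while `network_verdict(attack, 0.3)` is true. The individual false positives do not so much cancel out as fail to add up to a quorum.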

Like a good proportion of the audience, I was interested in whether the analytic techniques described could have a broader application in other areas of Business Intelligence like fraud detection. So far I have not managed to come up with any convincing applications. Any suggestions?

Sunday, May 13, 2007

People Search Redux

Reading through my last post on People Search, I realize that I did not quite join up the dots. Here is how people search works. It is something I have written about before, and the Search SIG meeting did reveal some new angles.

Firstly, the search engine spiders the web collecting people related information. Next comes the difficult part, arranging the information into profiles where there is one profile per person. This is most difficult for common names like Richard Taylor. Then there are other little variations like nicknames (for example, Dick, Rich or Ricky for Richard), spelling variations (Shakespear) and middle names or initials that may or may not be present.
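
The name variations can be illustrated with a rough sketch. This is my own guess at a first pass, not how Wink, Spock or ZoomInfo actually do it. The nickname table, the function names and the 0.9 similarity threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"dick": "richard", "rich": "richard", "ricky": "richard"}


def name_key(full_name):
    """Reduce a name to (canonical first name, last name)."""
    parts = full_name.lower().replace(".", "").split()
    # Middle names and initials may or may not be present, so ignore them.
    first, last = parts[0], parts[-1]
    return (NICKNAMES.get(first, first), last)


def similar_spelling(a, b, threshold=0.9):
    """Treat two spellings as the same name if they are nearly identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_mentions(mentions):
    """Group raw name mentions into candidate profiles by name key."""
    groups = {}
    for mention in mentions:
        groups.setdefault(name_key(mention), []).append(mention)
    return groups
```

Grouping by name only yields candidate profiles. Telling apart the many different people called Richard Taylor takes further evidence such as employer or location, and that is the genuinely hard part.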

A good profile linked to an identified user is a valuable thing. For example, it can be used to direct advertising to the desired demographic, making the advertising more valuable. As I have noted before, this kind of information is most valuable to large internet companies like Yahoo and Google, which effectively direct a large part of online advertising.

A profile is much more valuable if the person has taken control of their profile and effectively verified it. So the final step for the people search companies is to create enough awareness that people feel compelled to take control of their own profile. I have run across ZoomInfo profiles that have been verified, so they have started to do this for their specialized audience. Wink and Spock will have to try much harder. I looked at my Wink profile and I was not impressed. I have seen a scarily accurate profile of myself online, and Wink did not come close.

At the meeting, DJ Cline opined that people might be willing to pay money to have their profiles taken down. The panel of search company CEOs disagreed that this was a good model and told us that they would try to talk someone out of demanding that their profile be removed. I think that what this means is that a good profile is actually of more value to others than it is to the target of the profile. Quite apart from that is the thought that blackmail is a "difficult" business model.

Michael Arrington several times expressed the opinion that Spock would be sued for what they were doing, particularly as one of the example profiles they showed was Bill Clinton with the tag "sex-scandal". I was concerned with the possibility that a profile could be hijacked, as the hijacker could then play tricks to embarrass the target of the profile. A high profile lawsuit or profile hijacking with a lot of attendant publicity could be the catalyst that brings people search to public attention, and people search gains from any event that makes everyone go out and claim their profile.

However this gets us back to blackmail as a business model. It is one thing for Joe Average to create his own MySpace page. It is quite another thing if Joe Average feels that he has to go out and claim a profile page that someone else has put together without even asking him, just so that he can defend his own good name. People search has had a long history of privacy concerns and it will continue to do so.

Tuesday, May 08, 2007

Making Money from People Search

The first question that moderator Michael Arrington asked was how are you going to make money? This was at tonight's SDForum Search SIG meeting on "People Search", and he was asking the panel consisting of Michael Tanne, CEO of Wink, Jaideep Singh, CEO of Spock, and Bryan Burdick, COO of ZoomInfo.

ZoomInfo, which has been around for a few years and is aimed at corporate and executive search, claimed to be profitable and generating cash through subscription revenue. Wink and Spock are aimed at the bigger and more general consumer markets and need to be supported to some extent by advertising. These two companies are not at the money making stage yet: Wink is live while Spock is in invitation-only beta. Also, they are not going to generate a lot of money from direct search advertising, judging by the numbers that were bandied around.

I think that a general people search function is most useful as an adjunct to a large internet company like Google or Yahoo that generate much of their revenue from advertising. People search is used to get demographic information about the user which is then used to target the advertisement. At a Business Intelligence SIG meeting last summer we heard about how Yahoo is targeting banner ads based on demographics, and it is no coincidence that Yahoo already has its own people search feature.

The CEOs of Wink and Spock both protested that they were pure people search companies that could survive and grow as pure people search businesses. However there was a general murmur in the room that everyone has their price. We will just have to see how it plays out.