Build and Break: 2005

Saturday, December 24, 2005

iPod Fever

I got an iPod Shuffle for listening to podcasts a couple of weeks ago, and despite some initial problems, I am very pleased with it. There are still a few rough edges with iTunes. It may be fine for someone who wants to collect and listen to music, but it does not work quite so well for podcasts.

A big rap against the iPod is that it does not have a radio. I got an iPod because I wanted to listen to podcasts while I worked out, and as radio reception is broken at the gym, I was not looking for a radio anyway. With a little experience I have come to realize that a radio in an iPod would be completely superfluous.

The point of the iPod and a podcast is that I can listen to the talk show I want to listen to, when I want to, rather than being tied to the tyranny of the radio schedule. Initially I thought that I would miss the serendipity of having to listen to the available talk radio, however there is plenty of serendipity when you subscribe to the right podcasts.

Thursday, December 15, 2005

Software Development and Content Management

Recently, after I wrote about Content Management for consumers, I realized that the software that I use every day in my job as a Software Developer is also a Content Management system. In bald terms, a Content Management system consists of a content repository and a workflow that defines and controls how the content is managed.

A software system comes from a collection of source code that is maintained in a Software Configuration Management system (SCM). It allows you to see each change to the code and who made it. A sophisticated SCM allows for code development along different branches at the same time. An SCM is in effect a content repository.

Some people on reading this will say that a Content Management system means that all content is kept in a database. I will explain on another occasion why it is a bad idea to lock content away in a database and propose a better solution for a content repository.

In the system I use we have a bug tracking system that is called an issues system because issues is good newspeak for bugs, and also because the system is used for tracking not only bugs, but also features and enhancement suggestions. Moreover the system is set up so that you cannot do a code checkin without tying the code change to an issue. This allows the system to check that the code change is made to an appropriate version of the code.

The point of all this is that the bug tracking system defines and controls the code development workflow and is tightly tied to the content repository. Thus the whole thing is a Content Management system.
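
The rule that a check-in must be tied to an issue, and that the issue must match the version being changed, can be sketched in a few lines. This is only an illustration under assumptions: the "ISSUE-<number>" message convention, the Issue record and the validate_checkin function are hypothetical names, not taken from the system described above.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical convention: a check-in message mentions its issue as "ISSUE-<number>".
ISSUE_PATTERN = re.compile(r"\bISSUE-(\d+)\b")


@dataclass
class Issue:
    number: int
    target_branch: str  # the version of the code this issue should be fixed in
    is_open: bool = True


def validate_checkin(message: str, branch: str, issues: Dict[int, Issue]) -> Issue:
    """Return the issue a check-in is tied to, or raise ValueError to reject it."""
    match = ISSUE_PATTERN.search(message)
    if match is None:
        raise ValueError("check-in rejected: no issue referenced")
    issue = issues.get(int(match.group(1)))
    if issue is None or not issue.is_open:
        raise ValueError("check-in rejected: unknown or closed issue")
    if issue.target_branch != branch:
        raise ValueError("check-in rejected: issue targets a different version")
    return issue


if __name__ == "__main__":
    tracker = {42: Issue(42, target_branch="release-2.1")}
    print(validate_checkin("ISSUE-42: fix crash on login", "release-2.1", tracker))
```

The workflow lives in the issue records and the repository only accepts changes that the workflow allows, which is the tie between the two halves of a Content Management system.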

Tuesday, December 13, 2005

Ruby on Rails

Slashdot must be getting desperate. Yesterday they had a lengthy discussion on the Future of Emacs, and today they had yet another discussion as to whether Java is so 90s. Meanwhile, tonight at the SDForum Emerging Tech SIG, we heard Tom Hill (founder of the SDForum Java SIG) talk about Ruby on Rails, the hottest new software technology on the block.

Ruby is another of those object oriented scripting languages that is apparently typeless but seems to always do what is expected without overspecification. Rails is a web application framework written in Ruby that is also designed to do just the right thing. Together they make a terrific base for web based applications. All you need is Ruby on Rails, a database, a web server and a little development time and you are in business.

Tom touched on many interesting topics during his presentation. The one that struck me is the treatment of persistence in Ruby compared to say Hibernate. Hibernate is a Java framework that I mentioned in my last post for persisting Java objects in a database. With Hibernate you specify the object in Java and also specify the database schema and Hibernate does the mapping and transference between these two representations.

In Ruby, the ActiveRecord component gets the object specification from the database schema and creates an object based on the database record structure. There is only one specification of the object structure, so there is no problem with synchronization of specifications. Moreover, the Ruby object automatically adjusts to changes in the database schema without necessarily needing changes to the program.
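
To make the idea concrete, here is a minimal sketch in Python of deriving an object's structure from the database schema. It is not ActiveRecord and does not use its API; the record_class helper, the use of SQLite and the assumption of a primary key column named "id" are all mine, chosen only to show that one specification (the schema) is enough.

```python
import sqlite3


def record_class(conn: sqlite3.Connection, table: str) -> type:
    """Build a class whose attributes mirror the columns of `table`.

    The table name is interpolated into SQL, so it must come from trusted code.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    def __init__(self, **values):
        # Every column becomes an attribute; missing values default to None.
        for column in columns:
            setattr(self, column, values.get(column))

    def find(cls, key):
        # Assumption: the table has a primary key column named "id".
        row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (key,)).fetchone()
        return None if row is None else cls(**dict(zip(columns, row)))

    return type(
        table.capitalize(),
        (),
        {"__init__": __init__, "find": classmethod(find), "columns": columns},
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ada')")
    Person = record_class(conn, "person")
    print(Person.find(1).name)

    # After a schema change, rebuilding the class picks up the new column.
    conn.execute("ALTER TABLE person ADD COLUMN email TEXT")
    print(record_class(conn, "person").columns)
```

In this sketch the class has to be rebuilt to see a schema change, whereas the talk described the Ruby object as adjusting automatically.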

In practice, a database needs its schema otherwise it would not know how to access and deliver its data. In this light, having the application language bend to use the database schema for its object structure seems sensible, and Ruby makes it seamless.

Tuesday, December 06, 2005

POJO

As a young newly minted software person, one of the first things I remember hearing was an engaging talk on the issue of separating business logic from implementation. That was mumblety-mumble years ago. How times do not change. Tonight at the SDForum Java SIG we heard a talk on exactly the same subject.

This time the talk was wrapped around the subject of POJO, which stands for "Plain Old Java Objects". Chris Richardson, author of a new book "POJOs In Action" spent most of the talk telling us what POJOs are not. In particular POJOs are not Enterprise Java Beans (EJB).

Underlying all the negativity was a positive message. Write your business logic as a set of plain old Java objects. Then implement the application through lightweight frameworks. Use Hibernate or JDO to make the objects persistent and Spring to provide transactions.

The important concept is that you do not change the business logic code. These frameworks are declarative specifications in XML that describe the implementation of the business logic without you having to change it. I know from my experience with JDO that it is not quite as easy as all that, however it seems like we are still headed in the right direction if not making a lot of progress.

Saturday, November 26, 2005

The Future of Music

Some time ago I wrote the following in a piece on intellectual property in the digital age:

"... record company that on one hand pays radio stations to BROADCAST its hit song, while at the same time complaining that it is losing money because people are sharing the song on their computers"

Today I was looking at internet radio software that receives a broadcast stream from an internet radio station, and converts each song into an MP3 complete with ID3 tags. The point is that to build an MP3 collection, you do not need to illegally download music; all you need to do is capture the broadcast stream.

In practice this turns the economics of music on its head. In the past, performers made most of their money from recordings and used live performances to promote the recordings. The emerging business model is that performers give away their music to build an audience and then make their money from live performances.

In practice, this model has a lot to recommend it. As the audience grows older, they tend to become financially better off and willing to pay more for live performances. Properly managed, a performer can build a lifelong career that becomes more and more rewarding. Not surprisingly, this business model is both well proven and emerged in Silicon Valley.

Sunday, November 20, 2005

Consumer Content Management

Could there really be a huge and important consumer software category that is not being properly addressed? I got to thinking this when I happened across a piece that proposed the emergence of a new type of software, the media file "Type Manager". The concept is that each type of media like pictures and music needs its own file manager that knows about that type of media.

I could write a long harangue about how awful these programs are and how far I will go to avoid them. For example, one major peeve is that they deliberately hide the location of the file so that when I want to edit it or use it in another application, I have to resort to some trick to find out where it is located so that I can open the file in that application.

However a long harangue would just obscure a much more important point. There already exists a category of software that addresses all the issues raised by the media file type manager and does a lot more. It is called Content Management.

As a software category, Content Management has gone through the usual evolution. The original applications were high-end one off applications created for demanding users like CNN and the BBC that have enormous content management problems. At the same time a mid-range thread emerged from web media organizations like Salon who developed their own software to manage their production (and which has now gone Open Source). Now Content Management is expanding into general use as all organizations discover that they have digital content to manage. The final step is for Content Management to become a consumer product.

Content Management has three major components:

Version management. Every time you edit a media file, for example, changing the level of an MP3 or cropping a digital image, you create a new version of the file. Version management keeps track of all the versions of a piece of media and how they are related.
Metadata management. All types of media files contain metadata, sometimes known as tags. However, there is always the need for more metadata, some of which, like version information, does not really belong in the file. Better to extract all the metadata and put it in a database where it can be searched, collated and aggregated.
Workflow. This is the major part of professional Content Management and in some sense its reason for being. In a consumer application, a single person does all the tasks so there is less need for workflow, however it can still be useful for automating repetitive tasks.
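
The first two components can be sketched with a small database. This is a minimal illustration under assumptions: the table layout and the function names (open_repository, add_version, history, find_by_tag) are hypothetical, SQLite stands in for whatever store a real product would use, and workflow is left out entirely.

```python
import sqlite3

# Hypothetical schema: one row per version of a media file, with a link to the
# version it was edited from, and metadata kept outside the files themselves.
SCHEMA = """
CREATE TABLE version (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    parent_id INTEGER REFERENCES version(id)
);
CREATE TABLE metadata (
    version_id INTEGER NOT NULL REFERENCES version(id),
    key TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def open_repository() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_version(conn, path, parent_id=None, **tags) -> int:
    """Record a new version, optionally derived from `parent_id`, with its tags."""
    cursor = conn.execute(
        "INSERT INTO version (path, parent_id) VALUES (?, ?)", (path, parent_id)
    )
    version_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(version_id, key, str(value)) for key, value in tags.items()],
    )
    return version_id


def history(conn, version_id):
    """Return the paths from this version back to the original file."""
    paths = []
    while version_id is not None:
        path, version_id = conn.execute(
            "SELECT path, parent_id FROM version WHERE id = ?", (version_id,)
        ).fetchone()
        paths.append(path)
    return paths


def find_by_tag(conn, key, value):
    """Search across all media types by metadata rather than by file location."""
    rows = conn.execute(
        "SELECT v.path FROM version v JOIN metadata m ON m.version_id = v.id "
        "WHERE m.key = ? AND m.value = ? ORDER BY v.id",
        (key, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    repo = open_repository()
    original = add_version(repo, "beach.jpg", kind="image", place="Santa Cruz")
    cropped = add_version(repo, "beach-cropped.jpg", parent_id=original, kind="image")
    print(history(repo, cropped))
    print(find_by_tag(repo, "kind", "image"))
```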

The most important feature of Content Management is that it handles all media types. I do not want a separate application for each media type, each with its own user interface and set of annoying little quirks. Also, it is useful to relate media, such as keeping documents of lyrics with songs, keeping track of the music used in video and slide shows, connecting pictures, text and perhaps music in web posts.

There is a lot more to say about metadata, version management and the structure of a universal content manager. We will have to get to them at another time. For the meantime, are you ready for Consumer Content Management? I know that I am.

Has Sony-BMG been Caught Stealing Software?

The recording companies lecture us on how we should not steal Intellectual Property. Through their industry association, they prosecute our children. Now one of them is being accused of stealing software content. If they want us to respect their property rights, they need to remember that they are not above the law themselves.

Wednesday, November 16, 2005

Predictive Analytics Redux

Anyone who did not come to the SDForum Business Intelligence SIG meeting on Tuesday missed a great talk. Eric Zankman led us through a case study of a customer analytics engagement with a large telecom company that addressed a specific business issue and eventually provided a measurable multi-million dollar return.

During the engagement he had: built a data mart to collect the data, built a set of predictive models, segmented the customers, developed a set of strategies for handling the business problem, run a series of tests with different strategies to understand the costs and benefits of each strategy and finally set things up so that the customer could continue to monitor and refine their strategy.

Eric described each stage with enough clarity that I feel that I could reproduce his work if I were asked to. Of course I would not do it as well as Eric did, but that is not the point. I have read books, been to classes, heard presentations on customer analytics and never seen such a simple yet comprehensive walk through of what to do and how to do it.

So if you did not come, you missed a great meeting. Sign up with our mailing list/group so that you do not miss another meeting.

Saturday, November 12, 2005

Master Data Management

There is a new term in enterprise software - Master Data Management. While the term is new, the concept is not quite so new. I view it as the end point of a change that has been going on for some time.

The concept of an Enterprise Data Warehouse emerged during the 1990s. One of the compelling reasons for creating a Data Warehouse is to create a single version of the truth. An enterprise has many different IT systems, each with its own database. The problem is that each database has its own version of the truth.

So for example, consider a typical enterprise that has many customers. It will have marketing databases and several sales databases, each associated with the IT system for that sales channel. There are also service systems with their associated databases. The same customer may appear in several of these databases that support the business operations and in each one the customer information is different. There are variations in the name, different addresses and phone numbers or in some cases no contact information at all. While this example is about customers all other enterprise information also exists in many databases, and anywhere that information is multiplied, there are bound to be contradictions, failures and gaps.

Rather than try to resolve this mess, the idea of the Enterprise Data Warehouse is to create a new database that contains the clean and corrected version of the data from all the operational databases. So the Enterprise Data Warehouse is the one data repository for a single version of the truth about the enterprise.

In the early days, the Data Warehouse was conceived as a place for business analysts to do their work. Business analysts like to ask difficult questions and another reason for creating a separate database is to give them a place where they can run complicated queries to answer these questions without disturbing the operational systems.

In practice, building the Enterprise Data Warehouse is difficult and expensive, and the result is an immensely valuable resource. Far too valuable to be left to the business analysts. So from the earliest days of data warehouses, they were connected to operational systems and used to help run the business.

The problem with this is timing. The original data warehouse was conceived as something that you could load at night with the day's data and then the business analysts would query it during the day. However if you are running a call center off the information in a data warehouse because it has the best version of the data, the data warehouse has to contain the latest information.

For example, when a customer calls to complain that the product they bought that day does not work, the data warehouse needs to have been updated with the purchase so that the call center can verify it. This has led to the notion of a real time data warehouse. I explained this when I gave a presentation on "Real Time Business Intelligence" to the SDForum BI SIG in 2003.

So what does all this have to do with Master Data Management? Well I view Master Data Management as the legitimate name for this trend that I have described as real-time data warehousing. After all, real time data warehousing is not a very good name for the concept. It is a description of how we are getting there, while Master Data Management is a statement about the end point.

Of course the real story is not as clean as I have described. Enterprise Information Integration (EII) is about building a virtual data warehouse without bringing all the data together in a single physical database. Master Data Management does not necessarily imply that all the data is integrated into a single database; all it means is that there is a single version of the truth about all the enterprise data and that this is available to all enterprise applications.

It is also worth noting that as with any new term there is still a lot of discussion and debate with different points of view about what Master Data Management really means.

Tuesday, November 08, 2005

Give it Away

The other day, I found myself writing "I agree with you that we should give away content to build an audience and membership rather than thinking about making people pay for it. In the internet age, the really successful business models like Yahoo and Google have been about giving away information to build an audience and then figuring out how to capitalize on it."

It's true. A recent article in Wired discussed how bands are using MySpace to build a following by providing content such as giving some of their music away. It is exactly the kind of thing that Lawrence Lessig talked about when he spoke of the Comedy of the Commons last year to the SDForum Distinguished Speaker Series.

For the last 500 years, ever since the invention of printing, publishers then record companies and movie studios have been controlling the market for content by controlling the means of production. In the Information Age, reproduction and distribution of content is free, and the old content publishing empires that grew fat and happy by exercising their control are going down screaming. Oh to live in such interesting times.

Sunday, November 06, 2005

Good API Design

Joshua Bloch gave a great talk on "How To Design a Good API and Why it Matters" at the SDForum Java SIG last Tuesday. This kind of system design topic can be difficult because it can come across as apple pie unless it comes from someone who really knows what they are talking about. Joshua knows what he is talking about.

I found myself nodding along with Joshua's dictums. Here is one that particularly stuck in my mind. "An API is like a little language. Names should be self explanatory and be consistent, using the right part of speech. Aim for symmetry." I have written in the past about the connection between language design and APIs. The only problem is that while good API design is hard, good language design is even harder.

This dictum struck home because I recently made a mistake in a little API where I used the wrong words for a function. Unfortunately the problem was compounded because that word should have been used for another function that was missing, and that would have completed the symmetry of the API. When we renamed the errant function we could create the missing function and complete the API. Sorting out this little problem took a surprising amount of time.

My touchstone on this topic has been Butler Lampson's paper on "Hints for Computer System Design". A good part of that paper is concerned with interface design, and I have used Lampson's advice on interfaces with great success in the past. In fact, looking at this paper and my notes, Joshua echoes a surprising amount of Lampson's advice.

The Hints paper is more than 20 years old and since it was written we have moved from building computer systems to living in a networked world where APIs are our point of contact with the universe. Currently Joshua's words on API design are only available as a presentation. I hope that he writes it up as a paper or book so that everyone can have the full story.

Saturday, October 29, 2005

Everything Bad Is Good For You

Slashdot has just alerted me to Steven Johnson's new book "Everything Bad Is Good For You" on how today's popular culture and new fangled inventions like television, video games and the internet are actually making us smarter. Both the above links have reviews and plenty of the expected reactionary comments.

What we have to remember is that they said the same kind of thing when printing was invented. Then they said that the new printed books were not as good as the old hand-written ones and that printing took away the artistry in book production. Another line of attack was that if everyone could have books to read that there would be nobody out there tilling the fields. It took about 50 years until the older generation had died off and the people who grew up with printed books fully accepted them.

I have been Googling all day to find suitable links for the above paragraph with no success. Maybe I will have to get a subscription to Lexis/Nexis.

Sunday, October 23, 2005

A Cure for Aspect Ratio Madness

Some time ago I wrote about Aspect Ratio Madness, the problem that every device for displaying pictures and video has a different aspect ratio. In that entry I promised to suggest a cure, so here it is. We now live in the digital age and as you know metadata is the cure for all problems digital. For digital images there is already a standard for the metadata that goes with the picture.

The idea is very simple. My proposal is that we include metadata with the image about how to crop it and the display device, knowing its aspect ratio, uses that metadata to display the image. The new metadata is a minimum bounding box for display of the image.

The minimum bounding box is the part of the image that must be displayed when it is presented. The maximum bounding box is the picture itself. So when we come to edit a picture we crop the picture to the maximum that can be rescued from the image and we also crop an interior section of the image that is the part that we really want to see. This inner crop is saved in the metadata.

When we go to display the image, either by printing it on a printer, or showing it on a computer slide show or when rendering a slide show movie, the display device decides how much of the picture to show, always including all of the minimum bounding box and then filling the display out as needed with the rest of the image to fill the frame. If there is no image to show the display device uses its default, which is black bars for screen and blank (white) for printing.

The display device also knows whether it can rotate the image for display. When an image has a minimum bounding box that is taller than it is wide, a printer can rotate the image by 90 degrees, while a computer display or movie renderer cannot rotate the picture.
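
Here is a minimal sketch of how a display device might choose what to show. It is one possible reading of the proposal, not a standard: the Box type and the display_window function are hypothetical names, the window is the smallest one of the display's aspect ratio that contains the minimum bounding box, and rotation is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float


def display_window(image_w: float, image_h: float, minimum: Box, aspect: float) -> Box:
    """Smallest window with the given aspect ratio (width / height) that contains
    the minimum bounding box. Any part of the result that lies outside the image
    (negative origin, or extending past the edge) is filled with the default bars."""
    # Grow the minimum box along one axis until it has the display's aspect ratio.
    if minimum.width / minimum.height < aspect:
        width, height = minimum.height * aspect, minimum.height
    else:
        width, height = minimum.width, minimum.width / aspect

    def place(start, size, needed, limit):
        if needed >= limit:
            # The window is larger than the image on this axis: centre the image
            # in the window and let the overflow become bars.
            return (limit - needed) / 2
        # Centre the window on the minimum box, then slide it back inside the image.
        origin = start - (needed - size) / 2
        return min(max(origin, 0), limit - needed)

    left = place(minimum.left, minimum.width, width, image_w)
    top = place(minimum.top, minimum.height, height, image_h)
    return Box(left, top, width, height)


if __name__ == "__main__":
    # A tall subject in a 4000x3000 picture, shown on a 16x9 display.
    subject = Box(left=1500, top=500, width=1000, height=2000)
    print(display_window(4000, 3000, subject, 16 / 9))
```

The same image and the same metadata give a different window for each display, which is the point of keeping the crop in the metadata instead of baking it into the picture.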

This works for still images because there is metadata that goes with each image. For video, there is metadata for the whole movie, however there is no metadata for each frame or shot. If we add the metadata for each shot in a movie, we can create video that can be shown on any display, 16x9, 4x3, or whatever and still look correct.

Wednesday, October 19, 2005

@#$% Threads

There are many different paradigms for concurrent programming and threads are the worst. We are working on a sophisticated threaded system. It uses supposedly standard technology, pthreads on Linux. The number and variety of problems is quite astonishing. Recently we have had some problems with thread termination.

One "little" problem is that part of the system links with a well known database management system. The database installs a thread exit handler which cleans up whenever a thread exits. The problem is that not all our threads access the database system. When we cancel a thread that has not run any database code the database exit handler is called and because it has not been initialized, it SEGVs and brings down the whole system.

Another "little" problem is that I moved our code to a new system and could reliably get it to spin forever in the pthread_join procedure. As this comes just after a call to pthread_cancel, the conclusion is that the pthreads library on that system is not thread safe. The pthreads library is in the same location as it is on the system that works, however it is not exactly the same, which suggests that we have a duff pthreads library. After spending a couple of hours, I could not find out where the thread library on either system came from.

Neither of the problems is really a problem with threads, however they are typical of what a threads programmer has to deal with day to day. I have much to unload on real threads problems that we can look at other times. This is an important topic as concurrent programming is the way of the future.

Thursday, October 13, 2005

Data Quality, The Accuracy Dimension

A couple of years ago Jack Olson spoke to the Business Intelligence SIG on his then recently published book "Data Quality, The Accuracy Dimension". I just finished reading the book and felt that it is well worth a review.

Data quality is a huge problem for Information Technology. In theory, IT systems capture all sorts of useful information that can be used to analyze the business and help make better decisions. In practice when we look at the data, quality problems mean that the information is not there. Data quality is about identifying problems with data and fixing them.

For example, the same customer may appear many different times in different forms so we cannot form an integrated view of all the business interactions with the customer. And then the address may be incomplete so we cannot mail the customer an exciting new offer that fits their profile exactly.

The book has several examples of databases with curious data. There is an HR database where the oldest employee appeared to have been born before the Civil War and the youngest employee had not yet been born. Then there is a medical database where people appeared to have operations inappropriate to their gender. There is also an auto insurance claims database with many different creative spellings for the color beige.

The book itself is divided into three sections. The first section describes the data quality problem, what data quality is and how the problem arises. The second section explains how to implement a data quality assurance program. The accent of this section is towards the processes needed to do data quality assurance, however it includes a chapter on the important topic of making the business case for data quality.

The final and longest section is a more technical look at implementing data quality, through data profiling technology. Data profiling is a set of analytic tools for analyzing data to find quality problems. In a simple case, grouping, counts and an order are enough to identify outlier data, like the multiple spellings of beige mentioned earlier. In other cases sophisticated algorithms are used to identify correlations that may indicate keys or other important facts about the data. Although this section is more technical, it is certainly not difficult to read or understand.
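
The simple case of grouping, counting and ordering can be sketched as follows. This is my own illustration, not code from the book: the profile_column function and the 5% rarity threshold are assumptions, and the sample data is made up.

```python
from collections import Counter


def profile_column(values, rare_share=0.05):
    """Group and count the values in a column, ordered most common first.

    Returns the ordered (value, count) pairs and the list of values whose share
    of the column falls below `rare_share`. Rare values are candidates for
    review, such as misspellings; they are not automatically errors.
    """
    counts = Counter(value.strip().lower() for value in values)
    total = sum(counts.values())
    ordered = counts.most_common()
    rare = [value for value, count in ordered if count / total < rare_share]
    return ordered, rare


if __name__ == "__main__":
    # Made-up car colors with a few creative spellings of beige.
    colors = ["beige"] * 40 + ["red"] * 30 + ["blue"] * 27 + ["biege", "bage", "Beige "]
    ordered, rare = profile_column(colors)
    for value, count in ordered:
        print(f"{value:10} {count}")
    print("review:", rare)
```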

This is an extremely valuable book. Physically the book is smallish and unimposing. The writing style is straightforward, easy to understand. Yet the book packs a big punch. As I said before, Data Quality is a huge problem for IT. This book contains everything you need to start a data quality program. As such I think that it is essential reading for any IT person in data management, or for an IT consultant looking to expand their practice.

Although the book was published in 2003, it is just as relevant and useful now. In an era where most computer technology books are out of date by the time they are a couple of years old, this is a book that will last. I would compare it to Ralph Kimball's "The Data Warehouse Toolkit" which is 10 years old but just as useful now as it was when it was first published. By the way, Kimball is a great fan of this book.

Monday, October 03, 2005

Planning Ahead

As I said in my last entry we tend to build software systems for the hardware that we have now and not for the hardware that will exist when the system is mature. To get a glimpse of what is next for software systems we should look at the shape of hardware to come.

According to Moore, the semiconductor people can see 3 technology generations ahead where each generation is about 2 years. They get a doubling of density every 21 or so months. Thus it is safe to extrapolate over the next 6 years, in which time semiconductor density will multiply by a factor of at least 8. Six years ahead is a good target for the software systems that we are designing now.

Given that a current rack server or blade has 2 dual core processors and 4 Gigs of memory, the same system in 6 years' time will have 32 processors and 32 Gigs of memory. As I noted previously processors will not get much faster, so extra performance comes from more processors. There is no doubt that this configuration is viable as it is the spec of a typical mid to high end server that you can buy now. A high end system will have several hundred to a thousand processors and perhaps a terabyte of main memory.
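
The arithmetic behind these numbers can be checked with a short sketch. The function names are mine, and the sketch assumes that all of the density gain goes into more cores and more memory.

```python
def density_factor(years: float, months_per_doubling: float) -> float:
    """Growth in density after `years` if it doubles every `months_per_doubling` months."""
    return 2 ** (years * 12 / months_per_doubling)


def scale_server(cores: int, memory_gb: int, factor: float):
    """Scale a server spec by a density factor (assumes the whole gain goes to cores and memory)."""
    return int(cores * factor), int(memory_gb * factor)


if __name__ == "__main__":
    conservative = density_factor(6, 24)  # three generations of 2 years each
    optimistic = density_factor(6, 21)    # a doubling every 21 or so months
    print(conservative, round(optimistic, 1))

    # Today: 2 dual core processors (4 cores) and 4 Gigs of memory.
    print(scale_server(2 * 2, 4, conservative))
```

Three 2 year generations give a factor of exactly 8, which is where the 32 processors and 32 Gigs come from. A doubling every 21 months gives a factor of nearly 11 over the same 6 years, so 8 is the cautious estimate.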

What does this mean for software? Well the obvious conclusions are 64 bits to address the large memory and concurrent programs to use all the processors. Slightly more subtle is the conclusion that all data can fit into main memory except for a small number of special cases. All data in main memory turns existing database systems on their head, so a specific conclusion is that we will see a new generation of data management systems.

Saturday, October 01, 2005

Moore's Law

On Thursday, I went to the Computer History Museum celebration of the 40th anniversary of Moore's Law. The centerpiece of the event was Gordon Moore in conversation with Carver Mead. David House introduced the speakers and in his introduction read some remarkably prescient passages from the 1965 paper describing applications for the microelectronics to come.

During the conversation Moore explained why he wrote the paper. At the time, integrated circuits were expensive and mainly used in military applications. Most people believed that integrated circuits were a niche product and would remain that way. Moore wanted to show that integrated circuit technology was advancing rapidly and that they were the best way of building any electronic product.

So in a sense the paper was marketing, selling the concept of integrated circuits to a sceptical audience with the goal of widening the market for their use, obviously to benefit the companies that were producing integrated circuits. At the time the idea was controversial. Even nowadays we often forget the remarkable logic of Moore's Law, designing systems with the hardware that we have now rather than designing the system to exploit the hardware that will be there when the system is fully realized.

A remarkable thing is that the original paper extrapolated Moore's Law out to 1975. Since then we have ridden the Law for another 30 years, and it is not going to stop any time soon. Moore told us that they have always been able to see out about 3 generations of manufacturing technology, where each generation is now about 2 years. So they can see how they are going to follow Moore's Law for at least the next 6 years.

Sunday, September 25, 2005

PL vs GUI

Kal Krishnan gave us a couple of interesting insights at the September meeting of the SDForum Business Intelligence SIG. In his talk "Event Based Architectures For Real-Time BI", he explained the iSpheres Event Server, a system for real time Business Activity Monitoring (BAM) that captures, transforms and does complex processing on business events as they happen.

Two things about his presentation struck me. The first was a claim that the iSpheres Event Server could handle 470,000 events per second. It is an impressive performance even when you discount by a factor of 3 or 4 for benchmarkism. I have heard that a stock market feed can have peak rates of 100,000 messages per second so we get some hope that the iSpheres server would be able to handle it.
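
As a quick check on that hope, here is the discount worked through. The constant and function names are mine, and both input figures are the ones quoted above, neither of which I have verified.

```python
CLAIMED_EVENTS_PER_SECOND = 470_000  # the claim made in the talk
PEAK_FEED_RATE = 100_000             # hearsay peak rate for a stock market feed


def discounted_rates(claimed, discounts=(3, 4)):
    """Return the claimed rate divided by each benchmark discount factor."""
    return {factor: claimed / factor for factor in discounts}


if __name__ == "__main__":
    for factor, rate in discounted_rates(CLAIMED_EVENTS_PER_SECOND).items():
        verdict = "above" if rate > PEAK_FEED_RATE else "below"
        print(f"discount by {factor}: {rate:,.0f} events per second, {verdict} the peak feed rate")
```

Even with the harsher discount the server keeps up with the peak feed, though with little room to spare.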

The other insight was that iSpheres uses a programming language to specify the behavior of the Event Server. Kal told us that they had started out by providing a GUI with icons and drag and drop as their programming interface and this had proved to be too cumbersome for programming the number of events that an event server needs. So, a few years ago they rethought the programming interface and designed an event programming language instead.

I have remarked before on this issue. In practice, you always have to provide a programmable interface, either with a programming language or through an API. From this, it is straightforward to provide a GUI, either as a plug in to an existing development system or by automatically generating the GUI from the language or API specification.

Wednesday, September 21, 2005

Processors Hit the Speed Limit

I was talking with a friend in the semiconductor business the other night. He told me that processors have hit their speed limit. Over the last 20 years, as silicon is scaled down, it has become faster, and we have benefited from faster and faster processors. But now as they continue to scale down, the silicon becomes leakier, and higher frequencies draw more power to the point of self defeat.

Of course, Moore's Law continues and silicon continues to scale down, so we will continue to get more transistors on a chip. It is just that now the chips will not get faster as well as denser. I think that we all instinctively know that processors have hit a speed bump. The manufacturers no longer crow about how fast their chip goes, instead they talk about hyperthreading and dual core. So what my friend was telling me is that this is not just a speed bump, it is the speed limit. His view is that architectural improvements and throwing more transistors into the pot could give us another factor of 2 performance improvement, but that is it.

There are no big architectural breakthroughs on the horizon. The Von Neumann model has been around for almost 60 years. For the last 30 years, it has been criticized for serializing program execution, however nothing better has ever been made to work in a convincing way. Moreover, the Von Neumann model of sequential execution is embedded in the way we think about programming, and there is a huge investment in programming languages and all existing programs.

The alternative is that we write parallel programs to run on future generations of multi-core processors. The standard tool for writing parallel programs is threads. I have recently been writing threaded code and I can tell you that it is awful. Absolutely awful. I am going to write more pieces on the problems, but for now I can assure you that the problem of programming with threads is worse than the problem of programming memory allocation, something that many new programming languages have resolved by providing automatic garbage collection.
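To give a flavor of the care that threads demand, here is a minimal Python sketch of several threads updating one shared counter. The names are made up for illustration. The lock is what makes each read-modify-write step atomic. Without it, two threads can read the same old value and one of the updates can be lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


def run(workers=4, times=10_000):
    """Start the worker threads, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(run())
```

Even in a toy like this the programmer has to remember that every access to the shared value goes through the lock, and nothing in the language checks that they did.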

Monday, August 22, 2005

Data Mining Insight

Data mining is a difficult subject. On the one hand it is presented as this thing that will tell you all sorts of wonderful facts that you never knew about your data. On the other hand when you start getting into it, it is this daunting thing that is difficult to approach, seems to require a PhD in statistics to use and ends up telling you stuff that you already know, like when people buy bread at the market they are also likely to buy milk (or was that the other way round?)

At the August meeting of the SDForum Business Intelligence SIG Joerg Rathenberg, VP Marketing and Communications at KXEN, gave a talk "Shaping the Future" about predictive analytics, which is the latest way of saying data mining. KXEN is a young, privately held company that is devoted to data mining and that is successful, while many other data mining startups have fallen by the wayside.

The kernel of KXEN's success comes from powerful robust algorithms that do not require a specialist to tweak, high performance so that you can get results quickly and finally and most importantly, ease of use. As part of his presentation Joerg ran through a couple of data mining exercises showing us how you could take a reasonable sized data set in, say, the form of comma separated values (CSV), and using a few clicks and several seconds of processing, generate an interesting data analysis.

For me the key insight of the evening was on how to use data mining. I had always thought of data mining as a tool of last resort: when the data is too large or complicated, and nothing else seems to work, you resort to data mining to try and find something that you cannot see with the naked eye. On the other hand Joerg suggested that data mining is the first thing that you do when presented with a new business question, or a new data set. You use data mining for the initial analysis of the data to find out which factors in the data really affect the outcome that you are interested in. Once these factors are identified, you can build reports or OLAP cubes using these factors as dimensions to explore in depth what is going on.

Thus data mining is something that you should be doing early and often in your data exploration. Joerg called this "Exploratory Data Mining" and it certainly resonated with audience members who do data analysis for a living. KXEN has designed their software to make exploratory data mining possible and even easy, and hope that by this means it becomes accessible to the masses.

Wednesday, August 03, 2005

Pauseless Garbage Collection

Programs generate a surprising amount of garbage, little pieces of memory that are used and then discarded. Low level programming languages like C require that the programmer manage the storage themselves, which is a surprisingly painful, time-consuming and error prone task. So these days application programs are written in languages like Java (or C#) where the system manages the storage and does the garbage collection. The result is much higher programmer productivity and much better and far more reliable programs.

The overhead of doing automatic garbage collection has always been a concern. However, another problem with automatic garbage collection is that up to now it has required that the system pauses, sometimes for a considerable length of time, while parts of the garbage collector run. A pause in a web application server stops customers from doing whatever they are trying to do. This can range from absolutely unacceptable in online stock trading to just very bad for customer satisfaction in a typical e-commerce application.

At the August meeting of the SDForum Java SIG, Cliff Click spoke on pauseless garbage collection. Cliff is part of Azul Systems, a startup that has developed an attached processor to run Java applications. As Azul Systems sells to large enterprises that run Java web applications to support their business, being able to do automatic garbage collection without pausing is an important feature.

Cliff is an engaging speaker who has spoken to the Java SIG before. Previously Cliff gave an overview of what Azul is doing. At this meeting, Cliff described the pauseless garbage collection algorithm in detail, and then went on to give us some indication of its performance. He had taken a part of the standard SPEC JBB Enterprise Java warehouse benchmark and modified it by adding a large slow-moving object cache and a much longer runtime, which makes the benchmark more realistic and garbage collection more of an issue.

When the benchmark is run on an Azul system, the longest "Stop the world" pause is 25 milliseconds, whereas running the benchmark on other Java systems exhibited pauses of up to 5 seconds (yes seconds). On any platform almost all of the benchmark transactions run in under a millisecond. On the Azul system, no transaction took more than 26 milliseconds, which is very close to their maximum pause time, and well over 99% of the transactions ran in under 2 milliseconds. On the other Java systems, over half of the total transaction time could be taken up by transactions that took more than two milliseconds to complete.

While Cliff and Azul are proud of what they have done so far, they are not satisfied. So they are working on removing the last few vestiges of a pause from their system. We can expect even better performance in the future.

Monday, August 01, 2005

Aspect Ratio Hell

I returned from summer vacation with a large number of pictures which I am editing. Which leads to a difficult decision. When cropping the pictures, what aspect ratio do I choose for the images? This is not a clear cut question, and any investigation of which aspect ratio to use for cropping pictures leads to much confusion.

Last year, for example, I had a beautiful picture of us getting Lei-ed as we arrived in Hawaii. I cropped the picture for a 4 x 6 print (aspect ratio 1.5) and then, deciding that it made such a good picture, printed it on a 5 x 7 (AR 1.4) only to discover that the printer chopped off the tops of our heads. Looking further I see that I would get other results if I had tried to print an 8.5 x 11 (AR 1.294...) or an 11 x 17 (AR 1.545...) or a 13 x 19 (AR 1.461...). Fortunately the last two choices are moot because my printer cannot handle sheets of these sizes.
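The arithmetic behind these numbers can be sketched in a few lines of Python. The sketch assumes that the printer scales the image until it fills the whole sheet and trims whatever hangs over the edge. That is a guess at the behavior, not a description of any particular printer, and the function names are made up for illustration.

```python
def aspect_ratio(short_side, long_side):
    """Aspect ratio as long side divided by short side."""
    return long_side / short_side


def fraction_lost(image_ar, paper_ar):
    """Fraction of the image trimmed when it is scaled to fill the paper completely."""
    if image_ar > paper_ar:
        # The image is relatively longer than the paper, so its long side is trimmed.
        return 1 - paper_ar / image_ar
    # Otherwise the image is relatively wider, so its short side is trimmed.
    return 1 - image_ar / paper_ar


PAPER_SIZES = [(4, 6), (5, 7), (8.5, 11), (11, 17), (13, 19)]
image_ar = aspect_ratio(4, 6)  # a picture cropped for a 4 x 6 print

for short_side, long_side in PAPER_SIZES:
    paper_ar = aspect_ratio(short_side, long_side)
    lost = fraction_lost(image_ar, paper_ar)
    print(f"{short_side} x {long_side}: AR {paper_ar:.3f}, trims {lost:.1%} of the image")
```

Under that assumption a picture cropped to one paper size loses a strip on every other paper size, which is why the crop has to be chosen with the final print in mind.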

There is more. I take a group of pictures and put them on a DVD that we can watch on TV. 60 pictures at 6 seconds each with a sound track will make a high energy 6 minute video of our vacation. However, this creates more aspect ratio choices. Currently TVs are changing their aspect ratio from 4 x 3 (AR 1.333...) to 16 x 9 (AR 1.777...). (And why has the convention changed to putting the larger number first?) In practice TVs are even more difficult as they naturally chop off some lines at the top and bottom of the picture, so that a video of still images introduces even more uncertainty when deciding how to crop the images.

If we want to see the whole image in the video, we can look at it on a computer monitor which does not lose scan lines from the top and bottom and will scale everything to fit. But there is a problem even in the logical world of computers. Most of the standard display settings have an aspect ratio of 1.333... (800 x 600, 1024 x 768, 1600 x 1200) however the majority of computer displays sold today are LCD panels at 1280 x 1024 (AR 1.25).

More confusing yet, the display size of 1280 x 768 (AR 1.666...) is becoming popular both in laptops that can be used for watching DVDs on long flights and in LCD TVs, where the aspect ratio seems to match what is shown on a 16 x 9 TV even though 1.66... is not 1.77...
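For the displays, the matching calculation is how much of the screen goes unused when a whole image is scaled to fit. The Python sketch below assumes square pixels and uses a hypothetical 1536 x 1024 picture (AR 1.5). The function name is made up for illustration.

```python
def fit_inside(image_w, image_h, screen_w, screen_h):
    """Scale an image to fit wholly inside a screen, assuming square pixels.

    Returns the displayed width and height, followed by the unused
    horizontal and vertical pixels left over on the screen.
    """
    scale = min(screen_w / image_w, screen_h / image_h)
    shown_w = round(image_w * scale)
    shown_h = round(image_h * scale)
    return shown_w, shown_h, screen_w - shown_w, screen_h - shown_h


SCREENS = [(1024, 768), (1280, 1024), (1280, 768)]

for screen_w, screen_h in SCREENS:
    shown_w, shown_h, spare_w, spare_h = fit_inside(1536, 1024, screen_w, screen_h)
    print(f"{screen_w} x {screen_h}: shows {shown_w} x {shown_h}, "
          f"unused {spare_w} across and {spare_h} down")
```

Whichever screen is chosen, a picture with a different aspect ratio leaves bars either at the sides or at the top and bottom.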

There is a lot more to aspect ratio and it only gets worse. For example, the above discussion assumed that the pixels were square which they do not need to be. I have some ideas on what can be done which will have to wait until another time.

Wednesday, July 27, 2005

Going off the Net

The internet is a wonderful thing, but it is also a very dangerous place. Any computer system connected to the internet can and will be attacked. There is also a viral aspect to the internet, which means that while it is difficult to live with the internet, it is also difficult to do without it.

At work we are building a software product that when combined with a number of other complex software systems on a cluster with a SAN will do wonderful things. However, configuring and debugging all this stuff requires a lot of privileged access.

The company has a lot of rules to protect its computer systems from attack. Part of that is denying users privileged access, which is sensible for most users. However when we want anything privileged done, we have to ask the IT guys to do it as they are the ones with the privilege, and each request takes its own time.

The test system arrived and my first thought was that we could bypass all these rules that slow progress by taking it off the net. If the test system is not connected, it is safe from attack, we do not have to follow all the rules and we can have all the privileged access we need to get things done as and when we want.

I suggested this at our project meeting and the first question was "if it is not on the net, how do I telnet to it for debugging?" I described sneakernet, the secure alternative to the internet. You burn a CD, pop it out, walk across the room, pop it into the test system and "Robert est votre oncle".

Everyone looked at me like I was mad, or maybe they thought that I was just lost in another millennium. The notion that you needed to be in the same room as the test system seemed retrograde. As did the idea that you could not be browsing Slashdot while waiting for that conditional breakpoint to pop. So we are going to have to put up with the constant battle with IT to get simple things done because now it is impossible to go off the net.

Tuesday, July 19, 2005

Open Source BI

Open Source Business Intelligence is hot. This month the SDForum Open Source SIG hosted a talk from JasperSoft about their open source reporting software. Tonight the Business Intelligence SIG heard from Sandeep Giri about their OpenI initiative to provide an Open Source BI application.

Currently OpenI is a BI application built on Open Source components that provides visualization of data from an OLAP source such as SQL Server or Mondrian. Sandeep has plans to expand OpenI to become a fully fledged BI platform.

I had asked Sandeep to talk about why they are making their application Open Source. Sandeep explained that his company, Loyalty Matrix, provides Software as a Service (SaaS). [This means that they can use most Open Source software to provide their service.] While there are several Open Source components with a BI flavor, there was no ready to run Open Source BI application or platform.

Loyalty Matrix has had to develop their own platform using Open Source software. By giving their application back to the Open Source community they get the implementation help, support and feedback of a large user group to complement their tiny development team.

Wednesday, June 22, 2005

Show me the Numbers

Yesterday we had a great talk at the SDForum Business Intelligence SIG from Stephen Few called "Show me the Numbers". Stephen has created a career for himself teaching how to present business information in a way that is clear and understandable. And as he told us in the meeting he did this because he saw a crying need for better information presentation and nobody else was doing it.

One of the chief problems is that BI systems vendors compete by offering the most features, and in the presentation space this is done by offering a wide variety of impressive 3-D graphs. We looked at and critiqued a number of graphs taken from vendors' sales and education literature and we had a field day. For example, in one memorable graph, the months in the time dimension had been sorted in alphabetic order!

When it comes to selecting the type of graph to use, many people use "Eeny Meeny Miny Moe", or look through the list of graph types and select one because they have not used it for some time. I know that I have used this method in the past. Stephen has a well thought out methodology for designing information presentation and he started to go through it, although there was not enough time to go into depth. If you want to find out more you have to read the book.

Sunday, June 12, 2005

Apple's ISA Shift

The most extraordinary news last week was the announcement that Apple would drop the Power chip and adopt the Intel chip and Instruction Set Architecture (ISA). There was a lot of speculation and disbelief before the event, the announcement in its own reality distortion field, and generally positive or at least understanding reportage afterwards.

First I should disclose a couple of personal connections with this event. The first one involves my mother-in-law (God bless her soul). Many years ago she wanted to buy a computer and, hearing that Macs were easier to use, bought one of the last 68000 based systems that Apple made. A couple of years later she got fed up with it because cruising the web had become impossible. Most complicated web pages took a very long time to render and many caused her system to crash. Of course there were no software upgrades to help her, so she went out and bought a PC which worked well for several years. This last January it broke down and she asked me for advice on getting it replaced. I suggested buying the Mac Mini that had just appeared and she bought one. Now I hear that it is obsolete, so is she going to be in the same position as she was with her last Mac? And this time it's all my fault!

The second disclosure is that I worked for DEC, and was tangentially involved in their disastrous dithering with ISAs. Firstly they decided that the VAX ISA had no horsepower left so they adopted the MIPS chip for a short time and then completely changed tack to use their own internally developed Alpha chip and ISA. Old DEC hands may have thought that as they already had 12, 16, 18, 32 and 36 bit systems that throwing a couple more ISAs to the masses was business as usual. In practice customers took it as a sign that the company did not know what it was doing. DEC had always been deliberately run at the edge of control and when the customers deserted, DEC crumbled.

Back to the story. The most interesting commentary came from Cringely. I do not believe his major thesis, that this is an opening move in a merger between Intel and Apple. I do think that he is right on the money with his "Question 4: Why announce this chip swap a year before it will even begin for customers?" Who is going to buy a Mac now, knowing that it will be obsolete when Apple changes their ISA next year? Apple has basically destroyed the market for their major product for the next year.

There are a couple of possible reasons for making the announcement now. Firstly Apple may have decided that it was not going to keep it a secret and so the announcement was inevitable and put the best possible spin on a difficult situation. Alternatively, it could be hubris. Jobs has a history of being very successful and then getting caught up in his own reality distortion field to the extent that he blows it big time. His last big failure was Next which was colossal. This could be his next, we will see.

One other vector is Intel. Cringely is on the money in that they have a big stake in this. Microsoft just announced their xBox 360, which uses an IBM chip. If the xBox 360 turns out to be the home media center in disguise, Intel has potentially lost the consumer part of their franchise. In this context a link up with Apple who also have their eye on the home media center makes perfect sense.

Thursday, June 09, 2005

What the Dormouse Said

An extraordinary event at the PARC last night. The SDForum Distinguished Speaker Series went out with a bang as John Markoff spoke about his new book "What the Dormouse Said: How the 60s Counterculture Shaped the Personal Computer Industry". The truly remarkable thing about the night was the audience, as many of the people featured in the book were in the room.

Sandy Rockowitz introduced several audience members. The names I recall are Doug Engelbart, Bob Taylor, Butler Lampson and Adele Goldberg. Sandy did not introduce everyone and I did not catch all their names, so this is a partial list of a partial list, but it gives a glimpse of who was in the room.

John Markoff presented his thesis. The conventional story of Personal Computers is a story about Xerox PARC and the two Steves. The real story goes back much further to the 1960s and involved Doug Engelbart, John McCarthy, LSD, and only later Stewart Brand, the Homebrew Computer Club, and Stanford institutions like Kepler's bookstore around which the Grateful Dead also formed.

At the meeting several people denied their personal involvement with drugs. I have not read the book so I do not know whether drugs really played a part or whether they are there to promote book sales. I will let you know my opinion when I have finished it.

Saturday, June 04, 2005

Spom?

I was using Google to do some research at work the other day, and I noticed that whatever I searched for I got lots of useless results from the likes of eopinions.com and bizrate.com. In fact, for some searches, the pollution of off-topic search sites was so high that I found nothing useful in the first few pages of search.

Personally, I have found sites like eopinions.com and bizrate.com to be completely worthless, even when I am looking to buy something. Whatever I am looking for they seem to offer something slightly different and whatever they present is jumbled and difficult to comprehend. The Google sponsored links are on target, easy to digest and much more likely to be useful than any of these sites.

The problem is that the usefulness of internet search is being destroyed by these sites. So what can we do about it? The first step in any campaign is to name the enemy. I propose we call it Spom. We have Spam polluting our email and Spim polluting instant messaging, so Spom polluting searches seems to fit. We could even explain it by saying that SPOM is the contraction of Search POllution Menace. Well it is a bit forced, but it will do. Finally Spom is so far unused (mostly).

Wednesday, May 18, 2005

Collaborative Filtering Hell

I was at Amazon.com getting some stuff and clicked on the "Richard's Store" tab which of course led me to their "Recommended For You" page. Suddenly I was lost in collaborative filtering hell. All I could find was stuff that I already had or that I had already considered and rejected. I could not find anything that interested me in the slightest and the further I looked the worse it got. Have you ever been lost in a maze of twisty little web pages with no idea of how to get out? The only cure was to exit completely and come back in through the front door.

I will never visit my personal store again!

Friday, May 13, 2005

Interesting Times

Wow, things are getting hot. So many things happened this week. Microsoft announced the new xBox home media center. Yahoo came out with their Napster/iTunes killing music service. Google is up to something that is not fully revealed yet. Gates dissed the iPod, claiming that the converged cell phone, camera, music player, PDA will be Microsoft powered. Could things get more interesting?

Cringely has a great column that covers the first three and some Apple rumors. I am surprised that Microsoft happened to announce a double pronged attack on the home PC and the pocketable information appliance in the same week.

After writing that all information tasks could be done by a single Information Appliance, I was going to write about how we really need three different types of information appliance with different form factors. There is the information appliance with a 42" or larger screen that is mostly for passive shared use. Next there is the personal information appliance with a 14" or larger screen for information intensive tasks like editing and composing media and emails, shopping and bookkeeping. Finally, there is the pocketable information appliance that is small enough to be taken everywhere, even the bathroom.

Microsoft won the battle for the middle one and has been going after the other two for some time. Will they have more luck this time? We will just have to wait and see.

Thursday, May 12, 2005

BlueRoads

While we are on the subject of Software as a Service (SaaS), I almost forgot to mention that last month the Business Intelligence SIG heard an entrancing presentation from Axel Schultze, CEO of BlueRoads Corporation.

BlueRoads is a software service that does Partner Relationship Management (PRM). Basically they provide a set of applications that allows a company to interact with its channel partners. Unlike CRM which a company usually imposes on their salesforce, the BlueRoads model is that they are an intermediary between the business and the channel partner. To make it work, BlueRoads creates applications that provide value to both the channel partner, so that they use it, and to the company, so that they pay BlueRoads.

If BlueRoads can pull it off, they have something wonderful. While there are other companies in the PRM space, none of them have anything like the BlueRoads SaaS business model. They have both viral marketing and the network effect in their favor. BlueRoads is currently concentrating on high tech where channel sales are growing to be a larger proportion of total sales and the whole market is growing, so they are in a growing segment of a growing market.

And all they have to do is pull it off!

Wednesday, May 11, 2005

Software as a Service

The SDForum has a new SIG called Software as a Service (or SaaS for short) and they had their first meeting tonight. If you did not blink at the height of the boom, you may have seen the shortest lived SDForum SIG, the ASP SIG that had a life of less than a year. Now software as a service is back, and this time it is here to stay.

Tonight's presentation was given by two analysts from IDC, a market research company. I believe that the presentation is their standard presentation that they give when they are touting for business, however it did contain a lot of interesting stuff. As a software developer, a number of things stood out.

Packaged software offers a bad user experience. The software company creates a product which they throw over the wall to the customer. The customer implements it with little or no help and with great difficulty. At the same time the software company has little idea what the customer is doing with the product to help them plan features for the next release. On the other hand, software as a service is implemented by the software company who have a motive for helping the customer make the best use of the software. As the software is run by the software company, it can easily find out how the software is being used to help the customer make the best use of the software and to understand how to enhance it.

Large and medium sized companies tend to use software as a service to enhance existing applications, while smaller companies use it more to replace existing applications. In all sized companies the number of new applications using software as a service is small. In fact the analysts seemed optimistic that software as a service would increase the size of the software market. (On the other hand I have never met an analyst with a pessimistic market projection.)

The most common reason for switching to software as a service is the prospect of having to upgrade an existing software package. I am sure that having to upgrade is a major reason for changing to a new software implementation anyway, however software as a service does not suffer from the upgrade problem in the same way as packaged software does, as the upgrade is done by the software provider.

Monday, May 02, 2005

A Colleague Moves On

It is always sad when someone in your circle moves on, at the same time it is always encouraging when someone in your circle moves on to bigger and better things. Alex Chan has done a multitude of things and touched many people since he came to Silicon Valley: co-chaired the Multimodal SIG and done numerous other things for the SDForum, President of the Chinese Software Professionals Association, founder and leader of Silicon Valley HUB, as well as a demanding day job at Cisco Systems. Now he is moving to China to an important research post in Shanghai.

Alex, we are sorry to see you go and happy for your new position. Bonne chance!

Sunday, May 01, 2005

Event Driven Mobile Applications

This month the SDForum Software Architecture and Modeling SIG heard about a real application for mobile information technology. The talk was called "Worlds collide: when mobile, real-time requirements meet fragmented legacy systems: Lessons learned in providing real-time patient information to emergency room physicians."

The point of the talk was that with mobile applications (as with most other applications), the important problems turn out to be different from the problems anticipated by the developers when they start the project. In particular the speakers proposed that an emergency room mobile information system needs to be situationally aware so that it helps the doctor in a high pressure and demanding environment rather than getting in the way.

The thing that I took away from the talk is a desire to give the doctor more of an event driven life. When a patient comes to an emergency room, the hospital or medical organization may already have a lot of information about the patient. While there, the doctor may order several tests and the results of these tests are further information about the patient. All of this information can be made available on a portable device.

A problem is that the patient information may not be available immediately and the results of tests dribble back. Thus the doctor is always polling to see whether they have enough information to go back to the patient and make a further assessment or order a treatment. As the doctor is responsible for several patients, the doctor has several queues of events to poll.

When implemented with conventional technology, each patient has a box. Papers with test results arrive and are put in the box. The doctor is constantly looking through boxes to see what new results have arrived since they last looked.

The mobile application uses an iPAQ type PDA with a fingerprint reader for easy authentication. The primary screen shows a list of patients with a simple traffic light scorecard of the types of tests that have been ordered and whether the results are available. The doctor still has to look at the screen and understand what has changed, however they have a single screen with the information about what they can do next.

Mobile applications are going to become increasingly important and this presentation convinced me that they are not just desktop or even laptop applications with a small screen. Mobile applications need to be more situationally aware to overcome the constraints of the UI. At the same time done properly they can remove the need to poll and allow us to lead the event driven life.
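The difference between polling and being event driven can be shown with a small Python sketch. It is only an in-memory illustration. The `ResultBoard` class and the patient and test names are hypothetical and are not taken from the system described in the talk.

```python
from collections import defaultdict


class ResultBoard:
    """Hypothetical in-memory stand-in for the patient test-result feed."""

    def __init__(self):
        self._results = defaultdict(list)
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback invoked as callback(patient, test) for each new result."""
        self._subscribers.append(callback)

    def post(self, patient, test):
        """Record a new result and push it to every subscriber straight away."""
        self._results[patient].append(test)
        for callback in self._subscribers:
            callback(patient, test)

    def results_for(self, patient):
        """Polling-style access: the caller has to ask, and keep asking."""
        return list(self._results[patient])


board = ResultBoard()
notifications = []
board.subscribe(lambda patient, test: notifications.append(f"{patient}: {test} ready"))

board.post("Patient A", "blood panel")
board.post("Patient B", "x-ray")
print(notifications)
```

With `results_for` the doctor has to keep checking every box. With `subscribe` the new result finds the doctor, which is the event driven life in miniature.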

Tuesday, April 26, 2005

The Event Driven Life

A few years ago the Smithsonian did an exhibition on time. It explored the changing way in which Americans have measured, used and thought about time, with a movement from time marked by the Sun's cycle to time marked by the tolling of a clock bell to time marked by a personal watch.

The watch revolutionized our relationship with time. Before the watch everyone got up at sunrise, ate when the Sun was at its highest and retired when the Sun went down. The clock tower with its bell marking out the hours allowed a slightly better control of time. For example, it meant that everyone could arrive for the Church service at the right time.

In the 19th century mass production and improved manufacturing techniques reduced the cost of a watch to the point where everyone could afford to own one (does this sound familiar?). In a world where everyone has their own timepiece, people could schedule their own use of time, and now they are expected to. So instead of the time being broadcast by the clock tower bell with the granularity of an hour, we each keep our own time to the granularity of a few minutes.

With a watch, we have to keep a list of appointments and constantly look at the watch to keep on time. It is well known that polling is an inefficient algorithm, and constantly checking a watch is very distracting for everyone. We all know a type-A person who never seems to relax or even listen to what we are saying because they are constantly looking at their watch, indicating with their body language that the next appointment and keeping on schedule is more important than whatever they are doing now.

The next stage in the evolution is to eliminate the distraction of looking at a watch and polling it by giving the job of managing our personal calendar to an information appliance. This is a portable device that knows our calendar and the time and lets us know what we are supposed to be doing now. With this device we can lead the event driven life: the machine tells us what to do and when to do it. The evolution is complete when we stop thinking about time because we never have to know the time, only what we are supposed to be doing.

Some of this exists. In my estimation we are about a third of the way there. You can buy a PDA and synchronize it with your corporate calendar. Within the corporation others can see your calendar and organize events with your schedule in mind. However synchronization is weak and the corporate calendar is a closed system that does not interoperate with other calendar systems.

Moreover, calendar applications are very primitive, lacking many useful features. For example, if I have to travel to an event and want to ensure that I am not scheduled for something else when I am supposed to be traveling, I have to calculate travel time and add that to the event, so I lose the event starting time. If I create the travel time as a separate item, it is not associated with the event and does not change when the event changes. Another example is that there is no function to buzz me 5 minutes before the end of a meeting to remind me to wrap up so that I can make the next meeting on time.
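As a rough sketch of what these two missing features might look like, here is a hypothetical `Event` record in Python. None of the names come from a real calendar product. The idea is that travel time is stored with the event rather than folded into its start time, so the real start survives and the travel block moves when the event moves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    title: str
    start: datetime
    end: datetime
    travel_before: timedelta = timedelta(0)            # time needed to get there
    wrap_up_warning: timedelta = timedelta(minutes=5)  # buzz this long before the end

    @property
    def busy_from(self):
        """When I stop being available, travel included."""
        return self.start - self.travel_before

    @property
    def wrap_up_at(self):
        """When the appliance should remind me to wrap up."""
        return self.end - self.wrap_up_warning

    def move_to(self, new_start):
        """Reschedule; travel time and the wrap-up buzz follow automatically."""
        duration = self.end - self.start
        self.start = new_start
        self.end = new_start + duration

    def conflicts_with(self, other):
        """True if the busy periods (travel included) overlap."""
        return self.busy_from < other.end and other.busy_from < self.end
```

A dental appointment at 14:00 with 30 minutes of travel then blocks the calendar from 13:30, while the event itself still says 14:00.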

Interoperability is another problem. The real problem is that there is tight integration in the closed corporate system and no outside integration. If I arrange a dental appointment, I have to enter the appointment in the calendar by hand. Later on the dental receptionist will call me to remind me of the appointment, which I again have to check by hand. What should happen is that the dentist sends me an email confirmation of the appointment and I right click on the email to add it to my calendar. This should just work between all email clients. Again, if I go to a web page that announces an interesting event, or I use the web to register for an event, I should be able to right click on the web page and select "add to calendar" using the browser plug-in for my calendar application.

The good news is that there are clever people who realize that there is a problem and are trying to solve it. The bad news is that the solutions are closed systems. What we need are open standards that allow for interoperability. Again, clever people have realized this, but so far the standards have not stuck.
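One open format that already exists for this kind of exchange is iCalendar (RFC 5545, which superseded RFC 2445). Whether it is the standard that ends up sticking is a separate question, and using it here is an assumption made for illustration. The sketch below shows roughly what a dentist's system could attach to a confirmation email. The function name, the UID and the PRODID string are invented, and it skips the text escaping and line folding that a real implementation needs.

```python
from datetime import datetime, timezone


def vevent(uid, summary, start, end, stamp):
    """Return a minimal iCalendar document holding a single event.

    All datetimes must be timezone-aware; they are written out in UTC.
    """
    fmt = "%Y%m%dT%H%M%SZ"

    def utc(moment):
        return moment.astimezone(timezone.utc).strftime(fmt)

    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//appointment sketch//EN",  # made-up product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{utc(stamp)}",
        f"DTSTART:{utc(start)}",
        f"DTEND:{utc(end)}",
        f"SUMMARY:{summary}",  # no escaping of commas or semicolons here
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

The point is not this particular format but that the appointment travels as structured data, so the receiving calendar can add it without anyone retyping it.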

Tuesday, April 19, 2005

Blackjack and Hookers

My culture antennas started to twitch. There was something about what I was reading that struck a chord. I immediately did a search on the phrase "blackjack and hookers. In fact, forget" and got about 5000 hits! Is this the new "All your base are belong to us"?

Wednesday, April 06, 2005

The Information Appliance

Tools and appliances are best when they are designed for one purpose and only used for that purpose. We buy many tools to help us with our daily tasks. I have 4 hammers in my garage and each is designed for a specific purpose. Likewise, in the kitchen there are 6 or 7 different knives, again each with its own purpose. However, there is always someone out there trying to sell you a "bargain", the multi-purpose gadget.

Multi-purpose gadgets always turn out to be useless at doing the many tasks that they are supposed to cover. A long time ago Mad magazine sent the concept up by proposing a coffee grinder/pencil sharpener/garbage disposal. (Imagine how your coffee would taste from this machine!) After many experiences with eagerly buying a multi-purpose gadget and then being disappointed, I have firmly forsworn the whole concept. Each tool for one purpose and one purpose for each tool. When I was looking for a coffee maker recently, my main selection criterion was to avoid the models with a built-in clock radio.

On the other hand, information is different. We have entered a new era, the digital age, where all information is represented digitally: sound, pictures, video, text, diagrams, charts. We create, massage and consume all information in digital form. In the digital age we do not have, need or want separate appliances for each form of information. One type of appliance, the information appliance, is sufficient to handle all our information needs. Here are some of the information tasks that I do at home:

Personal communication by email and instant messaging.
Track home finances and do taxes.
Edit pictures and video, create slide shows and home videos.
Consume media, magazines, music, video.
Play games.
Shopping and research for shopping/living.
Research for hobbies.
Create presentations and writing for non-work activities.
Analytic research of stocks.
Write blog.

This does not include work activities, and all these activities are done with the same information appliance. Now I know that there are different definitions of what an information appliance is. There are also people out there who want to sell us a different appliance for every purpose. We will get to all that, but for now I want you to contemplate the fact that one suitably specified information appliance is sufficient for all your information processing needs.

Tuesday, March 22, 2005

A Place for the PC?

While it is not exactly news, today Bill Gates responded to a Nicholas Carr article on the "Requiem for the Corporate PC". Carr has a good point. A large proportion of the employees with PCs do not need a high powered general purpose computer with a very hackable operating system on their desk. In large corporations the majority of the workers could be better served by a relatively thin client that gives them access to many of the services that they need on well protected corporate servers.

The problem is that the best general office productivity suite comes from Microsoft, so corporations do need to buy the highly hackable operating system with its attendant support costs to run the standard desktop. The funny thing is that while Microsoft did an extraordinary job with Office, their Outlook is not so stellar.

Unfortunately, while Carr may have a good point, he tends to sex up his argument by personalizing it, and that drags us down the wrong path. So rather than discuss whether the corporate lackey is best served by a thick or thin client, Carr gets in a fight with Gates. In his response, Gates is more abstract and less compelling than usual.

I think that there are interesting and compelling arguments on both sides; I would just like to see them being made.

Sunday, March 13, 2005

The ASP Model

The PC revolution was about everyone getting their own computer and running their own applications. The Application Service Provider (ASP) model is about bringing all this under control, by using the universal connectivity of the internet to run applications professionally and centrally, often outsourced to a third party.

The ASP concept has had a rollercoaster ride. At the height of the boom 5 years ago, ASPs were going to be the next big thing. Then the boom deflated and most ASPs disappeared. A couple survived, most notably Salesforce.com, and suddenly the ASP model has become so fashionable that now I hear you cannot get funding for a new software venture unless your proposal has a service delivery component.

I am starting to appreciate the ASP model. Last week the disk on my work laptop started to die. At first the symptom was that a couple of applications did not work. Unfortunately the applications were email with calendaring and IM, so I was lost, not knowing what I was supposed to be doing and unable to communicate with anyone. It took a day and a half to identify that the problem was the disk and not the applications, another day to identify that the disk was beyond repair so I needed to get a new one, a day to get the new disk and another day to get it installed and set up. So I was without my computer for a week.

The good news was that we use Lotus Notes at work, which is run on the ASP model from centralized servers configured for high availability and reliability. As soon as we figured out that it was a hardware problem, I got a loaner laptop, connected it to the Notes server and was back in business with email, calendar and IM.

I was skeptical of ASPs at first, but seeing how well the model worked for my recent problem, I am starting to see the point of having professionally managed services delivered from centralized servers over the internet.

Monday, March 07, 2005

More Windows Fuglies

There are those times when I feel the jaw muscles tighten and my teeth start to grind. Recently Windows has been causing my fillings to crack. At home, I have discovered that I can render the movie by switching off DMA mode on my disk drive. The movie renders with the disk in PIO mode, it just takes 10 times as long. So I find myself doing the rendering overnight just as I used to do when I had a 150 MHz processor.

The thing is that I rendered movies in January without a problem, and the only change to the system since then has been Microsoft's persistent security updates. The patches have been coming so fast and furious recently that I have not even been looking to see what they are. Anyway, given the way they are delivered, Microsoft seems to encourage the ignorance-is-bliss approach to patches. In this case ignorance has led to anything but bliss.

The "funny" thing is that the hint to switch off disk DMA came from an NT bug workaround from years ago. Has the bug been reintroduced? Or, more likely, was the original source of the problem never properly fixed, and a change to another module has uncovered it again? I suppose that I should uninstall the patches one by one to find the one that is the cause of my problem, however that would mean doing the absolutely ridiculous act of writing down bug numbers on a piece of paper because sure as hell you cannot use the computer's cut and paste to copy them (subject for another rant that has been brewing for a long time and seems close to erupting).

I will not go into the anguish at work where my computer has switched off DMA mode on the disk drive and cannot be persuaded to switch it back on again.

Sunday, February 27, 2005

Windows is out of my Control

Microsoft Windows is out of my control. I just want to do something simple, render a movie, but the movie render program exits at random about 3 minutes into rendering. The customer support section of the render program web site suggests that I close down all background tasks before I start rendering in case they are interfering with the render. When I watch with the task manager, I see plenty of background processes that pop up from time to time and they may well be the cause of the problem.

However, it is absolutely impossible to close down background processes. My system is not owned. I do not have a spyware problem, it is just that I do not have proper control over the components that I have installed on my system. There is an unwanted Windows component that will not go away. I have uninstalled it, however its process still runs and when I kill the process with the task manager it just springs back again after a few seconds. There are other Windows components that are too scary to stop but annoyingly active when they do not need to be.

My video card has two unnecessary background processes. I have managed to disable one, however I cannot find a control to disable the other one. My sound card software has unnecessary background processes that cannot be controlled and when I stop them with the task manager they spring back to life unasked. My CD burner software has several unnecessary background tasks that cannot be controlled in any sane way, and I have spent hours looking through the options dialog boxes for ways of disabling them.

I want to stop the anti-virus program as I know from experience that it can cause all sorts of problems. So I disconnect from the internet and disable it. The problems are firstly that even though I have disabled the anti-virus program, its processes are still active and doing things. Secondly, I find out that there is a program that is run periodically to check whether the anti-virus system has been disabled and warn the user. So the total effect of disabling anti-virus to get rid of background processes is just to make extra background processes appear at regular intervals.

All in all it is enough to make me want to give up on Windows and buy a Mac!

Wednesday, January 26, 2005

UI and Scripting

One thing that Scott Collins mentioned in his talk tonight on "Mozilla Lessons Learned" at the SDForum Software Architecture and Modeling SIG touched a nerve. He suggested that User Interfaces should be scriptable. This is something that I have believed for a long time.

In my early days I built several interactive user interfaces only to find that their principal use was with scripts. This was before GUIs so you could just feed a script into any UI. The problem was that as I had assumed that the interface was being used interactively it did not handle errors in the script gracefully, and the program would often get wedged.

Recently I have only been tangentially involved in UI design, however there are two things that I have noticed. Firstly, GUIs are notoriously difficult to test, and secondly, as always, any worthwhile application that has a GUI also needs an API for automated access. Consider that a scripting interface can be used to expose programmable access to the application and we get to a thought about how we should be building applications with GUIs.

Instead of designing an application with a GUI, we should be designing the application with a language interface. This makes the application accessible from a scripting interface, and it also allows comprehensive automatic testing of the application through scripts. At the same time the application needs a GUI, which should be generated automatically from the language definition. Because the application is thoroughly tested through its scripting interface and because the GUI is generated automatically, the GUI does not need a lot of testing. Provided that the GUI generator works properly, the application GUI should work properly.
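To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the toy `Playlist` application, its two commands and the helper names are invented for the example. The commands are declared once as data. A script runner executes them and reports errors instead of getting wedged, and a stand-in for the GUI generator derives a form description from the same declarations rather than drawing real widgets.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Command:
    name: str
    params: tuple  # ((param_name, param_type), ...)
    action: Callable


class Playlist:
    """Toy application whose only public surface is its command language."""

    def __init__(self):
        self.clips = {}
        self.commands = {
            c.name: c
            for c in (
                Command("add", (("title", str), ("seconds", int)), self._add),
                Command("remove", (("title", str),), self._remove),
            )
        }

    def _add(self, title, seconds):
        self.clips[title] = seconds

    def _remove(self, title):
        del self.clips[title]


def run_script(app, script):
    """Run one command per line; collect errors rather than stopping."""
    errors = []
    for number, line in enumerate(script.splitlines(), start=1):
        words = line.split()  # simplification: arguments cannot contain spaces
        if not words:
            continue
        try:
            command = app.commands[words[0]]
            if len(words) - 1 != len(command.params):
                raise ValueError("wrong number of arguments")
            args = [kind(word) for (_, kind), word in zip(command.params, words[1:])]
            command.action(*args)
        except (KeyError, ValueError) as exc:
            errors.append(f"line {number}: {exc!r}")
    return errors


def gui_spec(app):
    """Derive one form per command from the definitions the scripts use."""
    widget = {str: "text box", int: "number box"}
    return [
        {"button": c.name, "fields": [(p, widget[kind]) for p, kind in c.params]}
        for c in app.commands.values()
    ]
```

Because scripts and the generated forms are both driven by `app.commands`, a test script exercises exactly the operations that the buttons would invoke, which is where the claim that the GUI itself needs little testing comes from.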