Saturday, November 26, 2005

The Future of Music

Some time ago I wrote the following in a piece on intellectual property in the digital age:

"... record company that on one hand pays radio stations to BROADCAST its hit song, while at the same time complaining that it is losing money because people are sharing the song on their computers"

Today I was looking at software that receives the broadcast stream from an internet radio station and converts each song into an MP3, complete with ID3 tags. The point is that you do not need to download music illegally to build an MP3 collection; all you need to do is capture the broadcast stream.
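Just to show how little there is to it, here is a minimal sketch of the capture side in Python, assuming a Shoutcast/Icecast-style station (the URL is made up). Asking for Icy-MetaData makes the server interleave a title block into the audio every icy-metaint bytes, which is what lets ripper software cut the stream into individual songs.

    # Minimal stream-ripping sketch; the station URL is hypothetical.
    import requests

    STREAM_URL = "http://radio.example.com:8000/stream"  # hypothetical station

    def read_exact(raw, n):
        """Read exactly n bytes from the response stream."""
        buf = b""
        while len(buf) < n:
            chunk = raw.read(n - len(buf))
            if not chunk:
                raise EOFError("stream ended")
            buf += chunk
        return buf

    resp = requests.get(STREAM_URL, headers={"Icy-MetaData": "1"}, stream=True)
    metaint = int(resp.headers["icy-metaint"])  # audio bytes between title blocks

    with open("capture.mp3", "wb") as audio:
        while True:
            audio.write(read_exact(resp.raw, metaint))   # raw MPEG audio frames
            length = read_exact(resp.raw, 1)[0] * 16     # metadata block length
            if length:
                meta = read_exact(resp.raw, length).rstrip(b"\x00").decode("latin-1")
                print(meta)   # e.g. StreamTitle='Some Artist - Some Song';

Splitting the capture into one file per song and writing the ID3 tags (a library like mutagen handles that part) is then just a matter of watching for the StreamTitle to change.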

This turns the economics of music on its head. In the past, performers made most of their money from recordings and used live performances to promote the recordings. The emerging business model is the reverse: performers give away their music to build an audience and then make their money from live performances.

In practice, this model has a lot to recommend it. As an audience grows older, it tends to become financially better off and willing to pay more for live performances. Properly managed, a performer can build a lifelong career that becomes more and more rewarding. Not surprisingly, this business model is well proven, and it emerged in Silicon Valley.

Sunday, November 20, 2005

Consumer Content Management

Could there really be a huge and important consumer software category that is not being properly addressed? I got to thinking about this when I happened across a piece that proposed the emergence of a new type of software, the media file "Type Manager". The concept is that each type of media, like pictures or music, needs its own file manager that knows about that type of media.

I could write a long harangue about how awful these programs are and how far I will go to avoid them. For example, one major peeve is that they deliberately hide the location of the file, so when I want to edit it or use it in another application, I have to resort to some trick to find out where it is located.

However, a long harangue would just obscure a much more important point. There already exists a category of software that addresses all the issues raised by the media file type manager, and does a lot more. It is called Content Management.

As a software category, Content Management has gone through the usual evolution. The original applications were high-end, one-off systems created for demanding users like CNN and the BBC, which have enormous content management problems. At the same time, a mid-range thread emerged from web media organizations like Salon, which developed their own software to manage their production (software that has since gone Open Source). Now Content Management is expanding into general use as all organizations discover that they have digital content to manage. The final step is for Content Management to become a consumer product.

Content Management has three major components:
  • Version management. Every time you edit a media file, for example, changing the level of an MP3 or cropping a digital image, you create a new version of the file. Version management keeps track of all the versions of a piece of media and how they are related.
  • Metadata management. All types of media files contain metadata, sometimes known as tags. However, there is always the need for more metadata, some of which, like version information, does not really belong in the file. Better to extract all the metadata and put it in a database where it can be searched, collated and aggregated.
  • Workflow. This is the major part of professional Content Management and in some sense its reason for being. In a consumer application, a single person does all the tasks, so there is less need for workflow; however, it can still be useful for automating repetitive tasks.
The most important feature of Content Management is that it handles all media types. I do not want a separate application for each media type, each with its own user interface and set of annoying little quirks. It is also useful to relate media: keeping documents of lyrics with songs, keeping track of the music used in videos and slide shows, connecting pictures, text and perhaps music in web posts.
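To make that concrete, here is a rough sketch of the catalog that might sit behind a consumer content manager. The schema is my own invention, not any product's, but it shows all three components plus cross-media relations living in one searchable database:

    # A sketch of a consumer content manager's catalog, using SQLite.
    # All table and column names here are invented for illustration.
    import sqlite3

    db = sqlite3.connect("catalog.db")
    db.executescript("""
    CREATE TABLE asset (
        asset_id   INTEGER PRIMARY KEY,
        media_type TEXT NOT NULL              -- 'image', 'audio', 'video', 'text'
    );
    CREATE TABLE version (
        version_id INTEGER PRIMARY KEY,
        asset_id   INTEGER REFERENCES asset,
        parent_id  INTEGER REFERENCES version, -- the version this was edited from
        path       TEXT NOT NULL,              -- never hidden from the user!
        created    TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE metadata (
        version_id INTEGER REFERENCES version,
        key        TEXT,                       -- e.g. 'artist', 'camera', 'caption'
        value      TEXT
    );
    CREATE TABLE relation (
        from_asset INTEGER REFERENCES asset,
        to_asset   INTEGER REFERENCES asset,
        role       TEXT                        -- e.g. 'lyrics-for', 'soundtrack-of'
    );
    """)

    # Because metadata lives in the database, queries span all media types:
    for (path,) in db.execute("""
        SELECT v.path FROM version v
        JOIN metadata m ON m.version_id = v.version_id
        WHERE m.key = 'artist' AND m.value = 'Miles Davis'"""):
        print(path)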

There is a lot more to say about metadata, version management and the structure of a universal content manager. We will have to get to them another time. In the meantime, are you ready for Consumer Content Management? I know that I am.

Has Sony-BMG been Caught Stealing Software?

The recording companies lecture us on how we should not steal Intellectual Property. Through their industry association, they sue our children. Now one of them stands accused of stealing software. If they want us to respect their property rights, they need to remember that they are not above the law themselves.

Wednesday, November 16, 2005

Predictive Analytics Redux

Anyone who did not come to the SDForum Business Intelligence SIG meeting on Tuesday missed a great talk. Eric Zankman led us through a case study of a customer analytics engagement with a large telecom company that addressed a specific business issue and eventually provided a measurable multi-million dollar return.

During the engagement he built a data mart to collect the data, built a set of predictive models, segmented the customers, developed a set of strategies for handling the business problem, ran a series of tests with different strategies to understand the costs and benefits of each one, and finally set things up so that the client could continue to monitor and refine its strategy.
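To give a flavor of the modeling and segmentation steps, here is a toy sketch in Python. It is emphatically not Eric's method; the data columns and the churn framing are my own assumptions for illustration:

    # Toy sketch: score each customer with a simple propensity model, then
    # cut the scores into segments that different strategies can target.
    # The file and column names are invented.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    history = pd.read_csv("customer_history.csv")   # hypothetical data mart extract
    features = history[["tenure_months", "monthly_spend", "support_calls"]]
    model = LogisticRegression().fit(features, history["churned"])

    history["risk"] = model.predict_proba(features)[:, 1]
    history["segment"] = pd.qcut(history["risk"], 4,
                                 labels=["low", "medium", "high", "critical"])

    # A held-out control group within each segment is what lets you measure
    # the dollar value of each strategy later on.
    print(history.groupby("segment")["monthly_spend"].agg(["count", "mean"]))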

Eric described each stage with enough clarity that I feel I could reproduce his work if I were asked to. Of course I would not do it as well as Eric did, but that is not the point. I have read books, been to classes and heard presentations on customer analytics, and I have never seen such a simple yet comprehensive walkthrough of what to do and how to do it.

So if you did not come, you missed a great meeting. Sign up with our mailing list/group so that you do not miss another meeting.

Saturday, November 12, 2005

Master Data Management

There is a new term in enterprise software: Master Data Management. While the term is new, the concept is not quite so new. I view it as the end point of a change that has been going on for some time.

The concept of an Enterprise Data Warehouse emerged during the 1990s. One of the compelling reasons for creating a Data Warehouse is to create a single version of the truth. An enterprise has many different IT systems, each with its own database, and each database has its own version of the truth.

So for example, consider a typical enterprise that has many customers. It will have marketing databases and several sales databases, each associated with the IT system for its sales channel. There are also service systems with their associated databases. The same customer may appear in several of these operational databases, and in each one the customer information is different: variations in the name, different addresses and phone numbers, or in some cases no contact information at all. While this example is about customers, all other enterprise information also exists in many databases, and wherever information is multiplied, there are bound to be contradictions, failures and gaps.
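Here is a toy illustration of the problem: the same customer as three operational systems might see her, and the sort of crude matching rule (my own invention, standing in for real record-linkage software) that a warehouse load has to apply:

    # The same customer, three systems, three versions of the truth.
    from difflib import SequenceMatcher

    records = [
        {"system": "marketing", "name": "Katherine J. Smith", "phone": "555-0137"},
        {"system": "sales",     "name": "Kathy Smith",        "phone": None},
        {"system": "service",   "name": "K. Smith",           "phone": "555-0137"},
    ]

    def same_customer(a, b):
        """A deliberately crude matching rule: fuzzy name score or exact phone."""
        name_score = SequenceMatcher(None, a["name"].lower(),
                                           b["name"].lower()).ratio()
        phones_match = bool(a["phone"] and a["phone"] == b["phone"])
        return name_score > 0.6 or phones_match

    print(same_customer(records[0], records[1]))  # True: names are close enough
    print(same_customer(records[0], records[2]))  # True: phone numbers agree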

Rather than try to resolve this mess, the idea of the Enterprise Data Warehouse is to create a new database that contains the clean and corrected version of the data from all the operational databases. So the Enterprise Data Warehouse is the one data repository for a single version of the truth about the enterprise.

In the early days, the Data Warehouse was conceived as a place for business analysts to do their work. Business analysts like to ask difficult questions and another reason for creating a separate database is to give them a place where they can run complicated queries to answer these questions without disturbing the operational systems.

In practice, building the Enterprise Data Warehouse is difficult and expensive, and the result is an immensely valuable resource. Far too valuable to be left to the business analysts. So from the earliest days, data warehouses were connected to operational systems and used to help run the business.

The problem with this is timing. The original data warehouse was conceived as something that you could load at night with the day's data so that the business analysts could query it during the day. However, if you are running a call center off the information in a data warehouse because it has the best version of the data, the data warehouse has to contain the latest information.

For example, when a customer calls to complain that the product they bought that day does not work, the data warehouse needs to have been updated with the purchase so that the call center can verify it. This has led to the notion of a real-time data warehouse. I explained this when I gave a presentation on "Real Time Business Intelligence" to the SDForum BI SIG in 2003.
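The difference comes down to the load strategy. Here is a bare-bones sketch: instead of one nightly batch, a trickle feed polls the operational system for anything newer than a watermark and pushes it into the warehouse continuously. The tables and columns are invented for illustration:

    # Trickle loading instead of a nightly batch; table names are invented.
    import sqlite3, time

    ops = sqlite3.connect("operational.db")
    dw = sqlite3.connect("warehouse.db")
    watermark = "1970-01-01 00:00:00"

    while True:
        rows = ops.execute(
            "SELECT id, customer, item, purchased_at FROM purchases "
            "WHERE purchased_at > ? ORDER BY purchased_at", (watermark,)).fetchall()
        if rows:
            dw.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
            dw.commit()
            watermark = rows[-1][3]   # advance to the newest loaded purchase
        time.sleep(5)                 # the call center sees purchases within seconds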

So what does all this have to do with Master Data Management? Well, I view Master Data Management as the legitimate name for the trend that I have described as real-time data warehousing. After all, real-time data warehousing is not a very good name for the concept: it describes how we are getting there, while Master Data Management is a statement about the end point.

Of course, the real story is not as clean as I have described. Enterprise Information Integration (EII) is about building a virtual data warehouse without bringing all the data together in a single physical database. Master Data Management does not necessarily imply that all the data is integrated into a single database; all it means is that there is a single version of the truth about all the enterprise data and that this is available to all enterprise applications.

It is also worth noting that as with any new term there is still a lot of discussion and debate with different points of view about what Master Data Management really means.

Tuesday, November 08, 2005

Give it Away

The other day, I found myself writing "I agree with you that we should give away content to build an audience and membership rather than thinking about making people pay for it. In the internet age, the really successful business models like Yahoo and Google have been about giving away information to build an audience and then figuring out how to capitalize on it."

It's true. A recent article in Wired discussed how bands are using MySpace to build a following by giving some of their music away. It is exactly the kind of thing that Lawrence Lessig talked about when he spoke on the Comedy of the Commons to the SDForum Distinguished Speaker Series last year.

For the last 500 years, ever since the invention of printing, publishers, then record companies and movie studios, have controlled the market for content by controlling the means of production. In the Information Age, reproduction and distribution of content are essentially free, and the old content publishing empires that grew fat and happy by exercising their control are going down screaming. Oh, to live in such interesting times.

Sunday, November 06, 2005

Good API Design

Joshua Bloch gave a great talk on "How To Design a Good API and Why it Matters" at the SDForum Java SIG last Tuesday. This kind of system design topic can be difficult because it can come across as apple pie unless it comes from someone who really knows what they are talking about. Joshua knows what he is talking about.

I found myself nodding along with Joshua's dictums. Here is one that particularly stuck in my mind. "An API is like a little language. Names should be self explanatory and be consistent, using the right part of speech. Aim for symmetry." I have written in the past about the connection between language design and APIs. The only problem is that while good API design is hard, good language design is even harder.

This dictum struck home because I recently made a mistake in a little API where I used the wrong word for a function. Unfortunately, the problem was compounded because that word should have been used for another function, one that was missing and that would have completed the symmetry of the API. Once we renamed the errant function, we could create the missing function and complete the API. Sorting out this little problem took a surprising amount of time.
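To illustrate the kind of symmetry Joshua means, here is a made-up example; it is not the API from my story, but the shape of the problem is the same. Every verb has its natural inverse and the names stay in the same part of speech:

    # A made-up illustration of naming symmetry, not the API from my story.
    class Playlist:
        def __init__(self):
            self._tracks = []

        def add(self, track):       # natural inverse of remove()
            self._tracks.append(track)

        def remove(self, track):    # natural inverse of add()
            self._tracks.remove(track)

        def load(self, path):       # natural inverse of save()
            with open(path) as f:
                self._tracks = f.read().splitlines()

        def save(self, path):       # natural inverse of load()
            with open(path, "w") as f:
                f.write("\n".join(self._tracks))

My mistake was the asymmetric version of this: one verb misnamed, and its proper partner missing entirely.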

My touchstone on this topic has been Butler Lampson's paper on "Hints for Computer System Design". A good part of that paper is concerned with interface design, and I have used Lampson's advice on interfaces with great success in the past. In fact, looking at this paper and my notes, Joshua echoes a surprising amount of Lampson's advice.

The Hints paper is more than 20 years old, and since it was written we have moved from building computer systems to living in a networked world where APIs are our point of contact with the universe. Currently, Joshua's words on API design are only available as a presentation. I hope that he writes them up as a paper or book so that everyone can have the full story.

Saturday, October 29, 2005

Everything Bad Is Good For You

Slashdot has just alerted me to Steven Johnson's new book "Everything Bad Is Good For You", on how today's popular culture and newfangled inventions like television, video games and the internet are actually making us smarter. Both of the above links have reviews and plenty of the expected reactionary comments.

What we have to remember is that they said the same kind of thing when printing was invented. Then, they said that the new printed books were not as good as the old hand-written ones and that printing took away the artistry in book production. Another line of attack was that if everyone could have books to read, there would be nobody out there tilling the fields. It took about 50 years for printed books to be fully accepted, until the older generation had died off and everyone left had grown up with them.

I have been Googling all day to find suitable links for the above paragraph, with no success. Maybe I will have to get a subscription to Lexis/Nexis.

Sunday, October 23, 2005

A Cure for Aspect Ratio Madness

Some time ago I wrote about Aspect Ratio Madness, the problem that every device for displaying pictures and video has a different aspect ratio. In that entry I promised to suggest a cure, so here it is. We now live in the digital age, and as you know, metadata is the cure for all problems digital. For digital images there is already a standard (EXIF) for the metadata that goes with the picture.

The idea is very simple. My proposal is that we include metadata with the image about how to crop it, and the display device, knowing its own aspect ratio, uses that metadata to display the image. The new metadata is a minimum bounding box for display of the image.

The minimum bounding box is the part of the image that must be displayed whenever the image is presented. The maximum bounding box is the picture itself. So when we edit a picture, we crop it to the maximum that can be rescued from the image, and we also mark an interior section that is the part we really want to see. This inner crop is saved in the metadata.

When we go to display the image, whether printing it, showing it in a computer slide show, or rendering a slide show movie, the display device decides how much of the picture to show, always including all of the minimum bounding box and then filling out the display as needed with the rest of the image. If there is not enough image to fill the frame, the display device uses its default, which is black bars for a screen and blank (white) for printing.
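Here is a sketch of the display-side logic, with names of my own choosing. Given the image size, the minimum bounding box from the metadata and the display's aspect ratio, it picks the largest display-shaped crop that contains the minimum box and fits inside the image, and signals a fallback to the default (bars or blank) when no such crop exists:

    def choose_crop(img_w, img_h, min_box, display_aspect):
        """min_box and the result are (left, top, width, height).
        Returns None when no display-shaped crop can contain the
        minimum box, in which case the device falls back to its default."""
        left, top, box_w, box_h = min_box

        # Largest display-shaped rectangle that fits inside the image.
        w = min(img_w, img_h * display_aspect)
        h = w / display_aspect
        if w < box_w or h < box_h:
            return None  # any display-shaped crop would cut the minimum box

        # Centre the crop on the minimum box, then push it back inside the image.
        crop_left = min(max(left + box_w / 2 - w / 2, 0), img_w - w)
        crop_top = min(max(top + box_h / 2 - h / 2, 0), img_h - h)
        return (crop_left, crop_top, w, h)

    # A 4x3 photo with an off-centre subject, shown on a 16x9 display:
    # the result contains all of the minimum box and fills the frame.
    print(choose_crop(1600, 1200, (900, 300, 400, 500), 16 / 9))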

The display device also knows whether it can rotate the image for display. When an image has a minimum bounding box that is taller than it is wide, a printer can rotate the image by 90 degrees, while a computer display or movie renderer cannot.

This works for still images because there is metadata that goes with each image. For video, there is metadata for the whole movie, but there is no metadata for each frame or shot. If we add metadata for each shot in a movie, we can create video that can be shown on any display, 16x9, 4x3 or whatever, and still look correct.

Wednesday, October 19, 2005

@#$% Threads

There are many different paradigms for concurrent programming, and threads is the worst. We are working on a sophisticated threaded system. It uses supposedly standard technology, pthreads on Linux, and the number and variety of problems is quite astonishing. Recently we have had some problems with thread termination.

One "little" problem is that part of the system links with a well known database management system. The database installs a thread exit handler which clean up whenever a thread exits. The problem is that not all out threads access the database system. When we cancel a thread that has not run any database code the database exit handler is called and because it has not been initialized, it SEGVs and brings down the whole system.

Another "little" problem is that I moved our code to a new system and could reliably get it to spin forever in the pthread_join procedure. As this comes just after a call to pthread_cancel, the conclusion is that the pthreads library on that system is not thread safe. The pthreads library is in the same location as it is on the system that works, however it is not exactly the same, which suggests that we have a duff pthreads library. After spending a couple of hours, I could not find out where the thread library on either system came from.

Neither of these problems is really a problem with threads as such, but they are typical of what a threads programmer has to deal with day to day. I have much more to unload on real threads problems, which we can look at another time. This is an important topic, as concurrent programming is the way of the future.