Monday, November 27, 2006

Understanding Metadata Revisited

Last month I wrote down some ideas on metadata. Looking back, they are not very useful. More recently, I got back to reading Ralph Kimball's latest book "The Data Warehouse ETL Toolkit". As Kimball says, metadata is an important component of ETL. However, he does not have much more success than I did in producing a definition or a useful explanation, and slips back, as I did, into the less satisfactory definition by example.

This got me thinking. Maybe we could improve the definition of metadata by tightening up the "data about data" definition. Here is my version: "Metadata is structured data that describes structured data". One important attribute of metadata is that it can be used by programs as well as by people, and for this reason metadata must have a known structure. Also, as metadata describes data, the data it describes has structure as well, if only because the metadata describes it. Compared to other definitions, this one finds a middle ground, specific enough to have some use while not being confined to a specific application.

Properly understanding metadata is more than absorbing an eight-word definition. I have already mentioned one important attribute, that metadata can be used by programs as well as people. Another important attribute is the distinction between System and User metadata. System metadata is generated by the system that originates or manages the data. For example, the System metadata in a database is the descriptions of the tables, columns, constraints and almost everything else in the catalog. User metadata is created by people, mostly for use by other people. In a database catalog, User metadata is the contents of comments on the database objects and any semantics that may be associated with the names of the database objects.
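To make the catalog example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The System metadata comes straight from SQLite's own catalog through PRAGMA table_info. SQLite has no COMMENT ON statement, so the User metadata lives in a hypothetical column_comments table that I made up for the illustration, as are the track table and the describe_table helper.

```python
import sqlite3

def describe_table(conn, table):
    """Return System metadata for each column, as recorded by SQLite itself."""
    # Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [{"name": r[1], "type": r[2], "not_null": bool(r[3])} for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT NOT NULL, length_s INTEGER)"
)

# Hypothetical table standing in for catalog comments (User metadata).
conn.execute(
    "CREATE TABLE column_comments (table_name TEXT, column_name TEXT, comment TEXT)"
)
conn.execute(
    "INSERT INTO column_comments VALUES ('track', 'length_s', 'Playing time in seconds')"
)

# System metadata: generated and maintained by the database.
system_metadata = describe_table(conn, "track")

# User metadata: typed in by a person, and only as good as that person's care.
user_metadata = dict(
    conn.execute(
        "SELECT column_name, comment FROM column_comments WHERE table_name = ?",
        ("track",),
    ).fetchall()
)

for column in system_metadata:
    print(column, user_metadata.get(column["name"], "(no comment)"))
```

Both kinds of metadata are structured data that a program can read, but only the first is guaranteed to agree with the table it describes.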

A better example of the distinction is found in an MP3 file. System metadata in an MP3 file is the Bit Rate, Frequency and Stereo or Mono mode. User metadata is the contents of the ID3 tag. The distinction between System and User metadata is important because metadata, like any other data, can have data quality issues. System metadata is almost always correct. If the System metadata were faulty, the system that generated or used the data would break. On the other hand, User metadata is always suspect. Just ask any music fan about ID3 tags.

I am sure that there is a lot more to say about metadata; however, this feels like a good starting point.

Friday, November 24, 2006

Yahoo Directions

The latest edition of Wired has an article by Bob Garfield on YouTube and its acquisition by Google. It contains the following great quote: "success is 1 percent inspiration, 99 percent monetization", and this brought me back to a comment on Yahoo by my colleague Dave McClure. Yahoo has been under siege recently for poor performance and even the inmates are revolting.

Yahoo sites are the number one web portal, with by far the most page views of any web media company. Dave diagnoses Yahoo's primary problem as an inability to monetize all those eyeballs, and I wholeheartedly agree with him. As I have reported before, Yahoo's business model is quite simple: get all the world to sign up for compelling Yahoo services and then monetize them by selling targeted advertising based on knowing the user.

In practice, Google, which seems to be reaching for a similar business model, is much more effective at monetizing a smaller audience that it knows less about. When Google gets its full panoply of services out of beta it could be unstoppable.

Monday, November 06, 2006

Flight Simulator Earth

Today, Microsoft has unveiled their answer to Google Earth, Microsoft Live Search with 3D images of cities that you can navigate through. Now they have 15 cities, and they are expanding to 100 cities by next summer.

I took one look at the pictures and immediately recognized what they have done. Microsoft have taken their venerable Flight Simulator program and repurposed it as a web navigation tool. While it may have some initial glitz, in practice it is not going to be nearly as interesting, useful or awesome as Google Maps.

The funny thing is that I read about this in an earnest report on TechCrunch. Neither the reporter nor any of the comments on the entry noticed the connection between Microsoft Live Search and Flight Simulator. In fact, as may be expected, only about half of the commenters could make it work. For myself, I am not going to take the risk of using Internet Explorer to navigate the web, so I will have to forgo the pleasure of experiencing it first hand.

Disclosure: I have never used Microsoft Flight Simulator. The closest I have come is glancing at a review on a gaming web site some years ago.

Sunday, November 05, 2006

Metcalfe's Law

The July issue of IEEE Spectrum contained an article called "Metcalfe's Law is Wrong". Bob Metcalfe wrote a response to the article as a blog entry in August. The editorial in the November issue of IEEE Spectrum says that the article raised its own little storm of controversy and solicits further comments. Here are my thoughts.

Metcalfe's Law states that the 'value' of a network grows as the square of the number of connection points to the network. This is usually stated as the number of users, where each user is assumed to have their own connection point. The law was popularized in 1993 by George Gilder, writer of Telecosm, the Gilder Technology Report and chief cheerleader of the telecom/internet revolution/bubble. Briscoe, Odlyzko and Tilly argue in the IEEE Spectrum article that the actual growth in value is n*log(n) and that the original formulation was bad because it directly led to the speculative excess of the telecom/internet bubble.
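To get a feel for how far apart the two formulations are, here is a small Python sketch that tabulates both. The function names are mine, and the choice of the natural logarithm is an assumption; a different base only scales the n*log(n) figure by a constant. 'Value' here is a unitless number, not dollars.

```python
import math

def metcalfe_value(n):
    """Metcalfe's Law: value grows as the square of the number of users."""
    return n * n

def nlogn_value(n):
    """The alternative argued in the IEEE Spectrum article: n*log(n)."""
    return n * math.log(n)

def ratio(n):
    """How many times larger the quadratic estimate is than the n*log(n) one."""
    return metcalfe_value(n) / nlogn_value(n)

for n in (10, 1_000, 1_000_000):
    print(f"n={n:>9,}  n^2={metcalfe_value(n):.3e}  "
          f"n*log(n)={nlogn_value(n):.3e}  ratio={ratio(n):.1f}")
```

The ratio between the two is n/log(n), so the gap keeps widening as the network grows, which is why the choice of formula matters so much to anyone trying to put a valuation on a large network.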

Put baldly, Metcalfe's Law says that if I have a network and you have a network, and we connect our networks together, they are worth much more than either network on its own, or even the sum of the two networks. The more networks we connect, the more valuable the whole thing becomes. So the point of Metcalfe's Law is that there is a huge incentive for all networks to join together into one completely connected internetwork. This has come to pass, first for telephones and then for computers. Thus my position is that Metcalfe has been proven correct and that it is academic to argue whether the 'value' (whatever that means) of the network grows quadratically or as n*log(n).
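The incentive to connect can be checked with a few lines of Python. This is only a sketch with made-up network sizes: merge_gain is a hypothetical helper that measures how much more the joined network is worth than the two networks kept apart, under whichever growth formula is passed in.

```python
import math

def quadratic(n):
    """Metcalfe's Law: value proportional to n squared."""
    return n * n

def nlogn(n):
    """The slower n*log(n) growth proposed in the IEEE Spectrum article."""
    return n * math.log(n)

def merge_gain(value, n, m):
    """Extra value created by joining networks of n and m users."""
    return value(n + m) - (value(n) + value(m))

# Two hypothetical networks of 1,000 users each.
n, m = 1_000, 1_000
print("quadratic gain:", merge_gain(quadratic, n, m))
print("n*log(n) gain: ", merge_gain(nlogn, n, m))
```

Under the quadratic law the gain is 2*n*m, and under n*log(n) it is smaller but still positive for any two networks. Either way the joined network is worth more than the sum of its parts, so the pressure to interconnect is there whichever formula turns out to be right.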

We need to understand the context when looking at Metcalfe's and Gilder's arguments. As Bob Metcalfe says in his blog entry, in 1980 when he devised Metcalfe's Law he was just trying to sell the value of networks and create business for his company 3Com. This was at a time when an Ethernet card cost $5000 and flinty-eyed accountants would argue to reduce the size of their network buy while he would argue that they should increase it.

George Gilder is the person who foresaw a single interconnected Internet at a time when there was CompuServe, Prodigy, AOL and thousands of local bulletin board systems. All of these were swept away by the internet revolution except for AOL who managed to ride the wave by co-opting it. So Gilder was correct as well, although he was eventually carried away by the force of his own argument like many who listened to him.

Wednesday, November 01, 2006

Valleywag

The thing that I miss most from Web 1.0 is Suck. As far as I know Web 2.0 does not have anything that matches its authority, wit, eclectic pithiness and complete mastery of the cultural reference. In those days, if I took a few minutes after lunch to explore the Internet, I would always turn to Suck.

These days the best I can do is Valleywag. Compared to Suck it is a sprawling parochial mess with an unhealthy obsession for TechCrunch blogger Michael Arrington. If Valleywag has a center of gravity it is closer to Castro Street in Mountain View than San Francisco. However, it is a good way of keeping up with what is really going on. For example, today there is: