Sunday, April 04, 2010

Web Analytics 2.0

Web analytics is changing fast, as we discovered at the March meeting of the SDForum Business Intelligence SIG. Avinash Kaushik, Analytics Evangelist for Google, spoke on 'Web Analytics 2.0: Rethinking Decision Making in a "2.0" World'. Avinash started off by telling us how he became well known as a web analytics guru. A few years ago he started writing his blog "Occam's Razor". It soon gathered a large readership and a publisher approached him to write a book. His first book "Web Analytics: An Hour a Day" was a distillation of his blog posts. The book is a best seller even though much of its content is available for free on the web. His second book "Web Analytics 2.0" came out recently.

Avinash is an excellent communicator with a strong personal style. One aspect of that style, quite obvious from his blog posts, is the urge to create lists of ideas. For his presentation, Avinash offered us a list of simple ideas on web site metrics and analytics. Here are some of the ideas that he presented.

The first idea is simple and direct - Don't Suck. The suckage of a web page can be measured by a metric called Bounce. This is a relatively new metric that we had not previously heard discussed at the Business Intelligence SIG. Bounce measures the users whose experience of the web page and site is, as Avinash put it, "I came. I puked. I left.", and he showed us some pretty pukey pages that might back up this behavior. A typical analysis is to look at the pages with the highest bounce rate, determine why they cause that behavior, and decide what can be done about it.
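
The bounce metric itself is easy to compute from raw session data. Here is a rough sketch (the session log format and page names are invented for illustration): the bounce rate for a landing page is just the single-page sessions divided by all the sessions that entered on that page.

```python
from collections import defaultdict

def bounce_rates(sessions):
    """Compute per-page bounce rate: the fraction of sessions that
    landed on a page and left without viewing anything else."""
    entries = defaultdict(int)   # sessions that started on this page
    bounces = defaultdict(int)   # of those, sessions of length 1
    for pages in sessions:
        landing = pages[0]
        entries[landing] += 1
        if len(pages) == 1:
            bounces[landing] += 1
    return {page: bounces[page] / entries[page] for page in entries}

# Invented example sessions: each is the ordered list of pages viewed.
sessions = [
    ["/home", "/pricing"],   # engaged visit
    ["/promo"],              # bounce
    ["/promo"],              # bounce
    ["/home"],               # bounce
]
rates = bounce_rates(sessions)
# /promo bounces every time it is the landing page; /home half the time
```

Sorting the resulting dictionary by rate gives exactly the "pages with the highest bounce rate" list that the analysis starts from.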

His next idea is Segment or Die. Analytics is about aggregating data to make sense of large datasets; however, over-aggregation results in a single number and nothing to compare it with. Segmenting the data gives us a number of data items that we can compare. Avinash showed us a simple example where he took a hospital web site, classified the content into 8 segments, and then compared the amount of content against the number of page views in each segment. It was immediately apparent where the effort should go into adding and improving content.
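
That comparison boils down to a simple ratio. The segment names and counts below are made up for illustration, not the actual hospital data; the idea is just to compare each segment's share of page views with its share of content.

```python
def segment_demand(content_pages, page_views):
    """For each segment, compare its share of page views with its share
    of content; a ratio well above 1 suggests readers want more than
    the site currently offers there."""
    total_pages = sum(content_pages.values())
    total_views = sum(page_views.values())
    return {
        seg: (page_views[seg] / total_views) / (content_pages[seg] / total_pages)
        for seg in content_pages
    }

# Invented example: content is concentrated in one segment,
# but readers' attention is concentrated in another.
content = {"cardiology": 120, "maternity": 40, "billing": 15}
views = {"cardiology": 3000, "maternity": 9000, "billing": 500}
demand = segment_demand(content, views)
# maternity is under-served (ratio > 1); the others are over-served
```

A table of these ratios makes it immediately apparent which segments deserve more content effort.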

Analyzing your web logs only tells part of the story; you also have to worry about what the analytics cannot tell you. Ex-Defense Secretary Donald Rumsfeld is infamous for having talked about "the known knowns, the known unknowns, and the unknown unknowns". The unknown unknowns are the things that you don't even know that you don't know, and therefore the things you should be most worried about. You can start to get a handle on what you do not know by looking at your performance relative to your competitors. This is known as Benchmarking, or in the case of a deep study as Competitive Intelligence. For an example of what can be done, a recent post on the Occam's Razor blog discusses 8 sources for Competitive Intelligence data.

Most web site analytics only looks for the one big conversion from a web site, however there are many other small conversions that are tracked and worth evaluating because there may be hidden value lurking in the long tail. For example, recently Avinash wanted to know what his blog was worth so that he could defend taking time away from the family to write it. After determining a value for each reader, he started adding up all the other micro-conversions, like people who subscribe to the RSS feed, click through to advertisements for his books, or visit the non-profit organizations that he supports. Overall he came up with a figure of about $26,000 per month. Now Avinash does not make a penny from his blog, so this is notional money that adds to his personal brand value, but that value seems to make the effort of writing the blog well worth the time spent.
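
The arithmetic behind such a valuation is straightforward: assign each micro-conversion a dollar value and multiply by monthly volume. The counts and per-event values below are invented for illustration; Avinash did not publish his inputs at this level of detail.

```python
# Illustrative (made-up) monthly event counts and per-event dollar values.
micro_conversions = {
    "rss_subscriptions": (400, 2.00),    # (events per month, $ value each)
    "book_ad_clicks":    (1500, 0.75),
    "nonprofit_clicks":  (800, 0.50),
    "brand_page_views":  (90000, 0.25),
}

monthly_value = sum(count * value
                    for count, value in micro_conversions.values())
# roughly $25,000 a month of notional value from these made-up inputs
```

The interesting part of the exercise is not the total but the breakdown: it shows which of the small conversions in the long tail actually carry the value.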

The next idea is Fail Faster. By this Avinash means do lots of different experiments, many of which will fail, to find out what works. He led us through an example from the Obama Presidential campaign. President Obama raised huge amounts of money from many small donations on his web site. The initial web page worked well. The experiments were to try some variations on the theme. Pages with video, and stirring video at that, did very badly. A simple picture of Obama with his family did a little better than the initial picture, so that one was chosen.
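
At its simplest, picking the winner of such an experiment is a matter of comparing conversion rates across variants. The numbers here are hypothetical, not the campaign's real data (a real analysis would also check statistical significance before declaring a winner).

```python
def conversion_rate(conversions, visitors):
    """Fraction of visitors who completed the desired action."""
    return conversions / visitors

# Hypothetical results: (sign-ups, visitors) for each landing page variant.
variants = {
    "original photo": (1050, 25000),
    "family photo":   (1180, 25000),
    "stirring video": (640, 25000),
}
rates = {name: conversion_rate(*data) for name, data in variants.items()}
best = max(rates, key=rates.get)
# "family photo" wins and "stirring video" trails badly,
# matching the pattern described above
```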

Avinash showed us this example to make a number of points. The Obama analytics team was tiny. Often the best work is done by a small agile team that has the freedom to experiment. The team used free tools. Avinash believes that good people are much more important than good tools. His suggestion for dividing up the analytics budget is to spend 90% on people and 10% on tools. Sometimes, a web site design feature starts from a HiPPO (Highest Paid Person's Opinion), which can be destructively bad, and difficult to get around because in all organizations the highest paid person's opinion is taken very seriously. The best way to counter a HiPPO is to show that other ideas work better through the results of experiments that produce hard evidence.

While some may think that web analytics is a mostly solved problem, Avinash believes we are just starting to figure out what can be done, and that there is plenty of room for more innovation. I will continue to read Occam's Razor to find out where he takes us next.

Monday, March 29, 2010

How Data is Changing the Study of Economics

Andrew Leonard in How The World Works recently posted on how computers and the availability of data are changing the study of Economics, and I have to agree. There are a number of forces that are converging to make this happen right now.

Open Government initiatives are making more data available and the internet makes it easier to get the data. Emerging movements like the Open Data Commons emulate the Open Source movement that has made software more available. The Open Data movements aim not only to make data more openly available but also to make it better, by providing tools to manage it and inspection so that problems with the data can be corrected.

Web sites to make data available have been around for some time. For example, Numbary.com exists to make public data more available. Sites like Many Eyes and Swivel allow the user to upload data sets and analyze them. You do not need to find your own data sets because you can go to these sites and play around with data sets that others have uploaded.

Several popular books have shown us what can be done. The best known example is Freakonomics, which takes a number of interesting data sets and shows us how they can be analyzed to tell interesting and sometimes quite startling stories. Less flamboyant and more educational is Super Crunchers, subtitled "why thinking-by-numbers is the new way to be smart".

Leonard suggests that the rise of large scale data analysis will displace the old guard who sit in their ivory towers and build models. I have to disagree. The economic model is the explanation of what is happening, the result of analysis. Building a model to explain some aspect of the data, or behavior that is brought to light by the data, is the result of Analytics. More and better data means that the models will be better, more definitive and, most importantly in a fractious discipline, more defensible.

Wednesday, March 24, 2010

The iPad App Conundrum

While the iPad looks like it is going to be successful, I think that there is a question over how the market for iPad Apps will develop. Apps for the iPhone are a stunning success that caught many by surprise. I recall a post in TechCrunch when the iPhone App Store was about to be introduced. Using numbers that were probably leaked from Apple, the column predicted a substantial and valuable market for iPhone apps. The commenters were full of scorn, suggesting that the idea of anyone paying money for little apps was ridiculous. Apple showed us just how it could be done.

On the other hand we have the iPad. Apps for the iPhone make sense because of its small screen. Each app makes the best use of the limited screen space for its own dedicated purpose. The iPad has a much larger screen where the browser with scripting and plug-ins can support most of the experience. Therefore the need for specialized apps is less compelling.

There will still be iPad apps. For example I expect games to do well. However, media companies hoping to monetize their content through subscriptions have a tricky tightrope to walk. Most media companies now make their content available for free on the internet supported by advertising. They cannot afford to withdraw this content completely, but on the other hand if they want to monetize through the iPad, they have to provide a value added experience through their app if they expect to get people to pay. I look forward with interest to see how this all plays out.

Update: TechCrunch just posted on how magazines might work this issue with video on the cover.

Saturday, March 20, 2010

Emerging Languages Face Off

New programming languages are popping up all over the place. In March the SDForum Emerging Tech SIG held an "Emerging Languages Face Off" to try and make sense out of what is going on. The new languages represented at the meeting were Clojure, Scala and Go, with Ruby as a more established control language. The panel was moderated by Steve Mezak, author of "Software without Borders" and CEO of Accelerance, Inc.

The night kicked off with Amit Rathore, Chief Software Architect at Runa, Inc. speaking for Clojure (pronounced closure). He told us that Clojure is a Lisp that runs on the Java Virtual Machine. Lisps are dynamic, functional languages with automatic garbage collection that have been around since the early 60's. Although Amit told us that Clojure programs contain fewer parentheses than Java, the examples he showed us did not seem to bear this out. Clojure does try to control the number of brackets by using both round and square ones. Apart from lists, Clojure provides support for common data structures like maps, as well as lazy sequences.

More importantly, Amit introduced what turned out to be a major theme of the evening: support for concurrency. Clojure has surprisingly sophisticated (read: complicated) support for concurrency. The basic idea is that reads are versioned to be lock free while writes are coordinated so that they do not conflict. Existing data is immutable; updates are made by writing the new data to new locations. Access to shared memory is delimited by transactions that correspond to the program block structure (good). There are 4 ways of referencing data in a transaction, allowing the different use cases each to be handled efficiently. If a transaction fails, it is automatically retried until it succeeds. Clojure's implementation of concurrency goes under the banner of Software Transactional Memory.
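
Clojure's STM cannot be reproduced in a few lines, but the core retry-until-commit idea can be sketched. What follows is my own simplified analogue in Python, not Clojure's actual machinery: a versioned reference where a write commits only if nobody else has committed since the read, and the transaction automatically retries otherwise.

```python
import threading

class Ref:
    """A minimal versioned reference, sketching the retry-until-commit
    idea behind software transactional memory (far simpler than
    Clojure's real implementation)."""
    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, new_value, read_version):
        with self._lock:
            if self._version != read_version:
                return False          # someone else committed first
            self._value = new_value   # the old value is never mutated
            self._version += 1
            return True

def transact(ref, update):
    """Retry the update until it commits, as a dosync block would."""
    while True:
        value, version = ref.read()
        if ref.try_commit(update(value), version):
            return

counter = Ref(0)
threads = [
    threading.Thread(
        target=lambda: [transact(counter, lambda v: v + 1)
                        for _ in range(1000)])
    for _ in range(4)
]
for t in threads: t.start()
for t in threads: t.join()
# all 4000 increments eventually commit, despite the contention
```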

While I will give Clojure two and a half cheers for trying, I am not a great fan of Lisp-like languages. Amit touched on one of my bugaboos: the ability to change the meaning of the language by writing code. In my mind, this makes Lisp a low level language, as any program requires close reading to discover what it might do. Also, my only practical experience of Lisp is Emacs configuration, a scary mess of global variables and functions.

Next up, Evan Phoenix, lead developer of Rubinius, spoke about Ruby. Ruby is a dynamic language with automatic garbage collection and is built on the principle of least surprise. While there are several implementations of the language, these implementations have not provided the best performance, so Rubinius is working on a high performance implementation in which more of the system is written in Ruby itself. The genesis of Ruby was with Lisp and Smalltalk, although the actual language went in a very different direction than these two languages.

Evan admitted that Ruby does not have great support for concurrency. Ruby works with green threads, that is, cooperative multithreading that can exploit only a single core. The problem of the "interpreter lock" means that native thread support is not an immediate goal.

After Evan, David Pollak, author of "Beginning Scala" and Benevolent Dictator for Life of the Lift Web Framework, spoke on Scala. Scala is a hybrid object oriented/functional language with static typing and garbage collection that runs on the Java Virtual Machine. In some ways it is like Java with fewer words; the type inference system eliminates the need to explicitly specify data types most of the time. The goal of Scala is to achieve the speed (and safety) of Java with the conciseness of Ruby.

Scala has no specific built in support for concurrency, however the Actors paradigm has been successfully implemented on top of Scala. I have written about both Scala and Actors with Scala previously, so I will say no more here. It is worth noting that both Clojure and Scala get full native threads support and many other benefits from running on the Java Virtual Machine.

Finally Robert Griesemer spoke about the new Go language. Robert is a member of the team developing Go at Google. Go is a statically typed language with automatic garbage collection that compiles down to native hardware. It is a system programming language with control over the memory layout of the data. Robert listed the problems with current system programming languages. They are verbose and repetitious, the data type system gets in the way, build time is slow, particularly compared to dynamic languages, and managing dependencies between modules is difficult. Go aims to be a simple and powerful language with fast tools.

Go does not have inheritance or type hierarchies and is not object oriented, rather it aims to be more flexible. I was somewhat disturbed by this. Although type hierarchies can be misused, they are useful for helping to organize large projects. On the other hand, for concurrency, Go offers lightweight processes that communicate via channels, which is a welcome move away from the threads paradigm with all its problems.
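
Go's lightweight processes and channels can be loosely imitated with threads and queues. This sketch is only an analogue (real Go code would use goroutines and typed channels), but it shows the communicate-over-channels style: workers receive jobs on one queue and send results back on another, with a sentinel value playing the role of closing the channel.

```python
import queue
import threading

def worker(jobs, results):
    """Receive work over one 'channel', send answers back over another."""
    while True:
        n = jobs.get()
        if n is None:        # sentinel: the jobs channel is "closed"
            return
        results.put((n, n * n))

jobs, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(3)]
for w in workers:
    w.start()
for n in range(10):
    jobs.put(n)              # send ten jobs down the channel
for _ in workers:
    jobs.put(None)           # one close sentinel per worker
for w in workers:
    w.join()
squares = dict(results.get() for _ in range(10))
# squares maps each n in 0..9 to its square, computed concurrently
```

The appeal of the model is that the workers share no mutable state at all; everything flows through the queues, which is exactly the discipline Go's channels encourage.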

The Go language is not quite complete yet. The language designers are still working on providing support for a number of features including generics, operators and exceptions. Robert told us that he expects the language to be complete and mature in 6 months to a year from now. Programming language design is not easy and needs to proceed at its own pace. It is worth remembering that the C++ language spent about 15 years in the state of being almost but not quite finished. We will have to wait and see how long it takes Go to mature.

Overall, two themes emerged from the new languages presented at the meeting. One is the desire to make programming simpler and more approachable. It is easier to start writing a program in a dynamic language. Both Scala and Go are statically typed languages with the goal of making the programming experience more like writing a program in a dynamic language. The other theme is the need for new programming languages to provide better support for concurrency. In particular, languages need to support something that is safer and more controlled than threads.

Saturday, February 27, 2010

Avatar Issues

After complaining about the latest revival in 3D movies, I finally went to see Avatar in glorious IMAX 3D. The movie is a stunning spectacle, well worth the price of admission and even hanging around for half an hour to ensure that we got a good seat in a full cinema a couple of months after the film's release. However there are issues.

The first issue is that we had to sit through several trailers for 3D movies that are going to be released in the next few months. Hollywood seems to be more determined to make 3D movies work this time around by building a pipeline of 3D movies for us to go and see. On the other hand, all of the forthcoming 3D movies are animated, meaning that serious people do not need to watch them. To do digital animation properly, the film makers build 3D models of everything in the foreground of a scene, so it is not a huge amount of extra work to throw off a 3D version of an animated movie. Come to think of it, Avatar is mostly an animated movie with a slightly different visual aesthetic and much more detail in the models and textures.

The other issues relate to movie making. 3D is a different medium that requires different film making techniques. For example, limited depth of field is a beloved technique for "art house" movies. In 3D, all parts of the image need to be in focus all the time because you cannot resolve a 3D image that is out of focus; moreover, it is likely to give the viewer a headache. The rule for 3D movies is f/64 all the way.

Another issue is the foreground. 3D movies need to be very careful about composing the picture so the foreground does not protrude. There were only a couple of instances in Avatar where the foreground was a problem. The one noticeable incident had foreground leaves in the forest sticking well into the field of view combined with a camera movement that caused the fronds to move rapidly past the eye in a most distracting way.

Depth of field and foreground are only a couple of the issues with the language of 3D movies; there are many more that need to be considered. One example is the aspect ratio that the movie is made in. It seems that Avatar was made so that it could be viewed in several aspect ratios. All in all I question whether it is possible to make a movie that succeeds in both 2D and 3D.

James Cameron has said that he did not want the 3D effects to distract the viewer from the movie. I still remember the scene in "Flesh for Frankenstein" where the children go into the belfry, bats fly around and one flies out of the screen and into your face. Fortunately, there were no such scenes in Avatar.

Finally there is the political angle. Some people have complained that Avatar is anti-American. For anyone who thinks that the sub-Blackwater corporation mining Pandora, and walking over the natives without so much as a "by your leave", represents the American ideal, I feel very sorry for them.

Sunday, February 14, 2010

Predatory Lending

While the practice of Predatory Lending is difficult to define, it is easy to see the results: people stuck with high priced loans that they cannot get out of. Today I saw a couple of references to an article in the Washington Post that reports more than half of the mortgages in the US have an interest rate that is greater than 6%, while for the last year the mortgage interest rate has been hovering around 5%. This means that over half the homeowners in the US are unable to refinance their mortgages to take advantage of a lower interest rate.
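
A percentage point may not sound like much, but the standard fixed-rate amortization formula shows what it means for a household budget. The loan size here is hypothetical, chosen only to illustrate the arithmetic.

```python
def monthly_payment(principal, annual_rate, years):
    """Standard fixed-rate mortgage amortization formula."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)

# Hypothetical $300,000 loan over 30 years, at the two rates in question.
old_payment = monthly_payment(300_000, 0.06, 30)
new_payment = monthly_payment(300_000, 0.05, 30)
savings = old_payment - new_payment
# roughly $190 a month that a refinance at 5% would free up
```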

Many people think of predatory lending as something like payday lending to the poor. In practice it happens at all levels of the economy. At the highest level there are the exploits of the "economic hit man" whose job was to enable the selling of economic development loans to poor countries that could ill afford them. Recently, the economic woes of Greece may have been exacerbated by clever derivative swaps from Goldman Sachs, designed to hide the true nature of the debt that it owed.

Predatory housing loans inflamed the housing bubble, sticking the middle classes with high priced loans cleverly disguised with low initial teaser rates. Now the middle classes are stuck with loans that they cannot refinance because their houses are underwater, or because they do not have a job, a good enough job, or the credit rating for the refinance to go ahead. This is a further drag on an economy already in recession. As the Washington Post article says:
"More refinancing activity would have helped household budgets, but also the national economy because homeowners might have spent some of the extra cash they pocketed, giving the recovery an added lift."
Homeowners do have an option. As Roger Lowenstein writes in the New York Times "Walk Away From Your Mortgage". Part of his argument is that banks are walking away from their mortgage obligations, and the American people should not feel obliged to behave better than the corporations who sold them their home loans in the first place. I would add to that argument that much of the vitality of the US economy comes from labor mobility. Having people stuck in a home that they cannot sell because their loan is underwater and unable to get a job nearby is yet another drag on the economy.

Sunday, January 31, 2010

Two Kinds of Book Readers

To get to the heart of the Kindle - iPad issue we need to understand that there are two kinds of books. One kind of book is what I will call "page turners". You start reading the book at page 1 and read sequentially, turning the pages until you get to the end. The other type of book I will call "reference". You start reading the book by going to the index or the table of contents, finding the topic you are interested in and proceeding from there. Books such as dictionaries and encyclopedias are organized as indexes to make look up easier.

Of course, nothing is ever completely black and white. There are plenty of books that fall between the two poles of page turners and reference books. For example, a book of poems may be organized to be read as a page turner, however it will also contain an index of poems and perhaps an index of first lines. Many technical books are written as both a page turner and a reference. Perhaps on first encounter you read it cover to cover and then use it as a reference book. The way we read magazines and newspapers is more like a reference book than a page turner. In a magazine we look at the index to find the article we want to read, or flip through the pages to find something eye catching. Scanning a newspaper is a similar act.

The point of all this is that the Kindle book reader is a device that is optimized for reading page turners. Navigation is difficult. While the Kindle in theory can be used to read newspapers and magazines, it has not been very successful in this application. The iPad on the other hand has all the touch based navigation mechanisms and also the advantages of the web like hypertext and search engines.

In summary, the Kindle is a gadget for reading a specific type of book and nothing else. As such it is effective and well received by dedicated readers of page turner books, a small but devoted audience. The Kindle is limited by its niche. The iPad is a much more general device that does book reading as one of its many functions. Better navigation and a better display mean that it can be used to consume all media types, and it is particularly good for navigation heavy media such as newspapers, magazines and reference material.

Thursday, January 28, 2010

iPad Excitement

After weeks of excitement, building suspense and leaks, Apple finally got around to announcing their large form iPod Touch, called the iPad. Already it is being decried all over the internet for its many deficiencies. In summary, the complaints come from a bunch of geeks who say that the iPad is too deficient and locked down to be the kind of computer system that they can play with, and anyway they already have enough computer systems, so why do they need another one. The simple answer to these complaints is that nobody is forcing them to buy an iPad.

I see the iPad as a lifestyle device. It is the thing I have in my living room so that while watching a movie, I can look it up on IMDB, perhaps to find out who is that familiar looking actor in a cameo role. In the kitchen I can use the iPad to look up a recipe, and in the breakfast nook to skim the morning newspaper headlines over a cup of coffee. Last thing at night, I can use the iPad to read blogs, news magazines or a few pages from a book before going to sleep.

In my opinion, the most interesting thing about the iPad is that Apple is playing to their strength as a systems integrator. The iPad has lightning performance and good battery life because Apple has developed their own processor chip in tune with the software that runs on it. Admittedly the software comes with plenty of restrictions like a lack of multi-tasking. However, as is often the case with Apple, the end result is "less is more".

Wednesday, January 13, 2010

3D - Not

We are hearing big things about 3D. The movie Avatar was recently released in two different 3D systems, and at the CES trade show, several television makers announced that they will have 3D TVs available later this year. Optimism abounds. Someone on the panel for the This Week in Tech episode broadcast from the CES show in Las Vegas said they managed to watch a 3D demo for several minutes before they got a headache.

The real problem is that we have been here before. Every few years there is a new 3D system that is going to change the world of visual media. 3D still pictures are 100 years old. The first 3D movies were released in the 1950's. I saw "Flesh for Frankenstein" in 3D more than 30 years ago. Since then we have had 3 or 4 more 3D hype cycles. The result has always been the same, a lot of huff and puff with no lasting result. I see no reason why 3D should fare any better this time than it has on all the previous occasions.

Thursday, December 31, 2009

Television in Trouble

There are forces at work that are going to completely change the television business in the US. On the one side there are the major television networks who believe that it is their right to earn large sums of money from television, just because they have in the past. On the other side is the consumer who is tired of the cost and increasingly switching off. In the middle are the cable companies and the cable content companies.

The consumers are fed up. Television has become unwatchable as the number and length of the commercial breaks has extended. We used to get 48 to 50 minutes of content in each hour, and now we get just 42 minutes. At that rate a season of 24 has less than 17 hours of content. The only way to watch a TV show is to record it on a DVR and watch it later, skipping the commercials. Once we get in the habit of watching TV offline, it becomes much easier to cut the cable completely and just watch the web. Between Netflix, Hulu and YouTube there is quite enough stuff to keep us entertained.

Another source of complaint is the constantly rising cost of cable. This is caused by the cable company paying more and more for content. For example, the cable companies pay ESPN $4 per month per viewer to carry the channel, and that fee is rising. Other cable content companies are jumping into the valuable content pool. Ten years ago, the AMC channel used to show very old movies with no commercial breaks; now AMC puts on award winning shows like Mad Men, full of commercials. Every cable channel seems to have its must see TV program, from the BBC with Top Gear to the USA network with Burn Notice.

The cost of cable is about to go up sharply as the major TV networks demand commensurate fees for their programming from the cable companies. This does not seem like a winning idea in recessionary times. As fees rise, more and more people will cut the cable. Either the cost of cable has to stabilize with cuts to content, or TV risks going the way of radio. (I hear that radio still broadcasts, but I do not listen to it, and nobody that I know still listens.) I think that we will see some big changes coming to the TV business over the next year or so.