Monday, March 29, 2010

How Data is Changing the Study of Economics

Andrew Leonard in How The World Works recently posted on how computers and the availability of data are changing the study of economics, and I have to agree. There are a number of forces that are converging to make this happen right now.

Open Government initiatives are making more data available and the internet makes it easier to get the data. Emerging movements like the Open Data Commons emulate the Open Source movement that has made software more available. The Open Data movements aim not only to make data more openly available but also to make it better, by providing tools to manage it and by opening it to inspection so that problems with the data can be corrected.

Web sites that make data available have been around for some time. For example, Numbary.com exists to make public data more available. Sites like Many Eyes and Swivel allow the user to upload data sets and analyze them. You do not need to find your own data sets because you can go to these sites and play around with data sets that others have uploaded.

Several popular books have shown us what can be done. The best known example is Freakonomics, which takes a number of interesting data sets and shows us how they can be analyzed to tell interesting and sometimes quite startling stories. Less flamboyant and more educational is Super Crunchers, subtitled "why thinking-by-numbers is the new way to be smart".

Leonard suggests that the rise of large scale data analysis will displace the old guard who sit in their ivory towers and build models. I have to disagree. The economic model is the explanation of what is happening, the result of analysis. Building a model to explain some aspect of the data, or behavior that is brought to light by the data, is the result of analytics. More and better data means that the models will be better, more definitive and, most importantly in a fractious discipline, more defensible.

Wednesday, March 24, 2010

The iPad App Conundrum

While the iPad looks like it is going to be successful, I think that there is a question over how the market for iPad Apps will develop. Apps for the iPhone are a stunning success that caught many by surprise. I recall a post in TechCrunch when the iPhone App-store was about to be introduced. Using numbers that were probably leaked from Apple, the column predicted a substantial and valuable market for iPhone apps. The commenters were full of scorn, suggesting that the idea of anyone paying money for little apps was ridiculous. Apple showed us just how it could be done.

On the other hand we have the iPad. Apps for the iPhone make sense because of its small screen. Each app makes the best use of the limited screen space for its own dedicated purpose. The iPad has a much larger screen where the browser with scripting and plug-ins can support most of the experience. Therefore the need for specialized apps is less compelling.

There will still be iPad apps. For example, I expect games to do well. However, media companies hoping to monetize their content through subscriptions have a tricky tightrope to walk. Most media companies now make their content available for free on the internet, supported by advertising. They cannot afford to withdraw this content completely, but on the other hand, if they want to monetize through the iPad, they have to provide a value-added experience through their app if they expect to get people to pay. I look forward with interest to seeing how this all plays out.

Update: TechCrunch just posted on how magazines might work this issue with video on the cover.

Saturday, March 20, 2010

Emerging Languages Face Off

New programming languages are popping up all over the place. In March the SDForum Emerging Tech SIG held an "Emerging Languages Face Off" to try and make sense out of what is going on. The new languages represented at the meeting were Clojure, Scala and Go, with Ruby as a more established control language. The panel was moderated by Steve Mezak, author of "Software without Borders" and CEO of Accelerance, Inc.

The night kicked off with Amit Rathore, Chief Software Architect at Runa, Inc., speaking for Clojure (pronounced closure). He told us that Clojure is a Lisp that runs on the Java Virtual Machine. Lisps are dynamic, functional languages with automatic garbage collection that have been around since the early 60's. Although Amit told us that Clojure programs contain fewer parentheses than Java, the examples he showed us did not seem to bear this out. Clojure does try to control the number of brackets by using both round and square ones. Apart from lists, Clojure provides support for both common data structures like maps and lazy sequences.

More importantly, Amit introduced what turned out to be a major theme of the evening, support for concurrency. Clojure has surprisingly sophisticated (read complicated) support for concurrency. The basic idea is that reads are versioned to be lock free while writes are managed to ensure that they overlap properly. Existing data is immutable; updates are made by writing the new data to new locations. Access to shared memory is delimited by transactions that correspond to the program block structure (good). There are four ways of referencing data in a transaction that allow the different use cases each to be handled efficiently. If a transaction fails, it is automatically retried until it succeeds. Their implementation of concurrency goes under the banner of Software Transactional Memory.
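To make the retry idea concrete, here is a minimal sketch of an optimistic update loop, written in Go rather than Clojure. It is not Clojure's implementation: the Ref type, its methods and the appendCopy helper are hypothetical names invented for this illustration, and reads here take a short lock instead of being lock free. What it does show is the pattern described above: data is never modified in place, a writer builds a new value from a snapshot, and the commit is retried if another writer got in first.

```go
package main

import (
	"fmt"
	"sync"
)

// Ref is a hypothetical shared reference to an immutable snapshot.
// It is an illustration only, not Clojure's ref type.
type Ref struct {
	mu      sync.Mutex
	version int   // incremented on every successful commit
	value   []int // treated as immutable: never modified in place
}

// read returns the current snapshot together with its version.
func (r *Ref) read() ([]int, int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.value, r.version
}

// commit installs newValue only if nobody has committed since
// the given version was read. It reports whether it succeeded.
func (r *Ref) commit(newValue []int, version int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.version != version {
		return false // another writer won; caller must retry
	}
	r.value = newValue
	r.version++
	return true
}

// Update applies fn to a snapshot and retries until the commit succeeds.
// fn may run more than once, so it must not have side effects.
func (r *Ref) Update(fn func(old []int) []int) {
	for {
		old, version := r.read()
		if r.commit(fn(old), version) {
			return
		}
	}
}

// appendCopy writes the new data to a new location,
// leaving the old slice untouched.
func appendCopy(old []int, x int) []int {
	next := make([]int, len(old), len(old)+1)
	copy(next, old)
	return append(next, x)
}

func main() {
	ref := &Ref{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			ref.Update(func(old []int) []int { return appendCopy(old, n) })
		}(i)
	}
	wg.Wait()
	snapshot, version := ref.read()
	fmt.Println(len(snapshot), version) // prints "100 100"
}
```

Because a failed commit simply reruns the function against a fresh snapshot, no update is lost even when all one hundred writers collide.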

While I will give Clojure two and a half cheers for trying, I am not a great fan of Lisp-like languages. Amit touched on one of my bugaboos, the ability to change the meaning of the language by writing code. To my mind, this makes Lisp a low level language, as any program requires close reading to discover what it might do. Also, my only practical experience of Lisp is Emacs configuration, a scary mess of global variables and functions.

Next up, Evan Phoenix, lead developer of Rubinius, spoke about Ruby. Ruby is a dynamic language with automatic garbage collection and is built on the principle of least surprise. While there are several implementations of the language, these implementations have not provided the best performance, so Rubinius is working on a high performance implementation, where more of the implementation is in Ruby itself. The genesis of Ruby was with Lisp and Smalltalk, although the actual language went in a very different direction from these two languages.

Evan admitted that Ruby does not have great support for concurrency. Ruby will work with green threads, that is, cooperative multitasking that can exploit only a single core. The problem of the "interpreter lock" means that native thread support is not an immediate goal.

After Evan, David Pollak, author of "Beginning Scala" and Benevolent Dictator for Life of the Lift Web Framework, spoke on Scala. Scala is a hybrid object oriented/functional language with static typing and garbage collection that runs on the Java Virtual Machine. In some ways it is like Java with fewer words: the type inference system eliminates the need to explicitly specify data types most of the time. The goal of Scala is to achieve the speed (and safety) of Java with the conciseness of Ruby.

Scala has no specific built-in support for concurrency; however, the Actors paradigm has been successfully implemented on top of Scala. I have written about both Scala and Actors with Scala previously, so I will say no more here. It is worth noting that both Clojure and Scala get full native threads support and many other benefits from running on the Java Virtual Machine.
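The actor idea itself does not depend on Scala: an actor owns its state privately and changes it only in response to messages arriving in its mailbox. As a rough illustration, here is a minimal sketch in Go, not Scala, and not the API of the Scala Actors library. The message type, counterActor and startCounter are hypothetical names made up for this example, and a buffered channel stands in for the mailbox.

```go
package main

import "fmt"

// message is a hypothetical mailbox entry. A non-nil reply
// channel asks the actor to send back its running total.
type message struct {
	delta int
	reply chan int
}

// counterActor owns its state: only this goroutine ever touches
// total, so no locks are needed.
func counterActor(mailbox <-chan message) {
	total := 0
	for msg := range mailbox {
		total += msg.delta
		if msg.reply != nil {
			msg.reply <- total
		}
	}
}

// startCounter launches the actor and returns the sending end of its mailbox.
func startCounter() chan<- message {
	mailbox := make(chan message, 16)
	go counterActor(mailbox)
	return mailbox
}

func main() {
	mailbox := startCounter()
	for i := 1; i <= 5; i++ {
		mailbox <- message{delta: i} // fire and forget
	}
	reply := make(chan int)
	mailbox <- message{reply: reply} // ask for the total
	fmt.Println(<-reply)             // prints 15
	close(mailbox)                   // lets the actor goroutine exit
}
```

Messages from a single sender are handled in the order they were sent, which is why the reply reflects all five earlier additions.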

Finally, Robert Griesemer spoke about the new Go language. Robert is a member of the team developing Go at Google. Go is a statically typed language with automatic garbage collection that compiles down to native machine code. It is a system programming language with control over memory layout of the data. Robert listed the problems with current system programming languages. They are verbose and repetitious, the data type system gets in the way, build time is slow, particularly compared to dynamic languages, and managing dependencies between modules is difficult. Go aims to be a simple and powerful language with fast tools.

Go does not have inheritance or type hierarchies and is not object oriented; rather, it aims to be more flexible. I was somewhat disturbed by this. Although type hierarchies can be misused, they are useful for helping to organize large projects. On the other hand, for concurrency, Go offers lightweight processes that communicate via channels, which is a welcome move away from the threads paradigm with all its problems.
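For readers who have not seen it, the lightweight process and channel style looks roughly like the sketch below. The sumSquares function is a made-up example for illustration, not something shown at the meeting; the go statement and channel operations are standard Go.

```go
package main

import "fmt"

// sumSquares starts one lightweight process (goroutine) per number.
// Each sends its result over a shared channel, and the caller adds
// them up as they arrive. No locks or shared variables are involved.
func sumSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(x int) {
			results <- x * x // blocks until the receiver is ready
		}(n)
	}
	total := 0
	for range nums {
		total += <-results // receive exactly one result per goroutine
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

The results may arrive in any order, but since addition does not care about order the total is always the same.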

The Go language is not quite complete yet. The language designers are still working on providing support for a number of features including generics, operators and exceptions. Robert told us that he expects the language to be complete and mature in 6 months to a year from now. Programming language design is not easy and needs to proceed at its own pace. It is worth remembering that the C++ language spent about 15 years in the state of being almost but not quite finished. We will have to wait and see how long it takes Go to mature.

Overall, two themes emerged from the new languages presented at the meeting. One is the desire to make programming simpler and more approachable. It is easier to start writing a program in a dynamic language. Both Scala and Go are statically typed languages with the goal of making the programming experience more like writing a program in a dynamic language. The other theme is the need for new programming languages to provide better support for concurrency. In particular, languages need to support something that is safer and more controlled than threads.