Build and Break

Thursday, June 12, 2008

Flex, ActionScript, MXML?

It has been over a week since James Ward and Chet Haase of Adobe gave a talk on Flex to the SDForum Java SIG, and I am still trying to get my mind around what it all means. Adobe has a bunch of technologies and products in the Rich Internet Application (RIA) area, but it is difficult to work out what they are, how they fit together and which one I should use for any particular application. Here is the story as I understand it.

Lets start with the programming language ActionScript. ActionScript is ECMAScript which is JavaScript. There are differences in implementation between ActionScript and other forms of JavaScript, however most of the difference is in the Document model, which can be called the API , but is more like the object environment in which the program executes. JavaScript programs execute in a web page defined by the Document Object Model (DOM). ActionScript started as the language of the Flash player so it is more oriented to construction an environment and this leads to some differences in the objects that it can use.

MXML is an XML based declarative language that compiles into ActionScript. Basically it is a shorthand for defining the static parts of an ActionScript environment. By the way, when I entered MXML into the Adobe site search engine, the first thing that came back was the question "Do you mean MSXML?", where MSXML is a MicroSoft technology.

Next we come to the runtime environments. The Flash player is lightweight client that executes compiled ActionScript and is most commonly deployed as a browser plug in (as opposed to, for example, a Browser which contains a JavaScript interpreter). Adobe AIR is a larger and more capable stand alone client for executing compiled ActionScript as well as HTML, Java etc. (as opposed to, for example, a Browser which is a client for interpreting HTML, JavaScript, et al).

Flex is the framework which means that it is a overarching name for the whole pile of technology. The one piece of technology called Flex is Flex Builder, the Eclipse based development environment for ActionScript and MXML. As they have done with other products, Adobe has open sourced a lot of technology surrounding Flex to bring more developers to the platform.

Overall, I am not sure which is more impressive, the melange of technology in Adobe Flex or the marketing effort that tries to make the whole melange of technology seem like one coherent whole.

Monday, June 09, 2008

New iPhone - New Business Model

Steve Jobs announced the widely anticipated new iPhone at the WWDC today. I have seen a lot of comments on features and price, but nothing interesting on the new business model. Here is my take.

In the old business model, Apple and ATT sold the iPhone at full price and in a highly unusual arrangement, ATT shared its ongoing revenue with Apple. Now Apple and ATT sell the iPhone at a discount. ATT presumable pays Apple for each phone they sell, however there is no ongoing revenue sharing. We will have to see exactly how this plays out when the iPhone goes on sale. It may well be that you have to sign up with ATT to unlock the phone when you register it.

Apple still has a couple of revenue streams which are unusual concessions from a mobile-phone companies, especially in the USA. First, Apple gets to sell all the media and games on the phone through its iTunes store. Songs are still $1, movies and TV shows range from $2 to $5, games and applications range from free to $10. This is a useful revenue stream even although it has a margin of only 20% to 30%.

More interesting is the MobileMe storage and syncing service that costs $100 a year. Verizon charges me $10 to move my phone list from an old phone to a new phone when I have to buy a new one. Nobody there or at any other phone company thought of charging $100 for making this service continuously available. At the same time it is a great idea, that many have picked up as a good reason to get the iPhone.

The only problem with MobileMe is the ridiculously small storage capacity of 20 GB. The phone has 8GB or 16GB. What is the point of having a backing store that is about the same size as my phone? Particularly as storage is not that expensive these days. Google Apps offers 25GB for $50 per year, Apple ought to offer something equivalent.

Apart from that, the new business model keeps Apple ahead of the game which is exactly where it needs to be.

Monday, June 02, 2008

Still HDTV - Not

It is old news to me but HDTV has still to turn on viewers. While working out at the gym the other day and listening to Harry Shearer's Le Show, he mentioned recent research conducted by the Scripps Network that large numbers of viewers that receive a HDTV feed continue to watch the same content in standard definition.

I posted about this problem a couple of years ago. Since then I have successfully trained my family to watch HDTV when it is available, but it took some work to make them HD aware. Still, we do not get a lot of channels in HD and the channels are still in the same obscure 700 range of the "dial".

There is some good programming in HD. Last night we watched 2001: A Space Odyssey on the Universal HD channel and it was stunning. Fortunately there were only a few commercial breaks, because the commercials came in so much louder than the film we had to mute the entire commercial break to remain sane.

Saturday, May 31, 2008

Software Tools Used by Criminals

I am always throughly paranoid by the time I leave a meeting of the SDForum Security SIG, and the May meeting was no exception. After the meeting, I stopped at a gas station. Immediately I was suspicious as their price was several cents less that the gas station on the other side of the intersection. Next the pump did not ask me for my ZIP code, and finally the pump let me get more gas than my credit card limit. What kind of scam was this?

The source of my paranoia was the fascinating presentation on "Software Tools Used by Criminals" by Markus Jakobsson, a Principal Scientist at PARC. Marcus led us through the history of software and internet scams, starting with the first computer virus and internet worm, to the present day where sophisticated criminals are making targeted attacks on individuals and businesses.

Marcus also led us through the crime cycle, starting with 'data mining' of public data sources to get the information needed to make an attack, through ways in which the criminals can get money from a scam without identifying themselves. He also described the results of several experiments that he and others had done to measure how easy some of these data mining exercises were, and experiments to measure how gullible people are when they are set up in the right psychological way for a scam.

Finally Marcus came to his recent work on a password reset system. This is usually done with a security question, the answer to which can often be guessed or found out. For example, data mining of public data from Texas had discovered the mothers maiden name of about half the population of Texas. Markus and his team propose a new technique based on preferences which is both easy to remember and is unlikely to be guessed by outsiders.

Monday, May 26, 2008

Pull Dressed as Push

We have been watching with some degree of schadenfreude the problems that Twitter, the incredibly popular microblogging service, has with scaling, or even providing a reliable service. Yesterday Steve Gillmor suggested in his TechCrunch post that the problem has been caused by FriendFeed. FriendFeed is a new social service aggregator that either enhances or engulfs Twitter depending on your point of view. Here is my take on what the problem is.

First some background. Publish and Subscribe (pub/sub) is the underlying goal of all these services. I, as a client subscribe to something, and when the publisher has something that matches my subscription, they Push it to me. This is efficient because stuff is only sent to me when it exists. The problem is that the publisher may not know where I am when they want to do the Push. So many pub/sub systems work on the Pull by Polling model. That is, every so often I ask the publisher if they have anything new for me. The Polling part is that I repeatedly ask for new stuff and the Pull is that when new stuff exists, I Pull it from the publisher. This works reasonable well as long as I do not poll the publisher too often.

For example, RSS works this way (as I discussed some time ago). To prevent the original publisher being overwhelmed by requests for new information, part of the RSS protocol describes how often someone may poll the publisher and not overwhelm the publisher with too many requests.

Now back to Twitter and FriendFeed. Twitter provides an API so that other services can be built upon it. FriendFeed is an aggregator of social networking services that uses the Twitter API to aggregate information for its users. The Twitter API is based on XMPP, which is a high performance API for instant messaging that supports, for example, instant messaging between large service providers such as AIM and Yahoo Instant Messaging. However XMPP also has a low performance option based on HTTP for polling XMPP servers. This turns XMPP into Pull dressed as Push, which strains the servers when the poll rate is too high.

It turns out that the Twitter XMPP API is based on the low performance HTTP option. Thus FriendFeed is polling Twitter for each of its users, and polling frequently to give the appearance of instant response, which may be the reason that the Twitter servers are overloaded. Twitter has a feature in their API to throttle polling to no more than once a minute, however this could also be a problem if it is badly implemented.

By way of disclosure, I do not use Twitter or many of these other toys. I get quite enough information overload from RSS.

Sunday, May 18, 2008

Blog Ennui

It is Sunday morning and I notice that TechCrunch has a couple of new posts. One is a stream of consciousness piece from Steve Gillmor called "Bill's Gold Watch". This one was better than the stream of consciousness piece Steve wrote last week called "The Blood Brain Barrier", mainly because it was shorter. Steve can write conventional blog posts, for example on Saturday morning he had an excellent piece called "Facebook's Glass Jaw" which comments on the Facebook - Friends Connect fracas.

So what is Steve trying to do with "Bill's Gold Watch"? Is he trying to create a new Journalism? To me, it reads more like poetry. Even if it does not make complete sense, it sparks off thoughts and associations, and that appears to be the intention. Another commenter suggested it reads like rap. If it were printed as blank verse we would see what was going on and Steve could concentrate in getting the rhythms right as well as avoiding some of the more tortuous and interconnected thoughts.

In other blog thoughts, I have completely given up on Vallywag. Shortly after I wrote about Vallywag a year and a half ago, the then editor Nick Douglas departed and it has been downhill ever since. Now it is just a load of social claptrap of the sort that fills the gossip column of a tabloid newspaper.

Saturday, April 26, 2008

Hypertable - A Massively Parallel Database System

Now everyone can have their own database system that scales to thousands of processors, as we heard at the April meeting of the SDForum Software Architecture and Modeling SIG. Doug Judd from zEvents, and the Hypertable Lead Developer spoke on "Architecting Hypertable-a massively parallel high performance database".

Hypertable is an Open Source database system designed to deal with the massive scale of data that is found in web applications such as processing the data returned by web crawlers as they crawl the entire internet. It is also designed to run on the massive commodity computer farms, which can consist of thousands of systems, that are employed to process such data. In particular Hypertable is designed so that its performance will scale with the number of computers used and to handle the unreliability problems that inevitably ensue from using large computer arrays.

From a user perspective, the data model has a database that contains tables. Each table consists of a set of rows. Each row has a primary key value and a set of columns. Each column contains a set of key value pairs commonly known as a map. A timestamp is associated with each key value pair. The number of columns in a table is limited to 256, otherwise there are no tight constraints on the size of keys or values. The only query method is a table scan. Tables are stored in primary key order, so a query easily accesses a row or group of rows by constraining on the row key. The query can specify which columns are returned, and the time range for key value pairs in each column.

The basic unit for inserting data is the key value pair, along with its row key and column. An insert will create a new row if none exist with that row key. More likely, an insert will add a new key value pair to an existing column map or have the existing value superseded if the new column key already exists in the column map.

As Doug explained, Hypertable is neither relational or transactional. Its purpose is to store vast amounts of structured data and make that data easily available. For example, while Hypertable does have logging to ensure that information does not get lost, it does not support transactions whose purpose is to make sure that multiple related changes either all happen together or none of them happen. Interestingly, many database systems switch off transactional behavior for large bulk loads. There is no mechanism for combining data from different tables as tables are expected to be so large that there is little point in trying to combine them.

The current status is that Hypertable is in alpha release. The code is there and works as Doug showed us in a demonstration, however it uses a distributed file system like Hadoop to store its data and while they are still developing they are also waiting for Hadoop to implement a consistency feature before they declare beta. Even then there are a number of places where they have a single point of failure, so there is plenty of work to make it a complete and resilient system.

Hypertable is closely modeled on Google Bigtable. At several times in the presentation when asked about a feature, Doug explained it as something that Bigtable does. At one point he even went so far as to say "if it is good enough for Google, then it is good enough for us".

Monday, April 21, 2008

SaaS, Cloud, Web 2.0... it’s time for Business Intelligence to evolve!

The most surprising phrase in Roman Bukary's presentation to the April meeting of the SDForum Business Intelligence SIG was "right time, not real time", and it was said more than once. Roman is Vice President of Marketing and Business Development at Truviso and his presentation entitled "SaaS, Cloud, Web 2.0... it’s time for Business Intelligence to Evolve!" brought a large audience to our new location at the SAP Labs on Hillview Avenue in Palo Alto.

Truviso provides software to continuously analyze huge volumes of data, enabling instant visibility, immediate action and more profitable decision making. In other words, their product is a streaming database system.

Over the years, the Business Intelligence SIG has heard about several streaming database systems. Truviso distinguishes themselves in a number of ways. Firstly it leverages the open source Postgres database system, so it is a real database system and real SQL. Other desirable characteristics are handling large volumes of data, large numbers of queries and the ability to change queries on the fly. They also have a graphics front end that can draw good looking charts. Roman showed us several Truviso applications including stock and currency trading applications that are both high volume and a rapidly changing environment.

Then we come to the "right time, not real time" phrase. In the past I have associated this phrase with business intelligence systems that could not present the data in a timely manner. Obviously, that is not a problem with streaming database systems that process and aggregate data on the fly and always have the most up to date information.

I think that Roman was trying to go in the other direction. He was suggesting that Truviso is not only useful for high pressure real time applications like stock trading, it also has a place in other applications where time is less pressing but the volume of data is high and there is still a need for a real time view of the current state. Such applications could include RFID, logistics and inventory management.

Tuesday, April 08, 2008

Open Source 10 Years Later

April 7 2008 marks 10 years since the landmark Freeware Summit that signaled the opening of the Open Source movement. By coincidence I recently read the manifesto of the Open Source movement, "The Cathedral and The Bazaar" by Eric S Raymond. The book, published in 1999 and revised in 2001, contains the namesake essay several others including "Revenge of the Hackers" which describes the events leading up to and following the Freeware Summit from an insiders point of view. The essay is valuable as a history of Open Source however its veracity is slightly marred because it dates summit meeting as happening on March 7.

One thing that the Revenge of the Hackers does not shy away from is explaining why Richard Stallman and the Free Software Foundation was not present at the Freeware Summit. In the past I have written on the distinction between Open Source and Free Software. Raymond is tactful but firm in explaining why creating a separation between these two ideas was essential to getting Open Source accepted by the mainstream.

On the other hand, the end of the essay that looks into the future of Open Source does suffer in hindsight. Open Source has advanced by leaps and bounds in the last 10 years. However it is still not in the position of ruling the world as the Revenge of the Hackers suggests it might. Lets give it at least another 10 years.

Thursday, April 03, 2008

An Evening with The Difference Engine

One day I write about Doron Swade and "The Cogwheel Brain". Three days later I get an invitation from the Computer History Museum. Doron Swade is coming to Silicon Valley with a Difference Engine!

The occasion is that another Difference Engine has been commissioned by Nathan Myhrvold, ex CTO of Microsoft. It is being exhibited at the Computer History Museum in Mountain View, and to celebrate its arrival, there is an "Evening with Nathan Myhrvold and Doron Swade" at the museum. We have signed up for the event, have you?