Sunday, December 27, 2009

Kindle Chronicles

Amazon announced that "On Christmas Day, for the first time ever, customers purchased more Kindle books than physical books." Well duh! If you want a physical book for Christmas, you have to buy it before Christmas day. On the other hand, everyone who received a Kindle as a gift used the wireless book download feature to get a book to read on Christmas day. In the very same announcement, Amazon said that "Kindle has become the most gifted item in Amazon's history". Amazon's statement is a nice piece of spin but not a lot more.

More interesting commentary on electronic book readers is found in the Kindle Chronicles blog. In the early days of emusic, musicians generally stood by their record companies. Book authors seem to be a much more independent lot according to the most recent post "What We Have Here Is a Failure To Communicate". The publishers have been trying to preserve their position by keeping the prices of ebooks high, while the authors want to be read and the books that sell most on the Kindle are the cheaper ones. Also authors do not see why the publishers should get such a large share of the revenue when there is no cost to their ebook inventory.

Saturday, December 26, 2009

Die AutoRun Die

Another year has almost passed and I have not yet ranted about an awful, unnecessary and totally annoying feature of Microsoft Windows, so today I am going to tell you why AutoRun should die.

AutoRun is the "feature" where you plug something into your computer and then stuff happens completely out of your control. The thing you plug in might be a key drive, camera, iPod or whatever. Last Christmas I won a 4GB SanDisk Cruzer USB key drive as a door prize. When I plugged this horrible little thing into my computer it installed the U3 driver with useless and dangerous functions that I DO NOT WANT! To make matters worse, there is no obvious way to remove the driver or its annoying functionality. To top off the bad behavior, even though I immediately erased the entire contents of the drive, when it was plugged into another computer, it infected that computer with its unwanted drivers as well. I have thrown the key drive away to prevent further damage.

The combination of USB key drives and AutoRun is a serious computer virus infection vector to the extent that key drives are being banned in critical places. However the problem is not just with key drives. I have not disabled AutoRun because I use it two to three times a week to sync my iPod with the latest podcasts. Recently my daughter plugged her iPod into my computer just to recharge the battery. First this caused iTunes to crash, then when I brought it back, it wanted to sync my stuff onto her iPod. My daughter does not want anything of mine on her iPod and I had to jump through hoops to prevent the sync.

The problem is that iTunes and everyone else have totally bought in to the automagic nonsense of AutoRun behavior. A much simpler, safer and easier to use behavior is to have the user plug in a device and then bring up a program to use the device. Unfortunately the designers(?) of Windows decided to emasculate their users and instead give the device the power to decide what it wants to do. The subliminal message from Microsoft is that you are too stupid to operate your own computer so we are going to do it for you, or let anyone else who might have more of a clue do it for you. The consequence of this design is that our computers do not belong to us, but to hackers who exploit these "features" as attack vectors to take control of them.

If you sit back and think about it, AutoRun is obviously ill conceived. The design center is that a single user is logged into their computer and actively using it. What does AutoRun do when nobody has logged into the computer? What does it do when two users are logged in? In the example that I gave above, my daughter plugged her iPod into my computer when two people were logged in and the screen saver had locked both accounts. Of course iTunes crashed; it did not know what to do.

The iPod and iTunes is particularly annoying because it is unusable without AutoRun. On the iTunes support web site, the top support issue is "iPod doesn't appear in iTunes" and the second issue is "iPhone does not appear in iTunes". However there is no button in iTunes to go and look for an iPod or iPhone, instead they rely on AutoRun with no easy fall back should that fail.

Sunday, December 20, 2009

BI Megatrends: Directions for Business Intelligence in 2010

Every year David Stodder, Research Fellow with Ventana Research and editor-at-large with Intelligent Enterprise, writes a column on Business Intelligence Megatrends for the next year. This column looks back at what has happened in the last year and what he expects to happen in the next year. This year David also presented his thoughts to the December meeting of the SDForum Business Intelligence SIG. David talked about many topics; here I will just cover what he said about the big players.

Two years ago there was a huge wave of consolidation in Business Intelligence when the major independent BI vendors were bought up by IBM, SAP and Oracle, who along with Microsoft are the major enterprise software vendors. In the last year SAP has integrated Business Objects with SAP software to the point that SAP is now ready to threaten Oracle.

Consolidation has not finished. In 2009, two important mergers were announced. Firstly IBM bought SPSS to round out its analytics capabilities. This move threatens SAS, which is in the same market. However, SAS is a larger and more successful company than SPSS. Also, SAS is a private company, which means that it does not necessarily need to respond to the pressures to consolidate.

The other merger is Oracle's offer to buy Sun and the effect that has on Oracle's relationship with HP. HP and Sun are bitter rivals for enterprise hardware, and HP was the launch partner for Oracle Exadata, the high end Oracle database. Now Oracle is pushing Sun hardware with Exadata, leaving HP in the lurch. David pointed out that there are plenty of up and coming companies with scalable database systems for HP to buy up. That list includes Aster Data Systems, Greenplum, Infobright, ParAccel and Vertica. Expect to see something happen in this area in 2010.

Of the three major database vendors, Microsoft has the weakest offering, despite SQL Server 2008. However Microsoft does have the advantage of the Excel spreadsheet which remains the most used BI reporting tool. A new version of Excel is due in 2010. Also Microsoft is making a determined push in the direction of collaboration tools with SharePoint. As we heard at the BI SIG November meeting, collaboration is an important new direction for enterprise software capabilities.

Thursday, December 17, 2009

A Systematic and Platform Independent Approach to Time and Synchronization

Managing time and synchronization in any software is complicated. Leon Starr, a leading proponent of building executable models in UML, talked about the issues of modeling time and synchronization to the December meeting of the SDForum SAM SIG. Leon has spoken to the SAM SIG previously on executable models. This time he brought along two partners to demonstrate how the modeling technique can be applied to a broad range of problems.

Leon started the meeting by talking through five rules for handling time and synchronization. The first and most important rule is that there is no global clock. This models real systems which may consist of many independent entities and allows for the most flexible implementation of the model on a distributed system. In practice, other rules are a consequence of this first rule.

The next rule is that the duration of a step is unknown. The rule does not imply that any step can take forever; its purpose is to say that you cannot make assumptions about how long a step may take. In particular, you cannot expect independent steps in the model to somehow interleave themselves in some magical way. The third rule is that busy objects are never interrupted. This forces the modeller to create a responsive system by building it from many small steps so that an object is always available to handle whatever conditions it needs to handle.

The fourth rule is that signals are never lost. This is an interesting rule as it gets to an issue at the heart of building asynchronous systems. The rule implies that there is a handshake between sender and receiver. If the receiver is not ready, the sender may be held up waiting to deliver the signal. Perhaps the signal can be queued, but then there is the problem that the queue is not big enough to handle all the queued signals. In the end you have to build a system that can naturally handle all the events thrown at it, if it is a safety critical system, or that fails gracefully if it is not.

The fifth rule is that there is no implicit order in the system, except that if one object sends signals to another object, the signals arrive in the order that they were sent. Note that I may have interpolated some of my own experience into this discussion of the rules. If you want to explore further, watch this video on YouTube and go to Leon's web site, which leads to many interesting papers and discussions.
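To make the rules a little more concrete, here is a toy sketch in Python. It is my own illustration, not Leon's notation or tooling, and every name in it (ModelObject, channels, run) is invented for the example. Each object keeps one unbounded FIFO per sender, so signals are never lost and order is preserved only between a given pair of objects. A random scheduler stands in for the absent global clock and the unknown duration of steps, and each step handles exactly one signal to completion.

```python
import random
from collections import defaultdict, deque


class ModelObject:
    """Toy object: one FIFO channel per sender, no shared clock anywhere."""

    def __init__(self, name):
        self.name = name
        self.channels = defaultdict(deque)  # unbounded, so signals are never lost
        self.handled = []                   # (sender, signal) in the order handled

    def send(self, target, signal):
        # Order is preserved only within one sender -> receiver pair.
        target.channels[self.name].append(signal)

    def pending(self):
        return [sender for sender, queue in self.channels.items() if queue]

    def step(self, rng):
        """Handle one signal to completion; a busy object is never interrupted."""
        senders = self.pending()
        if not senders:
            return False
        sender = rng.choice(senders)  # no implicit order across different senders
        self.handled.append((sender, self.channels[sender].popleft()))
        return True


def run(objects, rng):
    """Step ready objects in arbitrary order until every channel is empty."""
    while True:
        ready = [obj for obj in objects if obj.pending()]
        if not ready:
            return
        rng.choice(ready).step(rng)
```

If objects a and b each send five numbered signals to c and you call run, c handles all ten. The interleaving between a and b changes with the random seed, but the signals from a always come out in the order a sent them, and likewise for b.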

Next at the meeting, Leland Starr, younger brother of Leon, talked about a web application that he had been the lead on for his employer, TD Ameritrade. The online application is for arranging participants in online webinars. By using the UML modelling technique, he created a model that could be used both to explain how the system would work to the business sponsors of the project and to check, by executing it, that it worked as expected. Leland has a SourceForge project for his work.

Finally Andrew Mangogna talked about a very different class of applications. He builds software to control implanted medical devices like heart pacemakers. The two overriding concerns are that the medical device performs its function safely and that it runs for at least 5 years on a single battery charge. Compared to many of the applications that we hear about at the SAM SIG, the implantable device applications feel like a throwback to an earlier and simpler age of computing. The applications are written in the C programming language and the code typically occupies 3 to 4 kilobytes. The program data is statically allocated and an application can use from 150 bytes to 500 bytes. Andrew also has a project on SourceForge for his work.

Friday, December 04, 2009

Bandwidth Hogging

There are several discussions going on around the web about bandwidth hogging started by a post from Benoit Felten in the fiberevolution blog. I wrote about this issue last month in my post on net neutrality. The basic problem is that when the internet becomes congested the person who has created the most connections wins. Congestion can happen anywhere from your local head end through to a backbone and the backbone interconnects. Felten claims that there is no problem, and given the data, he is willing to do the data crunching to prove it, while others disagree.

The problem is a classic Tragedy of the Commons. There is a shared resource, the internet, and some people use more of it than others. That is fine provided that they do not interfere with each other and there is enough resource to go around. As I explained, the problem is that when there are not enough resources to go around, the people who win are the people who create a large number of connections, and these tend to be the people who use the most bandwidth. The point of a torrent client creating a large number of connections is to ensure that the client gets its "share" of the net whether there is congestion or not. The only viable response is for everyone else to create large numbers of connections to do whatever they want to do, be it download a web page or make an internet phone call. This is undesirable because it can only lead to more congestion and less efficient use of the shared resource.
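A little arithmetic shows why the number of connections matters. The sketch below assumes, as a simplification of mine, that a congested link ends up divided equally among connections rather than among users. The function name and the numbers are made up for illustration.

```python
def bandwidth_shares(connections_per_user, link_capacity_mbps):
    """Split a congested link equally per connection and total it up per user."""
    total = sum(connections_per_user.values())
    return {
        user: link_capacity_mbps * count / total
        for user, count in connections_per_user.items()
    }


# One browser connection, one voice call and a torrent client with 48 connections
# sharing a 100 Mbps link: browser and voip get 2.0 Mbps each, torrent gets 96.0.
unequal = bandwidth_shares({"browser": 1, "voip": 1, "torrent": 48}, 100.0)

# If everyone answers by opening 48 connections, the shares are equal again,
# but the link now carries 144 connections instead of 50.
arms_race = bandwidth_shares({"browser": 48, "voip": 48, "torrent": 48}, 100.0)
```

Under that assumption the second case gets everyone back to a third of the link each, at the cost of nearly three times as many connections for the network to carry.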

There are two parts to a solution. Firstly, the internet service providers have to keep adding more equipment to reduce congestion as internet usage grows. Everything would be fine if there were no congestion. Secondly, we need better algorithms to manage congestion. Penalizing people for using the bandwidth they were sold is not the answer, particularly when that is not the real problem. I have suggested that we should look towards limiting connections. Another thought is to kill the connections of the users with the largest numbers of connections to reduce congestion. Again, I am sure that this will have some unintended consequences.
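For what it is worth, the second idea might look something like the following sketch. It is only an illustration of the policy, with hypothetical names and a made-up data layout. It says nothing about how a real router would track users or connections.

```python
from collections import Counter


def pick_connections_to_drop(connections, how_many):
    """Choose connections to drop, always taking from whoever holds the most.

    `connections` is a list of (connection_id, user) pairs.
    """
    counts = Counter(user for _, user in connections)
    by_user = {}
    for connection_id, user in connections:
        by_user.setdefault(user, []).append(connection_id)

    dropped = []
    for _ in range(how_many):
        if not counts:
            break
        user, count = max(counts.items(), key=lambda item: item[1])
        if count == 0:
            break  # nothing left to drop
        dropped.append(by_user[user].pop())
        counts[user] -= 1
    return dropped
```

With five torrent connections and one browser connection, asking for three drops takes all three from the torrent user and leaves the browser alone. One unintended consequence is already visible: a household sharing one address behind a router could look like a single heavy user.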

The real problem is that unless we can all agree to be good internet citizens and get along, the forces against Net Neutrality may win. Then large companies with deeply vested interests will get to decide who has priority. The recently announced merger of Comcast, a large Internet Service Provider and NBC, a large content provider is exactly the sort of thing that we need to be wary of.

Monday, November 30, 2009

Consumerization of IT

A new generation is entering the workforce and they are just not going to take it any more. Brian Gentile, CEO of Jaspersoft, did not say these exact words, but it conveys the intent of the introduction to his talk on "Consumerization of IT" at the November meeting of the SDForum Business Intelligence SIG.

Brian was talking about Generation Y, the first generation to have grown up with computers and instant communication to the extent that they take them for granted. More than that, they have expectations about these tools and what they can do with them. Unfortunately, enterprise software has often created systems that are slow, ugly and so difficult to use that it can require weeks of training. While previous generations have put up with difficult software because they know no better, Gen Y does know that it can be better and is not going to put up with software that does not match up.

Brian identified 4 characteristics that Business Intelligence, or any enterprise software, must provide to meet the next generation's expectations. They are:
  • Elegant presentation.
  • Easy access to data.
  • Extensive Customization.
  • Built In Collaboration.
To do collaboration properly, software applications must fit into a collaboration platform rather than have each application provide its own silo'ed collaboration mechanism.

While I have heard people argue that current Business Intelligence software does not provide a good user experience, Brian put a positive light on this trend, as if the change is for the good and the right thing to do. He is certainly positioning Jaspersoft to provide these features and meet the requirements of the next generation.

Brian ended with another optimistic note. The cost of Information Technology is coming down with cheaper hardware and Open Source software. CIOs can direct the money they save to new innovative projects. A good example of this movement is Ingres talking about "The New Economics of IT" as they have been doing for some time.

Sunday, November 15, 2009

Fight Institutional Corruption

Many people think of Lawrence Lessig as a radical with an anti-IPR (Intellectual Property Rights) agenda. In practice he is no radical, in fact his mission is to find a defensible middle ground between the Intellectual Property right and the Free Culture left. One of my first blog posts discussed his talk: "The Comedy of the Commons".

Recently, I have been following his podcasts which discuss his work on copyright as well as his newer work on institutional corruption. Note, while I find these podcasts interesting, they are not for everybody. They are mostly records of lectures given to various groups. While they are accessible, they are about serious policy matters and as many of the talks are on similar subjects, there tends to be some repetition.

Recently Lessig has been working on a new project on institutional corruption called Change Congress. The issue is that large sums of money fed through lobbyists seem to have an undue influence on the lawmakers in the House and the Senate. The money appears to have such a large influence that lawmakers are voting against the clear wishes of their constituents.

Change Congress works to highlight these cases of apparent institutional corruption. It is fighting for citizen funded elections so that the lawmakers are not under pressure to raise the money they need to get re-elected. Thus they will be less likely to be swayed by the lobbyists. Go to the site, see what they have to say, and help them with their mission.

Saturday, November 07, 2009

Vote for Net Neutrality Now

There is a lot of talk about Net Neutrality now, and the issues are not completely clear cut as I will discuss later. However, there is also a big threat that needs to be addressed right now.

Bills are being proposed in Washington with friendly names like "The Internet Freedom Act" whose effect would be to give more control of the internet to the big ISPs and take away power from the people who are giving us innovative services like Google, Skype and Amazon. While there is also a friendly bill, and the FCC is on the side of Net Neutrality, everyone needs to act to let their congressman know whose side they are on. Visit "Save the Internet" and take action now!

Now that you have done your bit to save the internet, we can talk about the problem. When a node on the internet gets too much traffic, the traffic control algorithm will pick connections at random and kill them. While this is good for keeping the traffic flowing in the aggregate, it tends to favor one class of user over another. The disadvantaged user is the one who is using a single connection to browse the web, download a song or make a voice call. The advantaged user is using BitTorrent, which opens a large number of connections to do a massive download. It does not matter if BitTorrent loses a connection, it has many others to make up for it, but it does matter when a web browser or a Skype conversation loses a connection.
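To put a rough number on the difference, suppose (my simplifying assumption, not a description of any real router) that each connection is killed independently with the same probability. The function name below is made up for the example.

```python
def prob_all_connections_lost(num_connections, kill_probability):
    """Chance that every one of a user's connections is killed,
    assuming each is killed independently with the same probability."""
    return kill_probability ** num_connections


# With a 10% chance of any given connection being killed:
single = prob_all_connections_lost(1, 0.1)   # the web browser or voice call
swarm = prob_all_connections_lost(50, 0.1)   # a client holding 50 connections
```

Under that assumption the single-connection user is cut off one time in ten, while the chance of the 50-connection user losing everything is so small that it will effectively never happen.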

One solution is to answer greedy software with greedy software. That is, every internet application would emulate BitTorrent and greedily create hundreds of connections in case any one of them gets stomped. While this solution puts all applications on an equal footing, it may strain resources, leading to a "Tragedy of the Commons", something that should not be in our bright digital future.

Another solution would be to limit the number of simultaneous sessions a user can have. I personally feel that this would be better than having Comcast or AT&T doing deep packet inspection of my packets. However a hard limit on the number of sessions may cause all sorts of problems with software that is not expecting it, leading to deadlock and other bad behavior. Does anyone have any other ideas?
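As a starting point for discussion, a per-user cap could be sketched like this. The class and method names are hypothetical, and a real implementation would have to live in the network rather than in a Python dictionary.

```python
class SessionLimiter:
    """Toy per-user cap on simultaneous sessions."""

    def __init__(self, max_sessions_per_user):
        self.max_sessions_per_user = max_sessions_per_user
        self.active = {}  # user -> number of open sessions

    def try_open(self, user):
        """Return True and count the session if the user is under the cap."""
        count = self.active.get(user, 0)
        if count >= self.max_sessions_per_user:
            return False  # refuse immediately instead of making the caller wait
        self.active[user] = count + 1
        return True

    def close(self, user):
        """Release one session for the user, if any are open."""
        count = self.active.get(user, 0)
        if count > 0:
            self.active[user] = count - 1
```

The sketch refuses a new session outright rather than blocking until one frees up. That sidesteps one route to deadlock, but it still leaves the software on the other end needing to cope with a refusal it was not written to expect.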

Friday, October 23, 2009

Database Systems for Analytics

The question "what are the attributes of a database system for analytics?" came up during Omer Trajman's talk to the October meeting of the SDForum Business Intelligence SIG. The talk was titled "The Evolution of BI from Back Office to Business Critical Analytics". In the talk Omer gave several examples of applications that use real time analytics and explained the special attributes of each application. As he runs field engineering for Vertica, a Database Systems vendor, I am sure that these examples were based on his experience with Vertica deployments, however Omer was careful to keep his talk vendor neutral.

So what are the attributes of a database system for analytics? Omer discussed three attributes. Firstly, an analytics database system cannot use the row level locking that is found in a traditional transaction processing database. The database system needs to provide snapshot isolation that gives a query a consistent view of the data while not preventing other operations like data loads. Having helped implement a system like this in the past, I am in total agreement with Omer.
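The idea behind snapshot isolation can be shown with a toy, single-threaded sketch. This is my own illustration and not how Vertica or any other product implements it. Every loaded row is tagged with the version that committed it, and a query only sees rows at or below the version it captured when it started.

```python
class VersionedTable:
    """Toy append-only table where each load commits as one new version."""

    def __init__(self):
        self._rows = []      # list of (commit_version, row)
        self._committed = 0  # highest version visible to new queries

    def load(self, rows):
        """Append a batch and publish it as a single new version."""
        new_version = self._committed + 1
        self._rows.extend((new_version, row) for row in rows)
        self._committed = new_version
        return new_version

    def snapshot(self):
        """Capture the version a query should read at."""
        return self._committed

    def scan(self, snapshot):
        """Return only the rows committed at or before the snapshot."""
        return [row for version, row in self._rows if version <= snapshot]
```

A query that takes its snapshot after the first load keeps seeing exactly those rows, however many loads happen afterwards, and neither the query nor the load ever waits on a lock held by the other.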

The second attribute is the need to allow concurrency between loading and querying data. While this is related to the first attribute, it also comes with its own issues. Bulk loads are more efficient (particularly for a columnar database like Vertica). However, if you want access to the most up-to-the-minute data you need to do loads in small increments so that the data is available for query as soon as it is loaded. Managing this balance is difficult and as yet it has not been completely solved. Again, I have worked on this issue in several different systems.
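One common compromise is to buffer incoming rows and flush them as a small batch when the buffer is either big enough or old enough. The sketch below is an illustration of that trade-off with made-up names and thresholds. The caller passes in the current time so that the example stays deterministic.

```python
class MicroBatchLoader:
    """Buffer rows and flush them as a batch on a size or age threshold."""

    def __init__(self, sink, max_rows, max_age_seconds):
        self.sink = sink                  # callable that receives each batch
        self.max_rows = max_rows          # bigger batches load more efficiently
        self.max_age_seconds = max_age_seconds  # smaller ages give fresher data
        self.buffer = []
        self.oldest = None                # arrival time of the oldest buffered row

    def add(self, row, now):
        if not self.buffer:
            self.oldest = now
        self.buffer.append(row)
        self._maybe_flush(now)

    def tick(self, now):
        """Call periodically so that a quiet stream still gets flushed."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_rows
        too_old = now - self.oldest >= self.max_age_seconds
        if too_big or too_old:
            self.sink(list(self.buffer))
            self.buffer.clear()
```

Raising max_rows favors load efficiency and lowering max_age_seconds favors freshness. The sketch only exposes the two knobs, it does not tell you where to set them.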

The final attribute was scaleout, that is the ability to add more processing systems to handle more data and larger queries. We are building systems out of hundreds and thousands of computer systems. Scaleout is vital to effectively use these systems.

Wednesday, October 14, 2009

e-Readers for All

The e-Reader market is heating up, just in time for Christmas. Amazon is expanding features and bringing the Kindle down the price curve. Today came word of the Barnes and Noble e-Reader with two screens, an e-ink screen for reading and a small LCD touch screen for interactivity.

Also today I caught up with the "This Week in Tech" podcast from last weekend where they talked about the real killer features of the Kindle - wireless download and almost unlimited capacity. You can buy as many books as you want any time you want, which leads to buying many more books than you would otherwise buy. Imagine the scene, at dinner with your friends, you discuss books that you have recently read, and bam you buy the books they recommend there and then. In fact there was even a cry in the podcast "Friends don't let friends use a Kindle while drunk" (for fear that the judgmentally impaired friend may buy too many books).

When the original Kindle came out there was a tremendous outcry against it, with people complaining of gadgets destroying their book reading experience and authors expecting to have their livelihood destroyed just as the music industry has been laid waste. Hint: musicians are doing just as well as they have always done; it is the music moguls with their "by the way, which one is Pink?" who have been laid waste. The Kindle stimulates the publishing industry and makes it much easier to buy books, leading to more sales where the author gets a larger slice of the pie.

Competition is good, particularly for the consumer. The e-Reader needs another generation or so to iron out the kinks and bring the price down to the mass market levels. I am waiting for the $149 price point (iPod Nano) which should come by next Christmas if not sooner.