Monday, January 31, 2011

Security in the Cloud

Although I am not an expert, I have been asked more than once about security in the cloud. Now I can help because last week I got an education on best security practices in the cloud at the SDForum Cloud SIG meeting. Dave Asprey, VP of Cloud Security at Trend Micro, gave us 16 best practices for ensuring that data is safe in a public cloud like the Amazon cloud services. I will not list all of them, but here is the gist.

Foremost is to encrypt all data. The cloud is a very dynamic place with instances being created and destroyed all over the place, and your instances or data storage may be moved about to optimize performance. When this happens, a residual copy of your data can be left behind for the next occupier of that space to see. Although this would happen by accident, you do not want to expose confidential data for others to see. The only cure is to encrypt all data so that whatever may be left behind is not recognizable. Thus you should only use encrypted file systems, encrypt data in shared memory and encrypt all data on the network.

Management of encryption keys is important. For example, you should only allow the decryption key to enter the cloud when it is needed, and make sure that it is wiped from memory after it has been used. Passwords are a thing of the past. Instead of a password, being able to provide the key to decrypt your data is sufficient to identify you to your cloud system. There should be no password based authentication and access to root privileges should not be mediated by a password, but should be enabled as needed by mechanisms like encryption keys.

Passive measures are not the end of cloud security. There are system hardening tools and security testing services. Also use an active intrusion detection system, for example OSSEC. Finally, and most importantly, the best advice is to "Write Better Applications!"

Monday, January 17, 2011

The Steve Jobs Media Playbook

Information wants to be free. Steve Jobs is not usually associated with setting information free, however he set music free and may well be on the way to set more media free. Here is the playbook that he used to set music free, and an examination of whether he can set other media free.

Back at the turn of the millennium digital music was starting to make waves and Apple introduced their first iPod in 2001. At the beginning, it was not a great seller. Next year the second generation iPod that worked with Microsoft Windows came out and sales started to take off. The next problem with promoting sales of the iPod was to let people buy music directly. In those days, to buy music you had to buy a CD, rip it onto a computer and then sync the music onto the iPod.

The record companies did not like digital music. It was in the process of destroying their business model of selling physical goods, that is CDs, which had been plenty profitable until the internet and file sharing had come along. Thus the record companies knew that if they were going to allow anyone to sell digital music, the music content had to be protected by a strong Digital Rights Management (DRM) system. Basically DRM encrypts digital content so that it can only be accessed by a legitimate user on an accredited device.

Now there is one important thing about any encryption: it depends upon a secret key to unlock the content. If too many people know a secret, it is no longer a secret. So it made perfect sense for Apple to have their own DRM system and be responsible for keeping their secret safe. The only problem was that Apple effectively controlled the music distribution channel because of the DRM system and its secret. By providing exactly what the music business had asked for, Apple managed to wrest control of the distribution channel from them.

In the past I have joked about the music business controlling the industry by controlling the means of production. In fact they controlled the business by controlling the distribution channel between the artists and the record stores who sold the music. When the iTunes store became the prime music distribution channel it was game over for the recording industry. They had to climb down and offer their music without DRM to escape from its deadly embrace. DRM free music has not stopped iTunes but it does open up other sales channels.

The remaining question is what will happen with other media? Apple will not dominate the tablet market as it has the music player market so it will not be able to exert the same influence. On the other hand, other media is not as collectible as music. We collect music because we want to listen to it over and over again. With most other media, we are happy to consume it once and then move on. Thus we do not feel the need to own the media in the same way. I have some more thoughts that will have to wait for another time.

Friday, December 31, 2010

The Year in Posts

Looking back at posts in this blog over the last year I see a couple of themes emerge. Firstly there were many posts on technology and media, in particular several on the iPad which has had an extraordinary effect as the first device specifically designed for consuming media. Other issues of concern included television, 3D, aspect ratios and the problem of registration at web sites. We are going through huge changes in the media world as digitization and the internet delivery system change everything. I have written many posts on this in the past and I will continue to do so.

The SDForum Business Intelligence SIG that I chair had a banner year with so many memorable meetings that it is difficult to pick out the best one. A fantastic talk from Google Analytics Evangelist Avinash Kaushik on "Web Analytics 2.0" drew by far the biggest crowd. We had two great big data talks: "Winning with Big Data" from Michael Driscoll of Dataspora and "Mad Skills for Big Data" from Brian Dolan, both very impressive. Donovan Schneider from SalesForce.com spoke on "Real Time Analytics" and Dan Graham from Teradata spoke on "Data Management in the Cloud". Finally Peter Farago and Sean Byrnes of Flurry talked about the extraordinary information about smartphone usage that they collect from their Mobile App analytics platform. Co-chair Paul O'Rorke, who organized several of these meetings, has stepped down and we will miss him greatly.

Finally, Blogger started collecting statistics in May of this year. Looking at the page views on this blog, my last post on "Windows File Type Fail" has generated a lot of interest in the few days since it was posted. The most viewed post is a 2009 post on "Ruby versus Scala" followed closely by the Windows post. In my view, the post last year about the Windows Autorun feature is a better rant than the current one. You can feel the veins bulging in that rant whereas this year's rant is very laid back in comparison. Do not worry, there are many more misfeatures of Microsoft Windows to rant about so I am not going to run out of material for a long time.

Tuesday, December 28, 2010

Windows File Type Fail

It is that time of year when I rant about an awful, awful, awful feature of the Microsoft Windows operating system. This year the subject of my diatribe is file types. You see, Windows thinks that every file has a type and the type connects the file to a program that can handle it. Like many "features" in Windows, file types are intended to make your life easier while in practice doing the opposite. Note that some time ago, I wrote about file systems and Content Management as opposed to a file type manager. I still think there are some good ideas in there that need to be explored.

If you do not know what a file type is, here is a primer. Every file has a name. The file type is an extension to the name, usually 3 letters long. So for example, the program for Windows Explorer is called "explorer.exe": the dot is a separator and exe is the file type. The type exe means a program that Windows can run. To look at all the file types on an XP system, bring up the control panel, select Folder Options and then click the File Types tab. On Vista and 7, the path through the control panel is slightly different. The dialog shows a huge list of registered file types and the programs that will handle them. Note that the first few entries in the list are not representative; go down to the middle or bottom of the list to see what it is really all about.
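For the programmers in the audience, here is a rough sketch of the idea in Python. The table of associations is made up for illustration; the real one lives in the Windows registry and is far longer. The helper names `file_type` and `handler_for` are mine, not anything Windows provides.

```python
import os.path

# Hypothetical associations, for illustration only.
ASSOCIATIONS = {
    ".exe": "run as a program",
    ".txt": "Notepad",
    ".avi": "Windows Media Player",
}


def file_type(name):
    """Return the extension (the "file type"), lower-cased, including the dot."""
    return os.path.splitext(name)[1].lower()


def handler_for(name):
    """Look up what would open the file, or None if the type is unknown."""
    return ASSOCIATIONS.get(file_type(name))


print(file_type("explorer.exe"))    # .exe
print(handler_for("holiday.AVI"))   # Windows Media Player
print(handler_for("notes.xyz"))     # None
```

Note that nothing here looks inside the file. The name alone decides which program gets it.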

Windows goes to great lengths to hide file types from you. By default they are not shown anywhere and you can go for a long time without even knowing that files have types. One way to run into file types is to double click on a file with a type that Windows does not know about. Windows shows a dialog asking you what program you want to use with it. You can either look up the file type on the web or select a program from a list. The most annoying aspect is that when you select a program from a list, there is a little check box that says "Always use the selected program to open this type of file." If you test a program that does not work without unchecking the box, the mistake is remembered and thereafter, every time you open a file of that type, the wrong program is chosen. If you uncheck the box, a mistake is not remembered, however neither is a success. Either way, you can lose. Moreover, to recover from a mistake, you have to find the entry for the file extension in the File Types window discussed above and delete it, which is not a trivial task, given the number of file types.

Another little problem with file types is that they can be wrong, confused or direct Windows to do the wrong thing. I wrote about a problem with .avi files from a Canon camera breaking Windows Explorer. There are security issues where Windows is penetrated because it trusts the file type information and then does the wrong thing with a broken file.

However, the real problem with file types appears when you install a new program. Programs are greedy. They want to control as much of your experience as possible so they will try to register as many different file types as they can. If you have one program that deals with a type of file and you install another program that deals with similar files, the new program should pop up a dialog asking you which types of files it should handle. Then you have to make all sorts of complicated decisions about which file types the new program should handle.

Programs for handling media are the worst in this respect because there are lots of different media types and it is common to have several media players installed to handle different special cases. For example, on my home computer I have Windows Media Player and a DVD player because they came with the system. Then there is iTunes for my iPod, the QuickTime video player that comes with iTunes, a RealPlayer for the BBC iPlayer and finally a program for ripping and burning CDs and DVDs. There may well be other media players amongst the shovelware preinstalled on the box. There are also programs for editing specific media types like at least two picture editors and a video editor or several.

A typical scenario is that you are installing a new media player program because you want to use it to view a particular type of media. Unfortunately, the program installer knows about all the media types that it can handle and asks you to choose which media types it should handle. Thus you have to disengage your thoughts from the one media type that is the object of your attention and instead start to think about all those other media types that you are not interested in. Unfortunately, there is the worry that if you give in to the new media player and let it handle certain types of media, other things will stop working. Maybe you will not be able to watch videos, or maybe videos will stop syncing with your portable media player because you changed the program associations with a particular file type. Given the complexity of these systems, who knows what may go wrong.

I said that the media player installer should ask you which file types you want associated with the program. A few years ago, Real managed to destroy much of their franchise by not playing nice and fair with file types. The RealPlayer installer switched all file types that it could handle to use the RealPlayer without bothering to ask or notify. Worse, if you went in and installed another program that changed the file type associations or even tried to use the File Types dialog screen to change file type associations, it would just change them back to the RealPlayer, again without a notification. When this came to light, many people, myself included, uninstalled RealPlayer and swore never to install any software from Real again. Recently I caved on this resolution so that I could listen to old BBC radio shows like "The Goon Show" with the BBC iPlayer, which turns out to be just a rebadged Real player.

Since the RealPlayer imbroglio, installer programs have been a lot more careful about asking users about file types, but that just throws the problem back to the user. As the whole point of file types is to hide system complexity from the user, this is no solution at all. A better path is to do without file types. Why are they necessary? Do they really serve a purpose? Other operating systems get along fine without file types, so why does Windows need them? Let's just throw them out and make life easier.

Monday, December 20, 2010

Is That Annoying Modal Caps Lock Key Going Away?

So Google came out with their new Chrome Operating System, loaded it onto a laptop and gave the whole caboodle to people to play with and comment on. While Chrome OS has generated a lot of comments, the largest and most active discussion has been about the Caps Lock key. You see, Google has changed the behavior of the key that used to be Caps Lock to instead call up a search page. I am sure this change was made to pander to keyboard weenies who want to Google without having to lift their hands from the keyboard. Anyway, the change has backfired. Instead of talking about Chrome OS, everyone is engaged in a furious discussion of why the Caps Lock is either essential or should have been disposed of a long time ago.

I have two problems with the Caps Lock, no make that three. The first problem is that it sits right between two important keys. Below it is the Shift key whose importance needs no explanation. Above it is the Tab key, used for next field, command completion, automatic indent and plenty of other useful purposes. In the middle sits Caps Lock just waiting to be hit by accident. This brings me to the next problem: Caps Lock is modal. Hit the Caps Lock key by accident and you do not make just one typing mistake, rather the whole keyboard is shifted into a new mode and the error compounds. By the time I look at the screen, I have typed half a sentence in the wrong case.

I am a member of the tribe that hates modal user interfaces with a passion. Some of my compatriots physically remove the Caps Lock key or reprogram their keyboard to reduce typing errors. I have only gone as far as to disable that other annoying modal key. The Insert key is used by many editors to switch between insert mode and overtype mode. If you hit Caps Lock by accident, the result is obvious, if you hit Insert by accident you can go on for some time before you realize that you are seriously damaging the document that you are trying to fix up. Of course, the Insert key is slightly off the main keyboard, right above the really useful Delete key and just waiting to be hit by accident.

My final problem with the Caps Lock key is that if you are in Caps Lock mode and you press shift, it reverts back to entering lower case. This means that when I hit cAPS lOCK by accident, every key I type is in the wrong case, not just some of them. I happen to have an old typewriter from the 1930s so I know what shift really means. The Shift key causes the whole paper carriage and platen to move so that when the typebar comes down a different type piece strikes the ink ribbon and paper. Shifting the platen is why it is called the Shift key and it is a heavy key to hold, so there is a Shift Lock key that is a mechanical lock to hold the platen in the shifted position. With the platen locked in the shift position, hitting the shift key does nothing, so why has someone gone to the trouble of programming bogus behavior in our modern and supposedly more convenient keyboards?
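To make the difference concrete, here is a small Python sketch of the two behaviors as I understand them. The function names are my own invention and the model only covers letters, so treat it as an illustration rather than a description of any real keyboard driver.

```python
def typewriter(letter, shift=False, shift_lock=False):
    """Mechanical typewriter: the platen is either shifted or it is not."""
    return letter.upper() if (shift or shift_lock) else letter.lower()


def computer(letter, shift=False, caps_lock=False):
    """Computer keyboard as described above: Shift reverses Caps Lock."""
    return letter.upper() if (shift != caps_lock) else letter.lower()


def type_text(text, key, lock):
    """Type text with the lock engaged, holding Shift for each capital letter."""
    return "".join(key(ch, ch.isupper(), lock) for ch in text)


print(type_text("Caps Lock", typewriter, True))  # CAPS LOCK
print(type_text("Caps Lock", computer, True))    # cAPS lOCK
```

On the typewriter the lock and the Shift key are combined with "or", so holding Shift while locked changes nothing. On the computer they are combined with "exclusive or", which is why every single letter comes out in the wrong case.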

Now, I know that there are people who love the Caps Lock key and who use it all the time. For my part, given the choice between a key that causes a small typing mistake every time I hit it by accident and a key that brings up a new web page by accident, I will choose the Caps Lock function every time. Caps Lock is annoying but I have lived with it for a long time and it is a much smaller surprise than a new page that I do not want.

Saturday, December 18, 2010

The Gawker Password Fiasco

Last month I wrote about password security, just a little too soon. This month the popular blog site owner Gawker admitted to a huge security breach where hackers had broken into their web servers and stolen their entire database of user account names with email addresses and passwords. The attack has brought password security to everyone's attention, with people reporting that their email and other accounts have been compromised. There are a lot of discussions of protocols for password security with good information, and unfortunately there is also a lot of misinformation. Here is my take.

The Forbes magazine web-site has a clear description of the attack on Gawker (although their discussion of the password encryption is not correct). The short story is that the break-in was done by a hacker group called Gnosis who were annoyed by Gawker. Frankly, given Gawker's arrogant style, who has not been annoyed by them at some time? Gnosis first broke in to Gawker in July and got the passwords to accounts for Nick Denton and 16 other staffers there. In November, Denton noticed some possible tampering in a web account, and finally in December Gnosis announced their break-in and released data they had gathered.

Although Gawker had used encryption to hide its users' passwords, they are susceptible to a brute force attack and many passwords have been broken. Gawker lost over 1 million accounts and more than 100,000 passwords have been cracked and published so far. The Wall Street Journal has a nice analysis of the most popular passwords including a frequency graph.

There is a lot of misunderstanding about how passwords are stored on a web site and how a brute force attack takes place. For example, the Forbes article I mentioned earlier obviously does not have a clue. I do not know for certain how Gawker protects their passwords, however the best practice is to use a salted hash. With this technique, the web-site chooses a salt, which is just a random string of characters. When a user sets a password, the salt is appended to the password and the whole string is hashed with a cryptographic hash function like SHA-1. The resulting hash value is a seemingly random string of bits, and this is stored as the user's encrypted password. When the user wants to log in, the salt is added to the supplied password, the resulting string hashed, and the hash value compared to the saved hash. If they are the same, the user must have provided the correct password and is allowed to log in. By using a salted hash, the web-site does not save the user's password; it just saves a cryptographic hash that is used to confirm that the user knows their password. To make things more secure, the web-site can save a different salt for each user or just add the user name to a common salt so that even if two users have the same password, the salted hashes of their passwords are not the same.
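Here is a minimal sketch of the salted hash scheme just described, written in Python. It is not how Gawker or any particular site does it. The function names and the in-memory `users` dictionary are my own assumptions, and SHA-1 appears only because it is the example named above; a deliberately slow function such as Python's `hashlib.pbkdf2_hmac` would be a sturdier choice.

```python
import hashlib
import hmac
import os


def new_salt(length=16):
    """A salt is just a random string; here, random bytes from the OS."""
    return os.urandom(length)


def salted_hash(password, salt):
    """Append the salt to the password and hash the whole string."""
    return hashlib.sha1(password.encode("utf-8") + salt).hexdigest()


def register(users, name, password):
    """Store a per-user salt and the salted hash, never the password itself."""
    salt = new_salt()
    users[name] = (salt, salted_hash(password, salt))


def check_login(users, name, password):
    """Re-hash the supplied password with the saved salt and compare."""
    if name not in users:
        return False
    salt, stored = users[name]
    return hmac.compare_digest(stored, salted_hash(password, salt))


users = {}
register(users, "alice", "sesame")
register(users, "bob", "sesame")
print(check_login(users, "alice", "sesame"))   # True
print(check_login(users, "alice", "Sesame"))   # False
print(users["alice"][1] == users["bob"][1])    # False: same password, different salts
```

Because each user gets a different salt, alice and bob end up with different stored hashes even though they picked the same password.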

In a brute force attack the attacker knows the algorithm used to generate the salted hash and has the salted hash of the password. The attacker generates a list of potential passwords, applies the password checking algorithm to each one and, if the result matches the stolen hash, they have guessed the user's password. If the attacker can try 20 passwords a second, they can test well over a million passwords a day on a single computer.
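A sketch of that guessing loop, again in Python, looks something like the following. The salt, the password list and the function names are invented for the example, and the hashing is the simple salt-appended SHA-1 from my description above rather than anything I know about Gawker's system.

```python
import hashlib


def salted_hash(password, salt):
    """The same password checking algorithm the web-site is assumed to use."""
    return hashlib.sha1(password.encode("utf-8") + salt).hexdigest()


def brute_force(stolen_hash, salt, candidates):
    """Try each candidate password; return the one that matches, or None."""
    for guess in candidates:
        if salted_hash(guess, salt) == stolen_hash:
            return guess
    return None


salt = b"example-salt"
stolen = salted_hash("letmein", salt)
print(brute_force(stolen, salt, ["password", "123456", "letmein"]))  # letmein

# The arithmetic behind "well over a million passwords a day".
GUESSES_PER_SECOND = 20
print(GUESSES_PER_SECOND * 60 * 60 * 24)  # 1728000
```

Note that the salt does not stop the attack on any single account. What it does is force the attacker to redo the work for every account instead of cracking everyone with the same password at once.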

It is very easy to generate a list of potential passwords. One good starting point is a list of broken passwords, such as published by Gnosis from the attack on Gawker. The next step is a dictionary of common words and proper names. Many applications have a spelling dictionary that can be used as a starting point. Then try some simple variations like adding a number to the beginning or end of words, capitalizing letters in the word and making common substitutions for letters such as 1 for the letter 'i' and 5 or $ for 's'.
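To show how little effort this takes, here is a short Python sketch that generates those variations from a single word. It is only an illustration: the substitution table holds just the two examples mentioned above, only three capitalizations are tried, and only a single digit is added at the start or the end.

```python
from itertools import product

# Each letter maps to the characters that may stand in for it (itself included).
SUBSTITUTIONS = {"i": "i1", "s": "s5$"}


def substitutions(word):
    """Yield every spelling with each common substitution applied or not."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


def variations(word, digits="0123456789"):
    """Yield candidate passwords built from one dictionary word, without repeats."""
    seen = set()
    for base in substitutions(word.lower()):
        for cased in (base, base.capitalize(), base.upper()):
            numbered = [d + cased for d in digits] + [cased + d for d in digits]
            for candidate in [cased] + numbered:
                if candidate not in seen:
                    seen.add(candidate)
                    yield candidate


candidates = list(variations("miss"))
print(candidates[:5])
print("M1$$" in candidates)   # True
```

Feed every word of a spelling dictionary through something like this and you have millions of candidates, which at the rate discussed above is a matter of days rather than years.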

So now that you know how it is done, think about your passwords and how easily they can be attacked by brute force, and excuse me while I go and change some of mine.

Saturday, December 11, 2010

Now You See It: the Book

If you are of a data analytics bent or know someone who is and are looking for a book to put on the Christmas list, consider Now You See It: Simple Visualization Techniques for Quantitative Analysis by Stephen Few. This is a beautiful book that would not look out of place on a coffee table, yet at the same time, is full of practical information about how to do analytics with charts, graphs and other visual tools.

The book is divided into three sections. The first section covers visual perception and general visualization techniques for looking at data. Then the second section goes into more detail with chapters on specific techniques for different types of analysis including time-series analysis, ranking analysis, deviation analysis and multivariate analysis amongst others. Each chapter in this section ends with a summary of the techniques and best practices for that type of analysis. Finally the book ends with a shorter section that looks at promising new trends in visualization.

There are copious examples of graphs and charts drawn by different software tools. While some of these graphs come from high end tools like Tableau and Spotfire, others are drawn by Microsoft Excel. In fact there are several specific procedures for using features of Excel to do sophisticated analytics. That is not to say that the book suggests that you can do everything with a spreadsheet. The first part shows you what to look for in visual analytics software and is essential reading before going out and choosing which tool to use.

So, if you are looking for a quality and practical gift for an analytician, choose "Now You See It".

Tuesday, November 30, 2010

The Registration Dilemma

To register or not to register, that is the question:
Whether 'tis better to create a new online account,
or just make do with the existing ones,
and so lead a slightly less ennobled life.

Online account registration is a barrier, something that we are all thinking about as this is the season for buying stuff. As I said previously, I have about 70 online accounts where I actively maintain a user identity, and I have created many many more. Thus every time I am presented with the choice of registering for a new site, I stop and think, do I really want to create another account? In the past couple of weeks I have decided to forgo creating 3 new online accounts and just stick to my well traveled paths.

Registration is not always thought of as a bad thing. For example, Dave McClure, Master of 500 Hats, micro venture capitalist and relentless promoter of analytics to improve web based businesses, has Activation as the second of his 5 step program to web enterprise success. Now Activation does not necessarily imply Registration, however Registration is the most common and strongest form of Activation. Dave's perspective is that to succeed on the net, your product needs to be strong enough to overcome any barriers to Activation.

There have been many initiatives to vault over the registration hurdle. The most promising one is OpenId, an open system that allows you to use your account at one web site to log onto other web sites. A couple of years ago I thought that this was a good solution to the Single Sign-on problem and worth promoting. Now OpenId seems to be moribund and it is not widely used. I am not sure what happened, but I did hear rumors of an argument and a split which diminished the organization.

One of the problems with OpenId and any other such system is that it tends to favor and strengthen the big players like Yahoo and Google. Another idea that people often trot out is some form of micro-payments system that would obviate the need for registration at many sites. There are a couple of problems. Firstly, any payment is its own barrier, and creating many little barriers instead of one is not a path that is likely to lead to success. For a broader discussion of this issue I recommend the book Free by Chris Anderson.

The second problem is that a successful micro-payment system will favor and strengthen the big players that operate it. It has to be a big player as no one is going to trust their payments to some small and unknown start-up. In practice, the only really successful micro-payment site is iTunes, and it shows up all these problems. In the beginning we all cheered as Steve Jobs took on the record companies. Now that iTunes is the leading purveyor of music, many people have taken to railing against the power of Apple.

The Registration Dilemma is this. We can either continue with the current system that has a chaos of millions of sites, each with their own registration that we need to manage, or we can give in to consolidation and just deal with a few giants. Every time I think about it, I end up siding with chaos.

Tuesday, November 16, 2010

Yeah, Yeah, Yeah

This morning I woke up to the local newspaper headline "Do you want to know a Secret?", and knew that something was going on. Later they changed their tune to something more like The Wall Street Journal, which starts their piece "Steve Jobs is nearing the end of his long and winding pursuit of the Beatles catalog." Other newspapers had headlines like "All you need is iTunes", "Let it be Available" and "Apple and The Beatles finally come together on iTunes". All in all, it seems like a bunch of stupid headline tricks from the old media, a sure sign that they are getting past it.

Meanwhile the new media is a lot more standoffish. Wired News is like "Yawn". TechCrunch is all business with "All 17 Beatles Albums Are In The Top 100 On iTunes". Of course Fake Steve Jobs had a field day, providing by far the best commentary on the whole event.

Monday, November 15, 2010

Open Source Coopetition

Coopetition is the driving force behind many of the best Open Source projects. In the past, I have written about several different reasons that Open Source projects exist. There are business models like the low cost sales channel. Open Source can act as a home for old software that is still useful, but not commercially essential to a business. There have been attempts to use Open Source as a weapon, to suck the air out of a competitor's lungs, by devaluing the intellectual property of the competitor, although many of these attempts have been less successful than their originator hoped.

A presentation on Hadoop got me thinking about Coopetition and Open Source. Hadoop is a big Open Source project to implement all the components of what I have called the Google Database System and a lot more. The major contributors to Hadoop are Yahoo!, Facebook and Powerset - now a part of Microsoft. While these companies are related in that Microsoft owns a stake in Facebook, has tried to buy Yahoo! and now Yahoo! uses Microsoft's Bing search engine, they are also competitors, fighting each other for the attention of web users.

So is it strange that these three companies should cooperate to build Hadoop, an incredibly useful and widely used Open Source project? Firstly, the genius of Open Source is that they are not cooperating directly with each other; they are all contributing code to a third party, the non-profit Apache foundation that oversees the Hadoop project. Secondly, by spreading the cost of the software over many contributors, they all gain much more than they contribute. Finally, many eyes and the public nature of the code tend to make it better than code that is bottled up in secret and protected from prying eyes. Because the Open Source model allows for the kind of coopetition that brings us software like Hadoop, we all benefit.