Saturday, July 24, 2010

Data Management in the Cloud

Over the last couple of years, I have seen several presentations on cloud computing and how it is the next big thing. Daniel Graham's presentation "Data Management in the Cloud" at the July meeting of the Business Intelligence SIG made me realize that I still have a lot to learn. Dan leads Active Data Warehouse marketing programs for Teradata. If you have been living under a rock and do not know what cloud computing is, Wikipedia has a reasonable explanation. Dan distinguished between the public cloud, a rentable computing resource like Amazon's Elastic Compute Cloud (EC2), and a private cloud, which is your business's own computing resources in a datacenter behind the company firewall, using virtualization software like VMware to allow many applications to share hardware.

The big picture that Dan painted is that cloud computing is coming and that you need to get ready for it. He predicted that by 2015, 20% of computing resources worldwide will be in the cloud. Start now by getting experience with the cloud to find out what works, what needs to be changed to make it work, and what does not work. Teradata has been experimenting with cloud computing and is working with vendors like VMware and Amazon to ensure that Teradata database systems work well in the cloud. Informatica is another software vendor that is working to ensure that its data integration software works well in the cloud and between clouds. Netflix is an example of a company that has adopted cloud computing; it recently announced that it is moving all of its movie hosting into the Amazon computing cloud. The US Government is the leading user of cloud services, having moved much of its computing needs into the cloud.

Cloud computing uses commodity hardware, which, combined with the overhead of virtual machine software, will not give you the best performance. However, it is "good enough" for most applications. Dan took the well-known quote from the movie Forrest Gump and bent it to his needs: “Clouds are like a box of chocolates. You never know what you're gonna get.” Some high-end software is not suitable for cloud computing, and the main problem is high I/O requirements. The size and capabilities of a cloud computing host are often optimized to run a single-instance Oracle database doing OLTP. In practice, most applications are less demanding than this.

There were many other interesting tidbits in the presentation. Here are some examples. It is more expensive to get data out of a cloud than to bring it in. The reason for this was not explained, but it is something to take into consideration when using a cloud. An interesting application for cloud computing is what Dan called "Workload Isolation". The idea is that when you have partners or consultants who need access to your data, it is often preferable to put the data they need in the cloud rather than let them inside your firewall. In all the examples that Dan showed of Business Intelligence applications in the cloud, he talked about a Data Mart, with the implication that a full Enterprise Data Warehouse is, for now, too large and demanding an application for the cloud.

The slides from the presentation are available at the SDForum Business Intelligence SIG web site.

