Friday, October 23, 2009

Database Systems for Analytics

The question "what are the attributes of a database system for analytics?" came up during Omer Trajman's talk to the October meeting of the SDForum Business Intelligence SIG. The talk was titled "The Evolution of BI from Back Office to Business Critical Analytics". In the talk Omer gave several examples of applications that use real-time analytics and explained the special attributes of each application. As he runs field engineering for Vertica, a database systems vendor, I am sure that these examples were based on his experience with Vertica deployments; however, Omer was careful to keep his talk vendor neutral.

So what are the attributes of a database system for analytics? Omer discussed three attributes. Firstly, an analytics database system cannot use the row-level locking that is found in a traditional transaction processing database. The database system needs to provide snapshot isolation that gives a query a consistent view of the data while not preventing other operations like data loads. Having helped implement a system like this in the past, I am in total agreement with Omer.
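To make the idea concrete, here is a minimal single-threaded sketch of snapshot isolation over an append-only table. It is an illustration only, not how Vertica or any particular product implements it; the `VersionedTable` class and its methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical append-only table; each row records the version that added it."""
    rows: list = field(default_factory=list)  # (commit_version, value) pairs
    committed: int = 0                        # latest committed version

    def load(self, values):
        """Commit a batch of rows as one new version without blocking readers."""
        version = self.committed + 1
        self.rows.extend((version, v) for v in values)
        self.committed = version  # publish only after all rows are in place
        return version

    def snapshot(self):
        """Capture the version a query should see for its whole lifetime."""
        return self.committed

    def scan(self, snapshot):
        """Return only the rows committed at or before the given snapshot."""
        return [v for ver, v in self.rows if ver <= snapshot]


table = VersionedTable()
table.load([10, 20])
snap = table.snapshot()  # a query starts here
table.load([30])         # a load commits while the query is still running
print(sum(table.scan(snap)))              # 30: the query ignores the later load
print(sum(table.scan(table.snapshot())))  # 60: a new query sees everything
```

The query that took its snapshot before the second load keeps a consistent view, and the load never had to wait for it. A real system would also need thread safety, deletes and updates, and cleanup of old versions.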

The second attribute is the need to allow concurrency between loading and querying data. While this is related to the first attribute, it also comes with its own issues. Bulk loads are more efficient (particularly for a columnar database like Vertica). However, if you want access to up-to-the-minute data, you need to do loads in small increments so that the data is available for query as soon as it is loaded. Managing this balance is difficult, and as yet it has not been completely solved. Again, I have worked on this issue in several different systems.
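One common way to strike this balance is to buffer incoming rows and flush them as a small batch when either a row count or an age limit is reached. The sketch below is an assumption of mine, not something Omer presented; `MicroBatchLoader`, `max_rows` and `max_age_seconds` are hypothetical names, and the `loaded_batches` list stands in for the database.

```python
class MicroBatchLoader:
    """Hypothetical loader that trades bulk-load efficiency against data freshness."""

    def __init__(self, max_rows, max_age_seconds):
        self.max_rows = max_rows        # larger batches load more efficiently
        self.max_age = max_age_seconds  # smaller values make data visible sooner
        self.buffer = []
        self.oldest = None              # arrival time of the oldest buffered row
        self.loaded_batches = []        # stand-in for the database

    def add(self, row, now):
        """Buffer one row that arrived at time `now` (in seconds)."""
        if not self.buffer:
            self.oldest = now
        self.buffer.append(row)
        self._maybe_flush(now)

    def tick(self, now):
        """Call periodically so that old rows get loaded even when traffic is slow."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_rows
        too_old = now - self.oldest >= self.max_age
        if too_big or too_old:
            self.loaded_batches.append(self.buffer)
            self.buffer = []
            self.oldest = None


loader = MicroBatchLoader(max_rows=3, max_age_seconds=5.0)
for t, row in enumerate(["a", "b", "c"]):
    loader.add(row, now=float(t))  # the third row triggers a flush by size
loader.add("d", now=3.0)
loader.tick(now=7.0)               # "d" is only 4 seconds old, so it stays buffered
loader.tick(now=8.0)               # "d" is now 5 seconds old and is flushed
print(loader.loaded_batches)       # [['a', 'b', 'c'], ['d']]
```

Tuning the two limits moves the system along the efficiency versus freshness trade-off, but it does not make the trade-off go away.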

The final attribute was scaleout, that is, the ability to add more processing systems to handle more data and larger queries. We are building systems out of hundreds or even thousands of computers. Scaleout is vital to using these systems effectively.
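As a rough illustration of scaleout, the sketch below hash-partitions rows across a configurable number of nodes, computes a partial aggregate on each, and merges the partial results. The nodes are simulated as in-process lists, and the function names are hypothetical; a real system would run the per-node work in parallel on separate machines.

```python
import zlib
from collections import Counter


def node_for(key, num_nodes):
    """Pick a node for a key with a stable hash (crc32 is used here for repeatability)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Distribute (key, amount) rows across the simulated nodes."""
    shards = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        shards[node_for(key, num_nodes)].append((key, amount))
    return shards


def local_totals(shard):
    """Work done independently on one node: sum the amounts per key."""
    totals = Counter()
    for key, amount in shard:
        totals[key] += amount
    return totals


def query_totals(rows, num_nodes):
    """Merge the per-node partial results into the final answer."""
    merged = Counter()
    for shard in partition(rows, num_nodes):
        merged.update(local_totals(shard))  # Counter.update adds the counts
    return dict(merged)


rows = [("east", 5), ("west", 7), ("east", 3), ("north", 1)]
print(query_totals(rows, num_nodes=2) == query_totals(rows, num_nodes=8))  # True
```

The answer is the same whether the data is spread over 2 nodes or 8; adding nodes only changes how much data each one has to scan.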

Wednesday, October 14, 2009

e-Readers for All

The e-Reader market is heating up, just in time for Christmas. Amazon is expanding features and bringing the Kindle down the price curve. Today came word of the Barnes and Noble e-Reader with two screens, an e-ink screen for reading and a small LCD touch screen for interactivity.

Also today I caught up with the "This Week in Tech" podcast from last weekend, where they talked about the real killer features of the Kindle: wireless download and almost unlimited capacity. You can buy as many books as you want any time you want, which leads to buying many more books than you would otherwise buy. Imagine the scene: at dinner with your friends, you discuss books that you have recently read, and bam, you buy the books they recommend there and then. In fact there was even a cry in the podcast "Friends don't let friends use a Kindle while drunk" (for fear that the judgmentally impaired friend may buy too many books).

When the original Kindle came out there was a tremendous outcry against it, with people complaining of gadgets destroying their book reading experience and authors expecting to have their livelihood destroyed just as the music industry has been laid waste. Hint: musicians are doing just as well as they have always done; it is the music moguls with their "by the way, which one is Pink?" who have been laid waste. The Kindle stimulates the publishing industry and makes it much easier to buy books, leading to more sales where the author gets a larger slice of the pie.

Competition is good, particularly for the consumer. The e-Reader needs another generation or so to iron out the kinks and bring the price down to mass-market levels. I am waiting for the $149 price point (iPod Nano), which should come by next Christmas if not sooner.

Saturday, October 03, 2009

Search User Experience Innovations

Innovations in the Search User Experience was the topic at the September meeting of the SDForum Search SIG. The distinguished panel from Microsoft, Google and Yahoo was chaired by Safa Rashtchy, a long time analyst and commentator on the Search scene.

First, Sean Suchter, General Manager of Microsoft's Search Technology Center Silicon Valley, told us about the latest innovations in Bing. Sean started out with some numbers, showing that the Internet is still growing at a fast pace and that search is growing faster than the Internet in general. They measure their users' experience and see that about a quarter of searches are failures, resulting in an immediate click back. On the other hand, nearly half of search queries are further refined, meaning that the user is engaged in a search session. Microsoft will recognize these sessions and use them to improve the user experience.

To simplify the user experience, when they are confident about what a user is searching for, Bing will show one subject on the first page with a number of related links. Sean showed us two examples. Firstly, for the search term "target", where they assume the person is looking for the Target chain of stores, they show a complete set of links to Target and shopping related pages with a single link to get other search results that are not related to Target stores. The second example was "ups", where they only show links related to United Parcel Service and sending parcels on the first page.

Next up was Johanna Wright, Director of Web Search Product Management at Google. Johanna started off by telling us that 20% of searches have never been seen before, and that Google is dedicated to serving the long tail of web searches as well as more popular ones. To show us how far the search experience has come in the last few years, she applied the search term "how to tie a tie" to an index that they had saved from 2001, and compared it with what you get today. In 2001 you got a miscellaneous collection of links to sites like "The Indus Entrepreneur" with none about tying ties. Now you get relevant links along with image and video links, a tremendous improvement.

Johanna talked about how speed is essential to a good user experience. A couple of years ago, they added related links to popular search terms like "target" to reduce the number of steps a user needs to make to get to the page they want. Google continues to work on helping users with query formulation. She showed us the options panel that you access by clicking the "search options" link on a search results page and how it can be used to refine a search.

Finally, Dr. Larry Cornett, vice president of the Yahoo! Search Consumer Products division, spoke. He started by reassuring us that Yahoo! is still in the search business and that if and when the planned combination of Yahoo! Search with Microsoft goes through, they will still provide their own front end and control their users' experience. Yahoo!'s goal has always been to personalize and structure the web. We saw the new layout for Yahoo! search results in the typical Yahoo! busy style.

After the demos, the floor was thrown open to audience questions. Someone asked about natural language support for queries. Sean told the story, as he has been in the search business for a long time. In the early days of search, natural language queries were considered an important research area. Then the issue went away as providing relevant answers to queries became the dominant problem. Now that giving good answers is under control, natural language queries are making a comeback. Recently Microsoft bought Powerset to help them in this area.

There were several questions about the sizes of market segments, and growth rates, particularly in the mobile space, to which the panel would not give answers. The audience did manage to uncover the fact that while adult searches are more prevalent than mobile searches, mobile searches have been growing fast since the introduction of the iPhone and other smartphones.

Another set of questions related to real-time search. All three search engines have been working on improving the speed with which they update their indexes so that they are current. There is still an open question about whether the major search engines will embrace real-time search or make it a separate option.