Thursday, July 17, 2008

A Gentle Introduction to R

We were given a gentle introduction to the R statistical programming language and its application in Business Intelligence at the July meeting of the SDForum Business Intelligence SIG. The speakers were Jim Porzac (Senior Director of Analytics at Responsys) and Michael Driscoll (Principal at Dataspora). Jim has posted the presentation here.

R is an open source project released under the GNU General Public License. It has a growing user base with a strong support community and a user group (called UseR Group - try Googling that). There are now almost 1500 packages for the language that support various statistical techniques and specialized application areas. Package areas include Bayesian, Econometrics, Genetics, Machine Learning, Natural Language Processing, Pharmacokinetics and Psychometrics, which gives some idea of the range of subjects and techniques that R covers.

Jim did most of the talking, introducing the language and showing us some examples of its use. One example is his data quality package that he uses on each new dataset that he receives for analysis at Responsys. Another example showed R's reporting capabilities, while a third showed sophisticated graphs and plots used for customer segmentation analysis. Michael showed us how he used R to do some interesting and very practical analyses of baseball statistics.

The audience probed R's strengths and weaknesses. R has the connectivity to get data for analysis from databases and other sources. R also has excellent graphing and reporting capabilities. Currently R works by reading data into memory where it is manipulated, which limits the maximum size of data set that can be analyzed to the multi-gigabyte range.
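To get a feel for what that in-memory limit means, here is a back-of-the-envelope sketch of my own in Python (it was not shown at the meeting). It assumes a purely numeric table in which every value takes 8 bytes, which is how R stores numeric vectors, and it ignores working copies and other overhead, so real requirements will be higher. The function names are made up for illustration.

```python
# Rough estimate of the RAM needed to hold a purely numeric table in memory.
# Assumption: every value is an 8-byte double; copies made during analysis
# and non-numeric columns are ignored, so treat the result as a lower bound.

BYTES_PER_DOUBLE = 8


def estimate_bytes(rows: int, cols: int, bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Return the approximate number of bytes for a rows x cols numeric table."""
    return rows * cols * bytes_per_value


def fits_in_memory(rows: int, cols: int, ram_bytes: int) -> bool:
    """Return True if the estimated table size does not exceed the given RAM."""
    return estimate_bytes(rows, cols) <= ram_bytes


if __name__ == "__main__":
    # Hypothetical data set: 100 million rows, 10 numeric columns.
    needed = estimate_bytes(100_000_000, 10)
    print(f"{needed / 1e9:.1f} GB")  # prints "8.0 GB"
    print(fits_in_memory(100_000_000, 10, ram_bytes=4 * 10**9))  # prints "False"
```

By this estimate a table of 100 million rows and 10 numeric columns needs about 8 GB before any analysis starts, which is the kind of data set where the in-memory approach begins to strain.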

One person asked for a comparison with SAS. R has the advantages of being free and of having an enthusiastic user base that keeps it on the cutting edge. Also, R is a more coherent language than SAS, which is a collection of libraries, each of which may be very good, but which do not necessarily make a coherent whole.

Jim and Michael are starting a Bay Area chapter of the UseR Group. If you are interested, contact Jim Porzac at Responsys.

Wednesday, July 09, 2008

Social Search

The SDForum Search SIG pulled together an A-List panel for their July meeting on Social Search. Moderator Safa Rashtchy hosted Bret Taylor of FriendFeed, Ari Steinberg of Facebook, Jason Calacanis of Mahalo and Jeremie Miller of Wikia Search. Of the panelists, Jason Calacanis had the most to say, was arguably the most interesting and definitely the most opinionated. He also recorded the event with the camera in his MacBook Air. Valleywag has a better and more concise video excerpt of Jason in action, shaded by their desire to capture controversy.

Facebook and FriendFeed are working on automated search within their social networks, while Mahalo and Wikia Search are working on improving general search by using people to curate the results. Mahalo is paying people, while Wikia Search is trying to use the Wikipedia model of free community involvement.

Most of the audience questions to the panel were about their business models and monetization. I tried to get into technicalities by asking a question about Search Quality. There was also a question on privacy, and one audience member argued that none of the panelists' companies were doing social search as he defined it.