We were given a gentle introduction to the R statistical programming language and its application in Business Intelligence at the July meeting of the SDForum Business Intelligence SIG. The speakers were Jim Porzac (Senior Director of Analytics at Responsys) and Michael Driscoll (Principal at Dataspora). Jim has posted the presentation here.
R is an Open Source project released under the GNU General Public License. It has a growing user base with a strong support community and a user group (called UseR Group - try Googling that). There are now almost 1500 packages for the language that support various statistical techniques and specialized application areas. Package areas include Bayesian, Econometrics, Genetics, Machine Learning, Natural Language Processing, Pharmacokinetics, and Psychometrics, which gives some idea of the range of subjects and techniques that R covers.
Jim did most of the talking, introducing the language and showing us some examples of its use. One example was the data quality package that he runs on each new dataset he receives for analysis at Responsys. Another example showed R's reporting capabilities, while a third showed sophisticated graphs and plots used for customer segmentation analysis. Michael showed us how he used R to do some interesting and very practical analyses of baseball statistics.
The audience probed R's strengths and weaknesses. R has the connectivity to get data for analysis from databases and other sources. R also has excellent graphing and reporting capabilities. Currently R works by reading data into memory, where it is manipulated, which limits the maximum size of data set that can be analyzed to the many-gigabyte range.
One person asked for a comparison with SAS. R has the advantages of being free and of having an enthusiastic user base that keeps it on the cutting edge. R is also a more coherent language than SAS, which is a collection of libraries; each of them may be very good, but together they do not necessarily make a whole.
Jim and Michael are starting a Bay Area chapter of the UseR Group. If you are interested, contact Jim Porzac at Responsys.