| I've been using R because I have an amateur's interest in statistics. As a programmer, I find it an awkward language, although no more awkward than SAS, SPSS, etc. And the more analysis I do, the more sense the language makes. It's a special-purpose language made to do a specific task. The general workflow for data analysis is 1) import the data, 2) clean it and format it as input for a pre-built package that does the actual analysis, 3) feed it to the package, and 4) interpret the results. To that end, R programs are typically short and fairly declarative. R packages contain C or Fortran extensions that do all the heavy lifting, so any substantial amount of imperative R code is going to be slow. For instance, looping over a vector is almost always slower than applying a vectorized transformation, and R provides a rich set of such transformations for all its data types. R has gotten popular because the proprietary guys dropped the ball at the universities. I recall reading a posting by one researcher who said he switched to R because his students could only use SAS at the school's stats lab, whereas they could run R on their computers at home. Once researchers switched to R, they started publishing their work with code meant to be run in R. The cutting edge is important in stats, so people want a short lead time between when a new test or model is published and when it's available in software. SAS's "cathedral" can't really keep up. Combine that with SAS's licensing costs (both arms and a kidney too) as well as its overall "mainframey" feel, and you can see why R is winning. EDIT: Another big win for R that I forgot to mention is its support for visualizations. A step that should perhaps come right after importing the data in the workflow above is investigating it with various diagnostic charts (scatter plots, box plots, etc.). These are all just function calls in R. 
In addition, R has a powerful graphics engine and there are a huge number of packages available to create more sophisticated visualizations: http://addictedtor.free.fr/graphiques/ |
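To make the loop-versus-vectorization point and the "just function calls" remark above a bit more concrete, here is a minimal sketch in base R. The function names `square_loop` and `square_vec` are made up for illustration, and the built-in `mtcars` dataset stands in for data you would normally import yourself. Actual timings will vary by machine and R version, so none are shown.

```r
# Illustrative input: one million numbers
x <- as.numeric(1:1e6)

# Imperative style: an explicit loop written in R (hypothetical helper)
square_loop <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    out[i] <- v[i]^2
  }
  out
}

# Vectorized style: a single call, the looping happens in compiled code
square_vec <- function(v) v^2

# Both give the same answer; compare how long each takes
stopifnot(isTRUE(all.equal(square_loop(x), square_vec(x))))
print(system.time(square_loop(x)))
print(system.time(square_vec(x)))

# Diagnostic charts are single function calls (mtcars ships with R)
plot(mtcars$wt, mtcars$mpg)          # scatter plot of mpg against weight
boxplot(mpg ~ cyl, data = mtcars)    # box plot of mpg by cylinder count

# "Feed it to the package" and "interpret the results" steps
fit <- lm(mpg ~ wt, data = mtcars)   # linear model from the stats package
print(summary(fit))
```

When run with `Rscript` rather than interactively, the two plots are written to a PDF file instead of appearing on screen.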
John Chambers and colleagues created "S" at Bell Labs. S was a programming language designed for interactive statistical analysis. Much like gcc and icc are both compilers that implement C, R and S-PLUS are both implementations of S. S-PLUS was/is the primary proprietary implementation of the S language, whereas R is the primary free one (also sometimes called GNU S). (SAS and SPSS are completely different languages/systems as far as I know.) I think that statisticians at some point made a conscious effort to publish their work in R, rather than S-PLUS (or any other statistical system like SAS), because it was more widely available. That in turn made R a viable competitor to S-PLUS (and other systems), because it had a vast number of recent statistical libraries, often implemented by the people who developed the techniques. That said, SAS and SPSS seem to pretty much still have social science students locked up. The market for R is probably statisticians who are also excellent functional programmers.
This history stands in marked contrast to MATLAB and its free, largely compatible counterpart Octave, where computer scientists pretty much refuse to use Octave, despite MATLAB's massive price tag for pretty much everyone involved (even with 90% discounts).
(That said, if anyone lived through the changeover from S-PLUS to R, I'd love to hear if this history is wrong!)