> I would highly recommend the data.table package over tibble or base data.frame if you are doing any kind of modeling in R with larger datasets. Yes, R has many data structures, but learning data.table will blow your mind in terms of efficiency. Matt and the other contributors have built something extremely fast and flexible. I get that R is not for everyone, but used correctly it is a beast.
>
> This is anecdotal, but in the insurance industry we have what we call on-level premium calculators: programs that rerate every policy with the current set of rates. Our current R program, fully vectorized, can rate 41,000 policies a second on a user laptop with an i5 from 2015. In contrast, the previous SAS program managed 231 policies a minute on a 64-core Xeon processor from 2017. For our kind of workload, R has been a godsend. Bonus: what our data scientists develop in R can go directly into production (after peer review, testing, etc., no different from any other production code).
>
> Back when I started in 2005, we modeled in proprietary software such as Emblem, used Excel to build a first-draft premium calculator, rebuilt the computation in SAS for the on-level program, and sent specs to IT to rebuild the program yet again for production. All three had to produce the same results.
>
> I've tried Python, Go, Rust and Julia. Python could be a good alternative, but the speed of data.table, the RStudio IDE and the ease of package management make R the obvious choice for us. I believe Julia is the future, but so far in-house adoption has been low.
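For readers outside insurance, here is a minimal sketch of what "rerating every policy with the current set of rates" can look like in data.table. It assumes a hypothetical rate manual made of a base rate times multiplicative territory and vehicle-class factors; the table names, columns and numbers are illustrative only, not the poster's actual program. Each factor table is attached with an update join, and the premium is then computed for all policies in one vectorized expression, with no loop over policies.

```r
library(data.table)

# Hypothetical in-force policies, one row per policy (illustrative data only)
policies <- data.table(
  policy_id     = 1:4,
  territory     = c("A", "B", "A", "C"),
  vehicle_class = c("sedan", "suv", "suv", "sedan"),
  exposure      = c(1, 1, 0.5, 1)
)

# Hypothetical current rate manual: a base rate plus multiplicative factor tables
base_rate   <- 500
terr_rates  <- data.table(territory = c("A", "B", "C"), terr_factor = c(1.00, 1.15, 0.90))
class_rates <- data.table(vehicle_class = c("sedan", "suv"), class_factor = c(1.00, 1.20))

# Update joins: look up each policy's factor and add it as a column by reference
policies[terr_rates,  on = "territory",     terr_factor  := i.terr_factor]
policies[class_rates, on = "vehicle_class", class_factor := i.class_factor]

# Rerate every policy at current rates in a single vectorized expression
policies[, onlevel_premium := base_rate * terr_factor * class_factor * exposure]

print(policies[, .(policy_id, onlevel_premium)])
```

A real rating algorithm has many more steps (caps, minimum premiums, rounding rules and so on), but each step can usually be expressed the same way, as a join or a column-wise expression over all policies at once, which is what makes fully vectorized rating possible.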
But... and here's the big but... these days I almost never meet anyone who can put all these steps together in SAS and who actually understands the SAS computation model end to end.
And SAS's strength, a computation model that by default is not limited by memory, becomes a performance weakness when everyone writes every step out to disk and reads it back in, programming without understanding all those little intricacies. SAS hasn't helped any of this by trying to move its ecosystem away from "programmers" toward "application users", so now "programmers" can pick up an interpreted language like R, with in-memory, vectorised operations by default, and beat SAS.
Of course, I'd still recommend that organizations move to Python/R these days because of the broader ecosystems, the university talent pool, and avoiding the extensive lock-in of proprietary software, but I still feel I have to reflexively respond to "R is faster than SAS" claims :p