by pickdenis
2360 days ago
I know this is beating a dead horse, but I think R seriously shot itself in the foot with its data structures[1]. I don't really see a solution for this, as any fix would break backward compatibility. I'll always pick Python over R because the data structures actually make sense to me as a programmer: objects that look like lists, dicts, matrices, etc., or any combination of the above, and they all behave in very predictable ways. I think R's approach puts off a lot of other people like me.
[1]: https://jamesmccaffrey.wordpress.com/2016/05/02/r-language-v...
|
But behind the scenes, R is essentially a Lisp (its semantics were modeled on Scheme) with data structures adapted to statistics and data science.
Base data structures have value semantics by default: modifying a vector inside a function leaves the caller's copy untouched (copy-on-modify), so they behave as if immutable. The vector type is also very fast, since it is a thin wrapper around a C array. In Python you need to reach for NumPy for anything similar, and you do feel some pain converting between native Python types and NumPy types, because various functions support one or the other but not both.
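To make the conversion friction concrete, here is a rough sketch that uses the standard library's `array` module as a stand-in for a NumPy array (an assumption made only to keep the example dependency-free; the helper name `to_json` is made up for illustration). The typed array stores raw C doubles contiguously, but a function written for native Python types, here `json.dumps`, will not accept it until it is converted back.

```python
import array
import json

# Typed, contiguous storage: each element is a raw C double.
values = array.array("d", [1.5, 2.5, 4.0])

# A plain list stores references to separately allocated float objects.
boxed = [1.5, 2.5, 4.0]


def to_json(xs):
    """Serialize a numeric sequence, converting typed arrays to lists first."""
    if isinstance(xs, array.array):
        xs = xs.tolist()  # explicit conversion back to native Python floats
    return json.dumps(xs)


# json.dumps only understands native types, so the typed array is rejected.
try:
    json.dumps(values)
    direct_ok = True
except TypeError:
    direct_ok = False

print(values.itemsize)   # bytes per element in the typed array
print(direct_ok)         # whether the unconverted array serialized
print(to_json(values))   # works once converted
```

The same shape of problem shows up with NumPy: the fast container and the native containers are different types, and code at the boundary has to convert explicitly.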
The data frame is immensely powerful, and it performs well because it is built on vectors: each column is one vector. A list of objects, like you'd make in Python, is a lot slower and more unwieldy to deal with, and it is much harder to write general functions over.
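A minimal sketch of the difference, in plain Python rather than R or pandas. The column names and the helpers `column_means` and `rows_to_frame` are hypothetical, chosen only for illustration. With one typed array per variable, a generic function can work on any frame without knowing its fields, whereas the list-of-records layout has to be pivoted first.

```python
from array import array
from statistics import fmean

# Row-oriented: a list of records, one dict per observation.
rows = [
    {"height": 1.70, "weight": 65.0},
    {"height": 1.80, "weight": 80.0},
    {"height": 1.60, "weight": 55.0},
]

# Column-oriented: one typed array per variable, as in a data frame.
frame = {
    "height": array("d", [1.70, 1.80, 1.60]),
    "weight": array("d", [65.0, 80.0, 55.0]),
}


def column_means(frame):
    """Mean of every column; works for any frame, whatever its column names."""
    return {name: fmean(col) for name, col in frame.items()}


def rows_to_frame(rows):
    """Pivot records into columns (assumes every record has the same keys)."""
    names = rows[0].keys()
    return {name: array("d", (r[name] for r in rows)) for name in names}


means = column_means(rows_to_frame(rows))
print(means)
```

Because each column is a single homogeneous sequence, per-column operations reduce to one pass over contiguous numbers, which is roughly what R's vectors give a data frame for free.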
Hadley Wickham's Tidyverse[2] is exactly an attempt to hide away the arcane details and create a modern, coherent and consistent language on top of R, while keeping the power of all the great R statistics libraries. The fact that R is a Lisp behind the scenes, with macro-like metaprogramming (functions can capture their arguments as unevaluated expressions), makes this possible. For data transformation and statistics, I can't think of anything currently as powerful as CRAN + Tidyverse.
[1] https://en.wikipedia.org/wiki/S_(programming_language)
[2] https://www.tidyverse.org/