Hacker News
by zosima 2360 days ago
True, the default semantics of R's data structures are somewhat arcane (unsurprisingly, as they're based on S [1] from the 1970s). And the current support for e.g. 64-bit integers leaves something to be desired.

But behind the scenes, R is just a lisp with some data structures that are adapted to statistics and data science.

All base data structures behave as immutable values by default (modifying one gives you a changed copy). And e.g. the vector type is extremely performant, as it's just a thinly wrapped C array. In Python you need to reach for NumPy for anything similar, and you do feel some pain when converting between native Python types and NumPy types for the various functions that support one or the other.
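To make the "thinly wrapped C array" point concrete, here is a rough Python sketch that uses only the standard library's `array` module as a stand-in. It is neither R's vector nor NumPy, and the names `xs`, `bytes_used` and `as_list` are made up for the example.

```python
from array import array

# A typed, contiguous buffer of C doubles, loosely analogous to an R numeric vector.
xs = array("d", [1.0, 2.0, 3.0])

# Every element occupies the same fixed number of bytes (8 for a C double
# on common platforms), so the whole vector is one flat block of memory.
bytes_used = xs.itemsize * len(xs)

# The buffer is type-specific: appending a non-number raises TypeError.
try:
    xs.append("four")
    rejected = False
except TypeError:
    rejected = True

# Converting to a native list boxes every element as a separate Python float.
# This is the kind of back-and-forth conversion the comment complains about.
as_list = xs.tolist()
```

The conversion in the last line is cheap here, but it copies and boxes every element, which is where the friction comes from on large data.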

The data frame is immensely powerful, and it has excellent performance characteristics because it's built upon vectors. A list of objects, like you'd make in Python, is just a lot slower and more unwieldy to deal with, and it's much harder to write generalizable functions on top of it.
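A small sketch of the difference, in plain Python with the standard library only. The `Row` class, the `frame` dict and both helper functions are hypothetical names invented for the illustration; a real data frame does far more than this.

```python
from array import array
from dataclasses import dataclass

# Row-oriented: a list of objects, one Python object per record.
@dataclass
class Row:
    height: float
    weight: float

rows = [Row(1.80, 75.0), Row(1.65, 60.0), Row(1.72, 68.0)]

# Column-oriented: one typed vector per column, which is the data frame idea.
frame = {
    "height": array("d", [1.80, 1.65, 1.72]),
    "weight": array("d", [75.0, 60.0, 68.0]),
}

def column_mean(table, name):
    """Works on any column of any table; knows nothing about a row class."""
    col = table[name]
    return sum(col) / len(col)

def row_mean(records, attr):
    """The row-oriented version has to dig the field out of every object."""
    return sum(getattr(r, attr) for r in records) / len(records)

mean_weight = column_mean(frame, "weight")
```

Both functions give the same answer, but `column_mean` operates on one contiguous vector and generalizes to any table with named columns, which is roughly the argument being made for data frames.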

Hadley Wickham's Tidyverse [2] is precisely an attempt to hide away the arcane details and create a modern, coherent and consistent language on top of R, while keeping the power of all the great R statistics libraries. The fact that R behind the scenes is a Lisp, with support for macros, makes this possible. For doing data transformations and statistics, I can't think of anything currently as powerful as CRAN + Tidyverse.

[1] https://en.wikipedia.org/wiki/S_(programming_language)

[2] https://www.tidyverse.org/

3 comments

In typical Lisps the most general data type is the n-dimensional array, which by default is not specialized to a particular data type. A vector is then a one-dimensional array and a matrix is a two-dimensional array. In Common Lisp one can also ask Lisp to generate a type-specific array (like a string, a bitvector, an array of single-floats, ...).

In R it's slightly different. The vector (being generally without dimensions) is the base data type and n-dimensional arrays are made of a vector and dimensions. A matrix is then a 2d array. Also vectors/arrays are by default type-specific.
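A rough Python sketch of the "vector plus dimensions" idea. The helpers `make_array` and `at` are hypothetical and not an R API; the sketch uses 0-based indices, and it lets the first index vary fastest (column-major), which is how R lays out arrays.

```python
def make_array(values, dims):
    """Pair a flat vector with a tuple of dimensions (hypothetical helper)."""
    size = 1
    for d in dims:
        size *= d
    if size != len(values):
        raise ValueError("dims do not match vector length")
    return {"data": list(values), "dim": tuple(dims)}

def at(arr, *index):
    """Look up one element; the first index varies fastest (column-major)."""
    if len(index) != len(arr["dim"]):
        raise IndexError("wrong number of indices")
    offset, stride = 0, 1
    for i, d in zip(index, arr["dim"]):
        if not 0 <= i < d:
            raise IndexError("index out of range")
        offset += i * stride
        stride *= d
    return arr["data"][offset]

# A 2 x 3 matrix stored as the flat vector 1..6, filled column by column.
m = make_array([1, 2, 3, 4, 5, 6], (2, 3))
first_row_second_col = at(m, 0, 1)  # third element of the flat vector
```

The underlying data never changes shape; only the `dim` tuple decides whether the same vector is read as a matrix or as a higher-dimensional array.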

> support for macros

From what I've seen, R does not support macros, but rather functions that can retrieve/generate code at runtime. That's an early mechanism which was replaced by macros in Lisp. Macros in Lisp are source code transformers and can be compiled, so they are not a runtime mechanism like in R or in earlier Lisps with so-called FEXPRs.

This 5-minute video by Wickham was eye-opening for me regarding the lispiness of R.

https://youtu.be/nERXS3ssntw

Modern Lisps don't use unquote/quote like that.

This looks more like FEXPRs from decades ago.

In 1962 the idea of macros was introduced. Macros are source code transformers, which take source code and generate new source code. This can also be used in a compiled implementation, where macros translate the code before compiling.

FEXPRs, by contrast, are functions which get their arguments unevaluated and can decide at runtime which to evaluate and how.
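The distinction can be sketched in Python, with the caveat that Python has neither Lisp macros nor FEXPRs, so this is only an analogy. `ExpandSquare` and `lazy_if` are hypothetical names, and the "unevaluated arguments" are modelled as source strings.

```python
import ast
import copy

# Macro style: transform the source tree once, before compilation.
class ExpandSquare(ast.NodeTransformer):
    """Rewrite every call square(e) into (e) * (e) at the source level."""
    def visit_Call(self, node):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == "square"
                and len(node.args) == 1):
            arg = node.args[0]
            return ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))
        return node

tree = ast.parse("square(a + 1)", mode="eval")
expanded = ast.fix_missing_locations(ExpandSquare().visit(tree))
# After expansion only ordinary code is left; no function named square exists.
code = compile(expanded, "<macro>", "eval")
macro_result = eval(code, {"a": 3})

# FEXPR style: an ordinary function receives its arguments unevaluated
# and decides at runtime which of them to evaluate.
def lazy_if(cond_src, then_src, else_src, env):
    if eval(cond_src, dict(env)):
        return eval(then_src, dict(env))
    return eval(else_src, dict(env))

# "a / b" is never evaluated here, so no ZeroDivisionError is raised.
fexpr_result = lazy_if("b == 0", "0", "a / b", {"a": 3, "b": 0})
```

In the first half the rewriting is finished before anything runs, so the result can be compiled like any other code. In the second half the decision about what to evaluate is made on every call, which is the runtime behaviour being attributed to FEXPRs.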

> 64-bit integers leaves something to be desired

This is something I wish there was more progress on. A serious limitation in some contexts.
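One context where it bites, sketched in Python rather than R. The assumption is that whole numbers beyond the 32-bit integer range end up stored as doubles, which is what R's numeric type does. `big_id` is a made-up value, and `struct` is used only to show what a true 64-bit integer would preserve.

```python
import struct

INT32_MAX = 2**31 - 1   # largest value of a 32-bit signed integer
big_id = 2**53 + 1      # too large for 32 bits, fits easily in 64 bits

# Round-trip through an IEEE 754 double: the lowest bit is lost, because a
# double has only 53 bits of precision for the significand.
as_double = float(big_id)
round_tripped = int(as_double)

# A true 64-bit signed integer keeps every bit.
packed = struct.pack("<q", big_id)
unpacked = struct.unpack("<q", packed)[0]

lost_precision = round_tripped != big_id
```

Large database keys, nanosecond timestamps and hashes are typical values that can fall into this range.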