by int_19h 3650 days ago
Could you expand on some of your use cases? An R data frame, by definition, has to fit into memory, so it would seem that any sort of map/filter/group/fold operation would be fastest if performed in-memory, as well. And I assumed that e.g. joining data frames (where you would run out of memory really quickly if your datasets are large to begin with) would be uncommon... am I wrong?

Depends on what you mean by uncommon.

I, for example, often need to score or model data that doesn't fit in RAM (on my PC), so I use libraries like bigglm (from the biglm package), which can build models from data that is not held in memory. One of the supported data sources is SQLite, but you can also use an ODBC connection.
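
To make the idea concrete, here is a rough sketch in Python rather than R. It is not how bigglm works internally; it swaps in the simplest possible technique, a one-predictor least-squares fit that keeps only running sums while rows stream out of SQLite in chunks. The table name `obs`, its columns, and both function names are made up for illustration.

```python
import sqlite3


def fit_line_in_chunks(conn, chunk_size=1000):
    """Fit y = intercept + slope * x without loading the whole table.

    Assumes a hypothetical table `obs(x REAL, y REAL)`. Only one chunk of
    rows plus five running sums are held in memory at any time.
    """
    n = sx = sy = sxx = sxy = 0.0
    cur = conn.execute("SELECT x, y FROM obs")
    while True:
        rows = cur.fetchmany(chunk_size)  # pull the next chunk from the database
        if not rows:
            break
        for x, y in rows:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    # Closed-form simple linear regression from the accumulated sums.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def demo_connection():
    """Build a small demo database; a real one would be a file on disk."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO obs VALUES (?, ?)",
        [(float(i), 2.0 * i + 1.0) for i in range(5000)],
    )
    return conn
```

Calling `fit_line_in_chunks(demo_connection(), chunk_size=500)` recovers the line used to generate the demo rows. With a real dataset you would pass `sqlite3.connect("some_file.db")` instead, and memory use stays bounded by the chunk size, not the table size.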

Additionally, I can explore slices of the data, which resides only on disk; I don't even need to import it. I can use dplyr (a very popular package for aggregating and slicing data), which maps R syntax to SQL that SQLite then executes.
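
As a rough illustration of what "executed by SQLite" buys you, here is a Python sketch (not dplyr, and not its generated SQL verbatim) of the kind of query a filter / group_by / summarise pipeline would boil down to. The table `measurements`, its columns, and the function names are hypothetical. The point is that the filtering and aggregation run inside the database, and only the small summary comes back into memory.

```python
import sqlite3


def mean_by_group(conn):
    """Aggregate inside SQLite; only one row per group is returned.

    Assumes a hypothetical table `measurements(grp TEXT, value REAL)`.
    """
    query = """
        SELECT grp, COUNT(*) AS n, AVG(value) AS mean_value
        FROM measurements
        WHERE value > 0
        GROUP BY grp
        ORDER BY grp
    """
    return conn.execute(query).fetchall()


def demo_measurements():
    """Small in-memory stand-in for a database file that lives on disk."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("a", -4.0), ("b", 5.0), ("b", -1.0)],
    )
    return conn
```

The raw rows never have to be materialized on the client side, which is why the size of the table on disk matters much less than the size of the result.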