|
|
|
|
|
by int_19h
3650 days ago
|
|
Could you expand on some of your use cases? An R data frame, by definition, has to fit into memory, so it would seem that any sort of map/filter/group/fold operation would be fastest if performed in-memory, as well. And I assumed that e.g. joining data frames (where you would run out of memory really quickly if your datasets are large to begin with) would be uncommon... am I wrong? |
|
I for example often need to score/model data which doesn't fit in RAM (on my PC) so I use libraries like bigGLM which can use out-of-memory data to build the models. One of the options is SQLite, but you can use an ODBC connection.
Additionally, I can explore slices of the data, which resides only on disk. I don't even need to import it. I can use dplyr (very famous package for aggregations and slicing) which will map the R syntax to SQL which is executed by SQLite.