by IndianAstronaut 3655 days ago
This is functionally similar to dplyr in R, although dplyr's more SQL-like syntax is much handier.

And when the data gets bigger, there's data.table[1], which performs amazingly well at certain tasks (vectorized ops ftw!), though the syntax can get a little clunky (if you squint at it hard, it's SQL-ish). On my 2012 MacBook Pro, I can do (some) transformations of tables containing tens of millions of rows in only a few seconds (and sometimes less).

It's also possible to use dplyr and data.table together to good effect[2].

[1] https://github.com/Rdatatable/data.table/wiki

[&] https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A...

[2] http://stackoverflow.com/questions/21435339/data-table-vs-dp...

[&] https://twitter.com/hadleywickham/status/553169339751215104

Are you able to load data sets into data.table which are larger than memory?
AFAIK, with data.table it's all in-memory, whereas dplyr has the option of working with a database backend.

On the other hand, data.table has robust support for modifying and updating tables by reference, which can be a big performance win.
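
For anyone who hasn't seen update-by-reference in practice, here's a rough sketch in R. The table and column names (`dt`, `id`, `value`, `doubled`) are made up for illustration, and it assumes the data.table package is installed. The point is that `:=` changes the table in place instead of building a modified copy:

```r
library(data.table)

# Hypothetical example table; the names are invented for this sketch.
dt <- data.table(id = 1:5, value = c(10, 20, 30, 40, 50))

# Plain assignment does not copy a data.table, so `alias` points at the same table.
alias <- dt

# `:=` adds a column in place; no new copy of the table is created.
dt[, doubled := value * 2]

# `:=` combined with a row filter updates only the matching rows, again in place.
dt[id > 3, value := 0]

# Use copy() when an independent snapshot is actually wanted.
snapshot <- copy(dt)
dt[, value := value + 1]

print(dt)
print(snapshot)
```

Because nothing gets copied, this style avoids duplicating a large table on every change, which is where the savings come from. The flip side is that `alias` sees every modification made through `dt`, so `copy()` is needed whenever the original values have to be preserved.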