by marcle
1691 days ago
TLDR: Arrow and DuckDB provide fast database aggregates compared with R's RDS format and, to an extent, SQLite. It is unclear how much functionality is available for Arrow under R: any comments? It would also be interesting to see a similar benchmark for Python, which could include the embedded version of MonetDB (an R package for MonetDB/e is not yet available). Edit: amended the TLDR to reflect jhoechtl's and wodenokoto's comments. SQLite provided reasonably memory-efficient aggregates.
|
RDS is slow to load because it has to be unzipped and read into memory in full. Everything else is fast to load by comparison because it has some sort of index into the data on disk (at the cost of being much larger at rest), with Arrow being the fastest because its index happened to be optimized for the test query.
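
To make the distinction concrete, here is a rough Python sketch of the two access patterns. It is only an analogy: a gzip-compressed pickle stands in for an RDS-style file, SQLite stands in for the indexed on-disk formats, and the table, column, and function names are made up for illustration. It is not the benchmark from the article.

```python
import gzip
import pickle
import sqlite3
import tempfile
from pathlib import Path


def write_blob(rows, path):
    # Whole-object, gzip-compressed serialization (assumed stand-in for RDS).
    with gzip.open(path, "wb") as f:
        pickle.dump(rows, f)


def sum_from_blob(path, group):
    # Every row is decompressed and deserialized before we can aggregate.
    with gzip.open(path, "rb") as f:
        rows = pickle.load(f)
    return sum(val for grp, val in rows if grp == group)


def write_sqlite(rows, path):
    # Uncompressed table plus an index: larger at rest, cheaper to query.
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE t (grp TEXT, val INTEGER)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        con.execute("CREATE INDEX idx_grp ON t (grp)")
        con.commit()
    finally:
        con.close()


def sum_from_sqlite(path, group):
    # The engine can use the index to visit only the matching rows on disk.
    con = sqlite3.connect(path)
    try:
        (total,) = con.execute(
            "SELECT SUM(val) FROM t WHERE grp = ?", (group,)
        ).fetchone()
    finally:
        con.close()
    return total


def demo():
    # Hypothetical data: two groups, values 0..999.
    rows = [("a" if i % 2 == 0 else "b", i) for i in range(1000)]
    with tempfile.TemporaryDirectory() as d:
        blob = Path(d) / "data.pkl.gz"
        db = Path(d) / "data.sqlite"
        write_blob(rows, blob)
        write_sqlite(rows, db)
        return sum_from_blob(blob, "a"), sum_from_sqlite(db, "a")
```

Both paths return the same total. The difference is that the first has to materialize the whole dataset to get it, while the second only touches what the query needs.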