by g9yuayon 960 days ago
> the 1 billion row benchmarks are run on a single, uncompressed 50 GB CSV file. 50 GB should be stored in multiple files.

For a generic OLAP db, maybe. In this case, though, a single file fits one of DuckDB's use cases: analytics on data consumed or produced by a data scientist. In such a scenario, it's not uncommon to have a multi-GB input, or to dump GBs of a dataframe into a single CSV file.