| > small dataset (100GB) What counts as large or small definitely varies a lot depending on the context of the conversation/analysis. MotherDuck's "Big Data is Dead" post [0] sticks in mind: > The general feedback we got talking to folks in the industry was that 100 GB was the right order of magnitude for a data warehouse. This is where we focused a lot of our efforts in benchmarking. Another point of reference is [1] > [...] Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB > TPC-H Small Dataset = 866k tuples, sf 0.1 [0] https://motherduck.com/blog/big-data-is-dead/ [1] https://db.in.tum.de/~kersten/Tidy%20Tuples%20and%20Flying%2... |
and then user discovers that DuckDB is plagued with OOMs and dramatic performance degradations when his data is slightly larger than memory.