Hacker News
by tveita 1261 days ago
ClickHouse and DuckDB are the databases I would look at; both support this use case pretty much "out of the box".

E.g. https://benchmark.clickhouse.com has some query times for a 100 million row dataset.

2 comments

DuckDB is so simple to work with. It's only worth looking elsewhere if you have really big data, or if you really need a client-server setup.

I hope it receives more love.

DuckDB is outrageously useful. It's great on its own, but it also slots in perfectly by reading and handing back Arrow data frames, meaning you can seamlessly swap between tools when SQL is better for some parts and other tools are better for others. It's also very fast. I was able to throw away designs for multi-machine setups, as DuckDB on its own was fast enough that I didn't have to worry about anything else.
Having used all three, I'd go with ClickHouse/DuckDB over Arrow every time.
Oh interesting - why?
They're easier to use and faster is the tl;dr.
100% agree.
Probably for SQL (top n, ...), but not for wrangling, analytics, ML, AI, and viz.