by wenc
960 days ago
Fantastic job by the DuckDB team. I've been using it for the past year to query hundreds of GBs of Parquet files with complex analytic queries involving multiple levels of aggregations, joins and window functions, and it all works and works fast. And I do all this from a Jupyter notebook. It's actually faster than AWS Athena for me.
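For anyone wondering what that kind of query looks like from a notebook, here is a minimal sketch, not the actual queries described above. The file glob, the column names (`region`, `product`, `amount`) and the helper name `rank_products_by_region` are made up for illustration; `duckdb.connect`, `read_parquet` and `fetchall` are part of the DuckDB Python API.

```python
def rank_products_by_region(parquet_glob):
    """Aggregate first, then rank inside each region with a window function."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()  # in-memory database; the Parquet files stay on disk
    try:
        # The path is inlined into the SQL text, so it must not contain a single quote.
        return con.execute(f"""
            WITH totals AS (
                SELECT region, product, sum(amount) AS total
                FROM read_parquet('{parquet_glob}')
                GROUP BY region, product
            )
            SELECT region, product, total,
                   rank() OVER (PARTITION BY region ORDER BY total DESC) AS rnk
            FROM totals
            ORDER BY region, rnk
        """).fetchall()
    finally:
        con.close()
```

Something like `rank_products_by_region('sales/*.parquet')` would scan every matching file. In a notebook you would probably swap `fetchall()` for `.df()` to get a pandas DataFrame, assuming pandas is installed.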
Just the other day I used it to transform an unordered 60 GB CSV file of links and text into a 3 GB Parquet file. It's so fast that I can create a projection of the relevant data for each partition in about a minute (which then fits in memory).
It has some minor stability issues, so I'm not sure I'd build a full-blown application on top of it, but for data transformation tasks it's amazing.
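A CSV-to-Parquet conversion like that can be sketched in a few lines. This is only an illustration under assumptions: the sort column `url`, the `header = true` option and the helper name `csv_to_parquet` are hypothetical, and the real file may well need different options. `COPY ... TO ... (FORMAT PARQUET)` and `read_csv_auto` are standard DuckDB SQL.

```python
def csv_to_parquet(csv_path, parquet_path, order_by="url"):
    """Rewrite a CSV file as a sorted Parquet file and return the row count."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()  # in-memory database, nothing is persisted
    try:
        # Paths and the column name are inlined into the SQL text,
        # so they must come from trusted input and contain no single quotes.
        con.execute(f"""
            COPY (
                SELECT *
                FROM read_csv_auto('{csv_path}', header = true)
                ORDER BY {order_by}
            ) TO '{parquet_path}' (FORMAT PARQUET)
        """)
        # Read the result back to confirm what was written.
        return con.execute(
            f"SELECT count(*) FROM read_parquet('{parquet_path}')"
        ).fetchone()[0]
    finally:
        con.close()
```

Sorting before writing is a design choice here, not a requirement: it tends to help compression and lets later queries skip row groups, at the cost of a sort during conversion.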