Hacker News
by snake_doc 413 days ago
Are you querying from an EC2 instance close to the S3 data? Are the CSVs partitioned into separate files? Does the machine have 500GB of memory? It’s not always DuckDB's fault when there can be a clear I/O bottleneck…

No, the EC2 instance doesn't have 500GB of memory. Does DuckDB require that? I actually downloaded the data from S3 to local EBS and it still choked.
Works fine for me on TB+ datasets. Maybe you were using an in-memory database rather than a persistent one and ran out of RAM? https://duckdb.org/docs/stable/clients/cli/overview.html#in-...
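For anyone following along, here is a minimal Python sketch of the persistent-database suggestion above, assuming the `duckdb` Python package is installed. The file name `analysis.duckdb`, the 8GB limit, and the spill directory are placeholder values, and `build_settings` / `open_persistent` are hypothetical helpers, not part of DuckDB. `memory_limit` and `temp_directory` are DuckDB configuration options; whether tuning them fixes this particular case is only a guess.

```python
def build_settings(memory_limit="8GB", temp_directory="/tmp/duckdb_spill"):
    """Return the SET statements applied to a new connection.

    Both values are placeholders; pick ones that fit the machine.
    """
    return [
        f"SET memory_limit = '{memory_limit}'",
        f"SET temp_directory = '{temp_directory}'",
    ]


def open_persistent(db_path="analysis.duckdb", **settings):
    """Open an on-disk database file instead of the default in-memory one."""
    # Imported here so build_settings() works without DuckDB installed.
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect(db_path)  # a file path gives a persistent database
    for statement in build_settings(**settings):
        con.execute(statement)
    return con
```

Usage would look like `con = open_persistent(memory_limit="4GB")`, followed by `con.execute(...)` for the actual query.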
Wait, do you insert the data from S3 into DuckDB? I was just doing a SELECT from the file.
Nope, just reading from S3. Check this out: https://duckdb.org/2024/07/09/memory-management.html
Maybe it's your terminal that chokes because it tries to display too much data? 500GB should be no problem.
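One way to test the terminal theory is to stop printing the full result: fetch a small preview, or write the output to a file instead. A rough Python sketch, assuming the `duckdb` package and CSV input; the paths and the helpers `preview_query`, `export_query`, and `preview` are hypothetical names made up for illustration.

```python
def preview_query(source, limit=20):
    """SQL that reads only a handful of rows from the CSV file(s)."""
    return f"SELECT * FROM read_csv('{source}') LIMIT {int(limit)}"


def export_query(source, out_path):
    """SQL that writes the result to a Parquet file instead of the screen."""
    return (
        f"COPY (SELECT * FROM read_csv('{source}')) "
        f"TO '{out_path}' (FORMAT parquet)"
    )


def preview(source, limit=20):
    """Run the preview query and return the rows as a list of tuples."""
    # Imported here so the query builders work without DuckDB installed.
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()  # in-memory is fine for a small preview
    try:
        return con.execute(preview_query(source, limit)).fetchall()
    finally:
        con.close()
```

If `preview("data/*.csv")` comes back quickly while the unrestricted SELECT hangs, the display side is a plausible suspect.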