Hacker News
by
snake_doc
413 days ago
Are you querying from an EC2 instance close to the S3 data? Are the CSVs partitioned into separate files? Does the machine have 500GB of memory? It's not always DuckDB's fault when there can be a clear I/O bottleneck…
aynyc
413 days ago
No, the EC2 instance doesn't have 500GB of memory. Does DuckDB require that? I actually downloaded the data from S3 to local EBS and it still choked.
broner
412 days ago
Works fine for me on TB+ datasets. Maybe you were using an in-memory database rather than a persistent one and running out of RAM?
https://duckdb.org/docs/stable/clients/cli/overview.html#in-...
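For anyone following along, here is a rough sketch of the persistent-database idea from Python. It assumes the `duckdb` package is installed. The file name, the 8GB cap, and the spill directory are made-up placeholders, not anything from the thread. `memory_limit` and `temp_directory` are DuckDB settings, but the right values depend on your machine.

```python
from typing import List


def memory_settings(memory_limit: str, temp_dir: str) -> List[str]:
    """Build the SET statements that cap RAM use and name a spill directory."""
    return [
        f"SET memory_limit = '{memory_limit}'",
        f"SET temp_directory = '{temp_dir}'",
    ]


def open_persistent(db_path: str,
                    memory_limit: str = "8GB",         # placeholder value
                    temp_dir: str = "duckdb_spill"):   # placeholder directory
    """Open an on-disk database instead of the default in-memory one."""
    # Third-party package (pip install duckdb). Imported here so that
    # memory_settings() can be used without it.
    import duckdb

    # Passing a file path makes the database persistent; calling
    # duckdb.connect() with no argument gives an in-memory database.
    con = duckdb.connect(db_path)
    for statement in memory_settings(memory_limit, temp_dir):
        con.execute(statement)
    return con
```

Usage would look something like `con = open_persistent("analysis.duckdb")` followed by `con.execute("SELECT count(*) FROM read_csv_auto('data/*.csv')").fetchone()`, where `data/*.csv` is a hypothetical local path. The files are still queried in place; nothing has to be inserted first.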
aynyc
412 days ago
Wait, do you insert the data from S3 into DuckDB? I was just doing a SELECT from the file.
broner
411 days ago
Nope, just reading from S3. Check this out:
https://duckdb.org/2024/07/09/memory-management.html
fastasucan
411 days ago
Maybe it's your terminal that chokes because it tries to display too much data? 500GB should be no problem.
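If the terminal is the suspect, one way to rule it out is to never print the full result. A minimal sketch: the two helper names below are invented for illustration. They only build SQL strings that you would then pass to DuckDB, one to preview a handful of rows and one to write the whole result to a Parquet file with `COPY ... TO`.

```python
def preview_sql(query: str, max_rows: int = 20) -> str:
    """Wrap a query so that only the first max_rows rows come back."""
    inner = query.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS preview LIMIT {max_rows}"


def export_sql(query: str, out_path: str) -> str:
    """Wrap a query so the result goes to a Parquet file, not the screen."""
    inner = query.strip().rstrip(";")
    return f"COPY ({inner}) TO '{out_path}' (FORMAT parquet)"
```

For example, `preview_sql("SELECT * FROM read_csv_auto('data/*.csv');", 10)` produces a statement that returns ten rows at most (the path is hypothetical). If that runs quickly but the unwrapped query hangs, the display is a likely culprit.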