by nomilk
596 days ago
Where does your data reside: on an attached EBS volume, in S3, or somewhere else? I had some spare time and tinkered with DuckDB on a 70GB dataset, but just getting the 70GB onto the EC2 instance took hours. Would be pretty rocking if the DuckDB team could somehow put together a ~1TB demo that anyone can set up and try for themselves in, say, under an hour.
Reading the data off S3 means you'll be slower than offerings like Snowflake. Snowflake has optimized the crap out of doing analytics on S3, so you can't beat it with something as simple as DuckDB.
Importantly, you need the data in a layout that can be read in chunks, like Parquet or CSV split into multiple files. Otherwise DuckDB can't read it in parallel.
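
To make the "split files" point concrete, here's a rough stdlib-only Python sketch. This is not DuckDB's reader or its API; it only illustrates the general idea that once the data is split into independent files, each worker can scan its own piece and the partial results get combined at the end. The helper names (`write_split_csv`, `sum_one_file`, `sum_column_parallel`) and the `part-0000.csv` naming are made up for the example.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_split_csv(directory, n_files, rows_per_file):
    """Write n_files small CSV files, each with an 'id' and a 'value' column."""
    paths = []
    for i in range(n_files):
        path = Path(directory) / f"part-{i:04d}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "value"])
            for j in range(rows_per_file):
                writer.writerow([i * rows_per_file + j, j])
        paths.append(path)
    return paths


def sum_one_file(path, column):
    """Scan a single file; needs no coordination with other workers."""
    with open(path, newline="") as f:
        return sum(int(row[column]) for row in csv.DictReader(f))


def sum_column_parallel(paths, column, workers=4):
    """Scan each file in its own task, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: sum_one_file(p, column), paths)
        return sum(partials)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_split_csv(tmp, n_files=8, rows_per_file=1000)
        print(sum_column_parallel(files, "value"))
```

Threads in Python mostly help when the scan is I/O-bound (for example, waiting on remote storage), so treat this as a picture of the access pattern rather than a benchmark.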