by ramraj07
596 days ago
Local drives. DON'T USE EBS! You'll incur a huge IO charge. You have to choose instances with attached NVMe storage, which means one of the storage-optimized instances. Reading the data off S3 will mean you'll be slower than offerings like Snowflake. Snowflake has optimized the crap out of doing analytics on S3, so you can't beat it with something as simple as DuckDB. Importantly, you need the data in some partitioned format like Parquet or split CSV. Otherwise DuckDB can't read it in parallel.
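To make the "split CSV" point concrete, here is a minimal sketch of cutting one large CSV into several part files on the local drive. It uses only the Python standard library. The function name `split_csv`, the `part-00000.csv` naming scheme, and the `rows_per_part` parameter are assumptions made up for illustration, not anything DuckDB requires.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_part):
    """Split one CSV into numbered part files, repeating the header in each.

    Hypothetical helper: the names and the part-file naming scheme are
    illustrative assumptions only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return parts  # empty input: nothing to split
        handle = None
        writer = None
        for i, row in enumerate(reader):
            # Start a new part file every rows_per_part data rows.
            if i % rows_per_part == 0:
                if handle is not None:
                    handle.close()
                path = out / f"part-{len(parts):05d}.csv"
                handle = open(path, "w", newline="")
                writer = csv.writer(handle)
                writer.writerow(header)
                parts.append(path)
            writer.writerow(row)
        if handle is not None:
            handle.close()
    return parts
```

With the parts written to the instance's NVMe drive, DuckDB can scan them together through a glob, for example `SELECT count(*) FROM read_csv('parts/*.csv')`. Converting to Parquet instead would likely be the better choice if the data is queried more than once.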
On the setup side, I agree that local (instance-attached) disks should be preferred, but does EBS incur an IO fee? It adds significant latency for sure, but it doesn't have per-operation pricing:
> I/O is included in the price of the volumes, so you pay only for each GB of storage you provision.
(https://aws.amazon.com/ebs/pricing/)