|
|
|
|
|
by wenc
751 days ago
|
|
Do you have any details on this? Duckdb over vanilla S3 has latency issues because S3 is optimized for bulk transfers, not random reads. The new AWS S3 Express Zone supports low-latency but there's a cost. Caching Parquet reads from vanilla S3 sounds like a good intermediate solution. Most of the time, Parquet files are Hive-partitioned, so it would only entail caching several smaller Parquet files on-demand and not the entire dataset. |
|