by theLiminator
960 days ago
In theory, reading directly from S3 should be faster. Downloading all the data from S3 and then running the query locally is basically an extreme form of prefetching. DuckDB could be written to prefetch data concurrently using some heuristics and provide similar or better performance.
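To make the concurrent prefetching idea concrete, here is a minimal Python sketch. It is not how DuckDB works internally; `fetch_range` is a hypothetical stand-in for an S3 ranged GET that just sleeps for an assumed 50 ms round trip, and the chunk size and worker count are arbitrary. It only shows why issuing the high-latency requests concurrently beats issuing them one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed per-request round-trip latency (illustrative only)


def fetch_range(blob: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for an S3 ranged GET: wait, then return bytes [start, end)."""
    time.sleep(LATENCY_S)
    return blob[start:end]


def read_sequential(blob: bytes, ranges: list) -> list:
    """Issue one request at a time, paying the full latency for each."""
    return [fetch_range(blob, s, e) for s, e in ranges]


def read_prefetched(blob: bytes, ranges: list, workers: int = 8) -> list:
    """Issue the requests concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: fetch_range(blob, *r), ranges))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


blob = bytes(range(256)) * 64  # 16 KiB of fake object data
ranges = [(i, i + 1024) for i in range(0, len(blob), 1024)]  # 16 chunks of 1 KiB

seq, t_seq = timed(read_sequential, blob, ranges)
pre, t_pre = timed(read_prefetched, blob, ranges)
print(f"sequential: {t_seq:.2f}s, prefetched: {t_pre:.2f}s")
```

With 16 chunks and 8 workers, the sequential read pays the latency 16 times while the concurrent one pays it roughly twice. A real engine would also need the heuristics part: guessing which ranges the query will touch next.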
Prefetching would help reduce the number of high-latency calls that databases naturally make.
We often think of S3 as a file system, but it isn't one; it differs from a file system in fundamental ways. (Treating it as a filesystem also isn't performant at all: I tried s3fs and Mountpoint, but both were slow.)