by theLiminator 960 days ago
In theory, reading directly from S3 should be faster. Downloading all the data from S3 and then running the query locally is basically an extreme form of pre-fetching. DuckDB could be written to pre-fetch data concurrently using some heuristics and provide similar or better performance.
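
To make the concurrency point concrete, here is a rough sketch of overlapping several high-latency reads instead of issuing them one by one. This is not how DuckDB works internally: `fetch_range`, `BLOB`, the chunk size, and the 50 ms latency are all hypothetical stand-ins for ranged reads against an object store.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an object in S3; a real reader would issue ranged GETs.
BLOB = bytes(range(256)) * 64  # 16 KiB of fake data
LATENCY_S = 0.05               # assumed per-request latency, for illustration only


def fetch_range(start: int, end: int) -> bytes:
    """Simulate one high-latency ranged read of BLOB[start:end]."""
    time.sleep(LATENCY_S)
    return BLOB[start:end]


def read_sequential(chunk: int) -> bytes:
    """Fetch chunks one at a time, paying the latency once per chunk."""
    return b"".join(fetch_range(i, i + chunk) for i in range(0, len(BLOB), chunk))


def read_prefetched(chunk: int, workers: int = 8) -> bytes:
    """Issue all chunk requests concurrently so their latencies overlap."""
    starts = range(0, len(BLOB), chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the chunks join back correctly.
        parts = pool.map(lambda s: fetch_range(s, s + chunk), starts)
        return b"".join(parts)
```

With eight chunks, the sequential reader waits for roughly eight latencies back to back, while the prefetching reader waits for roughly one. A real heuristic would also have to guess which ranges the query will need next, which this sketch skips by fetching everything.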
1 comment

Makes sense: every S3 call is high latency, so the fewer you make, the better.

Prefetching would help reduce the number of those high-latency calls, which databases naturally make.
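
One way to cut the number of calls is to merge nearby byte ranges into a single larger request, trading a little over-read for fewer round trips. A minimal sketch, where `coalesce` and the `max_gap` threshold are hypothetical names of mine rather than anything from DuckDB or the S3 API:

```python
def coalesce(ranges, max_gap=4096):
    """Merge (start, end) byte ranges whose gap is at most max_gap bytes."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of adding a call.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Four small reads collapse into two requests.
requests = [(0, 100), (150, 300), (10_000, 10_200), (10_250, 10_400)]
print(coalesce(requests))  # [(0, 300), (10000, 10400)]
```

The right `max_gap` depends on the ratio of per-request latency to bandwidth, so treat the 4096 default as a placeholder.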

We often think of S3 as a file system, but it isn't one; it differs from one in fundamental ways. (Treating it as a filesystem also isn't performant at all: I tried s3fs and Mountpoint, and both were slow.)