Hacker News
by wenc 960 days ago
You definitely pay a performance penalty on S3 (S3 is high-throughput but high-latency storage), so it’s not optimal for (any) database use case. Local disk will always be faster if you can swing that.

It’s not a DuckDB-specific issue (although there’s headroom for improvement; I don’t think DuckDB’s S3 connector is highly optimized). It’s S3.

I might’ve been unclear, so to clarify:

The overhead of fetching from S3 to disk via a naive Go implementation (goroutine per object) and then running DuckDB on the local files was lower than using DuckDB end-to-end.

I was measuring the S3 overhead in both cases.
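
For anyone who wants to try the same comparison, here is a minimal Python sketch of the "one worker per object" download step (the original was Go with a goroutine per object; this is only an analog, not that code). The `fetch` callable is a hypothetical stand-in for whatever S3 client call returns an object's bytes, and `download_all` is a made-up name.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def download_all(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],  # hypothetical: returns one object's bytes
    dest: Path,
    max_workers: int = 32,
) -> list[Path]:
    """Fetch every object concurrently and write each one under dest."""
    dest.mkdir(parents=True, exist_ok=True)

    def fetch_one(key: str) -> Path:
        # Flatten the key so "a/b.parquet" becomes a single local file name.
        # (Keys that differ only by "/" vs "_" would collide; fine for a sketch.)
        path = dest / key.replace("/", "_")
        path.write_bytes(fetch(key))
        return path

    # Threads overlap the per-request latency; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, keys))
```

Once the files are local, DuckDB can be pointed at the directory (for example with a `read_parquet` glob) instead of at S3.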

No, I got you.

Like I said, S3 is high-throughput, high-latency storage. When you fetch an S3 object to disk, that’s a high-throughput operation, and S3 excels at that. Once the data is on disk, DuckDB can operate at low latency.

If you run DuckDB end to end as a database engine on S3, it has to do partial reads of Parquet files on S3 (among other things) and pay S3 latency on each of them, so it can end up slower than what you described above.
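
A back-of-the-envelope model makes the trade-off concrete. This is a crude sketch: it charges one round of latency per "wave" of concurrent requests plus the time to move the bytes, and every number below is made up for illustration, not measured.

```python
def transfer_seconds(total_bytes, requests, latency_s, bytes_per_s, concurrency=1):
    """Crude estimate: latency once per wave of requests, plus time to move the bytes."""
    waves = -(-requests // concurrency)  # ceiling division
    return waves * latency_s + total_bytes / bytes_per_s


# Hypothetical numbers: 50 ms per request, 100 MB/s of bandwidth.
# Case 1: download 1 GB as 10 whole objects, all 10 requests in flight at once.
whole = transfer_seconds(1_000_000_000, requests=10, latency_s=0.05,
                         bytes_per_s=100e6, concurrency=10)
# Case 2: read only 200 MB, but as 2000 small range requests issued one at a time.
partial = transfer_seconds(200_000_000, requests=2000, latency_s=0.05,
                           bytes_per_s=100e6)
print(f"whole objects: {whole:.2f}s, partial reads: {partial:.2f}s")
```

Under these assumed numbers the partial-read path is roughly ten times slower even though it moves a fifth of the bytes, because latency dominates once the requests are many and sequential.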

For long-running operations where I can chunk the data, I often copy chunks to local disk before running DuckDB. It’s a lot faster than running DuckDB directly on S3.

The downside is I need enough disk space.
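
One way to keep the disk requirement bounded is to hold only one chunk on disk at a time. A minimal sketch, assuming two hypothetical callables: `download` copies a batch of keys into a directory, and `run_query` runs the query (e.g. with DuckDB) over that directory and returns a result that does not depend on the files afterwards.

```python
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def process_in_chunks(
    keys: Sequence[str],
    chunk_size: int,
    download: Callable[[Sequence[str], Path], None],  # hypothetical: copy keys into a dir
    run_query: Callable[[Path], object],              # hypothetical: query the local files
) -> list:
    """Download, query, and discard one chunk at a time."""
    results = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        # The temporary directory is deleted when the with-block exits,
        # so peak disk usage is about one chunk.
        with tempfile.TemporaryDirectory() as tmp:
            local = Path(tmp)
            download(chunk, local)
            results.append(run_query(local))
    return results
```

This only works when the query can be split per chunk and the partial results combined afterwards.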

Theoretically, reading directly from S3 should be faster. Downloading all the data from S3 and then running the query locally is basically an extreme form of prefetching. DuckDB could be written to prefetch data concurrently using some heuristics and provide similar or better performance.

Makes sense: every S3 call is high latency, so the fewer you make the better.

Prefetching would help reduce the number of those high latency calls which databases naturally make.
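
A related way to cut the number of calls is to coalesce nearby byte ranges into one larger request. The sketch below is a generic illustration of that idea, not a description of what DuckDB does; `coalesce_ranges` and `max_gap` are names made up here.

```python
def coalesce_ranges(ranges, max_gap):
    """Merge (start, end) byte ranges (end exclusive) whose gap is at most max_gap bytes."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing another request.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


# Three wanted reads become two requests when gaps of up to 64 bytes are tolerated.
requests = coalesce_ranges([(0, 100), (150, 200), (10_000, 10_100)], max_gap=64)
```

The cost is that each merged request also transfers the bytes in the gaps, so a larger `max_gap` trades wasted bandwidth for fewer round trips.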

We often think of S3 as a file system, but it isn’t one; it differs from one in fundamental ways. (Also, treating it as a file system isn’t performant at all: I tried s3fs and Mountpoint, and both were slow.)

Oh, yeah, that does make sense, and I suppose it might also be a better approach if the objects were large (especially if you can avoid reading some parts). In that case prefetching could be wasteful (or result in OOMs).

I didn’t think of that - thanks for the explanation!