by tiew9Vii 1044 days ago
Flushing every 100ms means you would end up with lots of tiny files (bytes in size) in S3, unless you have something out of process rewriting them into larger blobs, similar to Delta Lake's OPTIMIZE?

Lots of tiny files would be really inefficient from a throughput and API-call perspective in blob storage.
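
To put a rough number on the tiny-files concern, here is a back-of-the-envelope sketch. It assumes each flush produces exactly one object per writer, which is my assumption and not something the system under discussion is confirmed to do; `objects_per_day` and the writer count of 50 are made up for illustration.

```python
def objects_per_day(flush_interval_ms: int, writers: int = 1) -> int:
    """Objects created per day, assuming one object per flush per writer."""
    ms_per_day = 24 * 60 * 60 * 1000
    flushes_per_day = ms_per_day // flush_interval_ms
    return flushes_per_day * writers


# One writer flushing every 100ms
print(objects_per_day(100))      # 864000

# Hypothetical: 50 writers, each flushing every 100ms
print(objects_per_day(100, 50))  # 43200000
```

Each of those objects is at least one PUT, and every later read has to list and fetch them individually, which is where the throughput and API-call cost comes from.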

With the acks, you have up to 100ms waiting for the buffer to fill, plus the S3 PUT request, plus your metadata request/response. For high throughput, that must mean very high latency, putting back pressure on partitions?
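
A minimal sketch of that latency sum, and of how much unacknowledged data it implies. The PUT and metadata timings and the record rate are placeholders I picked for illustration, not measurements; only the 100ms buffer wait comes from the discussion. The in-flight estimate is just rate times latency (Little's law).

```python
def worst_case_ack_ms(buffer_wait_ms: int, put_ms: int, metadata_ms: int) -> int:
    """Worst-case time from receiving a record to acking it."""
    return buffer_wait_ms + put_ms + metadata_ms


def records_in_flight(records_per_s: int, ack_latency_ms: int) -> int:
    """Records accepted but not yet acked at a steady rate (rate * latency)."""
    return records_per_s * ack_latency_ms // 1000


# Placeholder timings: 50ms for the S3 PUT, 10ms for the metadata round trip
latency = worst_case_ack_ms(buffer_wait_ms=100, put_ms=50, metadata_ms=10)
print(latency)                              # 160

# Placeholder load: 100,000 records/s on one partition
print(records_in_flight(100_000, latency))  # 16000
```

If producers cap how many unacked records they keep outstanding, a higher ack latency directly lowers the rate they can sustain, which is the back pressure being asked about.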

1 comments

Behind the scenes, if they're sinking to S3 using Iceberg, it handles compaction via its maintenance API.