by tiew9Vii
1044 days ago
Flushing every 100ms means you would end up with lots of tiny files (bytes) in S3 unless you have something out of process rewriting them into larger blobs, similar to Delta Lake's OPTIMIZE? Lots of tiny files would be really inefficient from a throughput and API-call perspective in blob storage. With the acks, you have up to 100ms waiting for the buffer to fill, plus the S3 PUT request, plus your metadata request/response. At high throughput, doesn't that mean very high latency, putting back pressure on partitions?
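To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 100ms flush interval comes from the discussion; the 50ms S3 PUT and 10ms metadata round trip are placeholder assumptions, not measurements, and the function names are made up for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def puts_per_day(flush_interval_ms: int) -> int:
    """Objects written per day by one writer that flushes on a fixed interval."""
    return MS_PER_DAY // flush_interval_ms


def worst_case_ack_ms(flush_interval_ms: int, put_ms: int, metadata_ms: int) -> int:
    """Worst-case time to ack: full buffer wait + S3 PUT + metadata round trip."""
    return flush_interval_ms + put_ms + metadata_ms


if __name__ == "__main__":
    flush_ms = 100      # from the discussion
    put_ms = 50         # assumed S3 PUT latency (placeholder)
    metadata_ms = 10    # assumed metadata request/response latency (placeholder)

    print(puts_per_day(flush_ms))                            # objects per writer per day
    print(worst_case_ack_ms(flush_ms, put_ms, metadata_ms))  # worst-case ack latency in ms
```

Under those assumptions a single writer produces 864,000 objects a day, and a record that arrives just after a flush waits about 160ms for its ack, which is where the compaction and back-pressure questions come from.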