| HN Mirror

We have some consumers which treat log entries as tasks, and often it's handy to debounce some of the work into larger chunks that can be executed in parallel. The chunks can be linear or they could be grouped by some property of the message (e.g. account id). In that case, we have batches of messages with multiple non-consecutive offsets, e.x. [123, 145, 155], [122, 124, 144]. In practice, that means inserting each message offset into a per-partition sorted set of pending work. When a batch completes, all the offsets in that batch are marked as "complete" and we commit the lowest safe offset. Using the example above, if the batch [122, 124, 144] completed, we'd still have [123, 145, 155] outstanding which means the lowest safe offset is 122* even though 124 and 144 also completed in batch 1. Until that second batch completes, 123 is still outstanding making it the barrier to commiting a higher offset.

Our batching consumers provide pluggable behavior for handling a failing batch, but usually it's pushed onto SQS since those can cycle around a few times until we notice and fix whatever condition is preventing progress on that work.

* - 123 actually, as if you commit offset 123 the consumer will fetch offset 123 again on start, but that's implementation esoterica