by Xorlev 3518 days ago

One of the biggest problems with treating Kafka as a job queue is that you suffer from head-of-line blocking. Kafka doesn't expose per-message visibility/acknowledgement semantics the way RabbitMQ, Redis PUSH+POP, or SQS do. Each consumer group tracks offsets into the partitions of a log (aka a topic). An offset is just a number that points to a specific message in a Kafka partition. If you get stuck on message 123, you have three options: stop and never reach 124; proceed without committing your offset, at the risk of replaying 124 after a restart; or skip 123.
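To make the trade-off concrete, here is a minimal sketch that models a partition as a plain Python list. It does not use any Kafka client; `consume`, `handler`, and the `on_failure` policy names are hypothetical and exist only for illustration. The committed value follows the Kafka convention of pointing at the next offset to read.

```python
from typing import Callable, List, Tuple


def consume(
    log: List[str],
    handler: Callable[[str], bool],
    on_failure: str,
) -> Tuple[int, List[int]]:
    """Walk a partition (modeled as a list) in offset order.

    Returns (committed, skipped), where `committed` is the next offset
    to read and `skipped` lists offsets that were passed over.
    `on_failure` is a made-up policy name: "block" or "skip".
    """
    committed = 0
    skipped: List[int] = []
    for offset, message in enumerate(log):
        if handler(message):
            committed = offset + 1
        elif on_failure == "block":
            # Head-of-line blocking: nothing behind this message is processed.
            break
        elif on_failure == "skip":
            # Move on; a real consumer might hand the message to a retry queue.
            skipped.append(offset)
            committed = offset + 1
        else:
            raise ValueError(f"unknown policy: {on_failure}")
    return committed, skipped


log = ["ok-0", "bad-1", "ok-2"]
handler = lambda message: not message.startswith("bad")

print(consume(log, handler, "block"))  # (1, [])
print(consume(log, handler, "skip"))   # (3, [1])
```

With "block" the consumer never gets past offset 1; with "skip" it reaches the end of the log but has to deal with the failed message somewhere else.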

A great many of our services publish to Kafka. Consuming services that treat individual records as tasks (or bundles of tasks), as opposed to a linear log, must either skip failures or push them onto SQS for background retry. Our batching consumers have to track out-of-order completion of work and commit up to the lowest completed offset, meaning a slow task can delay offset commits. If a consumer is stopped before finishing that slow task, we have to replay work, which means all work has to be idempotent. In practice it works well enough, but it's still some gymnastics.

I suspect this is why Google invested so much into making PubSub scalable despite per-message semantics. It's considerably simpler in many ways, even if you have to bake in your own ordering/monotonically increasing identifiers.

1 comments

joohwan 3518 days ago

Very true. I indeed found the lack of visibility into per-message information very painful when I was building this. One way I tried to alleviate the issue was providing a consumer "callback" to make it easier for users to plug their own code in to handle job failures (like your example of using SQS).

I've also thought about reserving a topic + consumer group specifically for failed jobs and baking the retry logic into KQ itself. But that's an area I need to explore more.

I'm not sure if I understand what you are saying about batching consumers. What do you mean by batching in this context? Thanks for your input.

Xorlev 3518 days ago

We have some consumers which treat log entries as tasks, and often it's handy to debounce some of the work into larger chunks that can be executed in parallel. The chunks can be linear or they could be grouped by some property of the message (e.g. account id). In that case, we have batches of messages with multiple non-consecutive offsets, e.g. [123, 145, 155], [122, 124, 144]. In practice, that means inserting each message offset into a per-partition sorted set of pending work. When a batch completes, all the offsets in that batch are marked as "complete" and we commit the lowest safe offset. Using the example above, if the batch [122, 124, 144] completed, we'd still have [123, 145, 155] outstanding, which means the lowest safe offset is 122* even though 124 and 144 also completed in that same batch. Until the other batch completes, 123 is still outstanding, making it the barrier to committing a higher offset.
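A rough sketch of that bookkeeping, for one partition, might look like the following. It is an illustration and not our actual consumer code: `OffsetTracker` and its method names are made up, and a plain set with `min()` stands in for the sorted set. The value it returns follows the Kafka convention that a committed offset is the next offset to fetch.

```python
from typing import Iterable, Optional, Set


class OffsetTracker:
    """Tracks pending offsets for a single partition (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: Set[int] = set()
        self._highest_seen: Optional[int] = None

    def add(self, offset: int) -> None:
        """Record an offset as read but not yet processed."""
        self._pending.add(offset)
        if self._highest_seen is None or offset > self._highest_seen:
            self._highest_seen = offset

    def complete(self, offsets: Iterable[int]) -> None:
        """Mark every offset in a finished batch as done."""
        self._pending.difference_update(offsets)

    def committable(self) -> Optional[int]:
        """Offset that is safe to commit, or None if nothing was read yet.

        While work is outstanding this is the lowest pending offset, so a
        restart re-fetches it. Otherwise it is one past the highest offset seen.
        """
        if self._pending:
            return min(self._pending)
        if self._highest_seen is None:
            return None
        return self._highest_seen + 1


tracker = OffsetTracker()
for offset in (122, 123, 124, 144, 145, 155):
    tracker.add(offset)

tracker.complete([122, 124, 144])
print(tracker.committable())  # 123: still the barrier

tracker.complete([123, 145, 155])
print(tracker.committable())  # 156: everything seen so far is done
```

Calling `min()` on a set is linear in the number of pending offsets; a heap or a real sorted set would be the obvious swap if a partition carries a lot of in-flight work.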

Our batching consumers provide pluggable behavior for handling a failing batch, but usually it's pushed onto SQS since those can cycle around a few times until we notice and fix whatever condition is preventing progress on that work.

* - 123 actually: a committed offset is the position the consumer resumes from, so if you commit offset 123 the consumer will fetch offset 123 again on start. But that's implementation esoterica.