HN Mirror


by techcode 484 days ago

To me it seems like your underlying assumption is "one worker can only work on one message/item at a time", right?

You could use Kafka like that too, and it might even work for your use case, as long as you configure the option (sorry, I forget its name) that makes Kafka redistribute partitions (shards) when particular workers/consumers are too slow.

AFAIK the usual approach is for each worker to fetch more than one message/item at a time and do the actual work in a separate thread/worker pool (or through another async mechanism).
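A minimal sketch of that "one fetch, many items in flight" pattern, assuming the Kafka consumer is replaced by a hypothetical in-memory batch of `(offset, value)` pairs. `handle` and `process_batch` are illustrative names, not part of any Kafka client API; an empty payload is used only to simulate a failing item.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(value: str) -> str:
    # Hypothetical per-message work; an empty payload simulates a failure.
    if not value:
        raise ValueError("empty payload")
    return value.upper()


def process_batch(batch: list[tuple[int, str]], pool: ThreadPoolExecutor) -> list[int]:
    """Run every item of one fetched batch on the pool; return the offsets that succeeded."""
    futures = {pool.submit(handle, value): offset for offset, value in batch}
    succeeded = []
    for future, offset in futures.items():
        try:
            future.result()  # waits for this item; re-raises its exception if it failed
            succeeded.append(offset)
        except Exception:
            pass  # failed items are simply not reported as done
    return sorted(succeeded)


if __name__ == "__main__":
    # Stand-in for a single poll returning several (offset, value) records.
    batch = [(10, "a"), (11, ""), (12, "c")]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(process_batch(batch, pool))  # [10, 12]
```

The worker stays single-threaded with respect to the consumer; only the per-item work fans out to the pool, which is what lets one consumer keep many items in progress.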

Kafka then keeps track of how far each worker/consumer has read and of its committed offset (the messages marked as done); the gap between the two is the work that is still in flight.
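Because items finish out of order when they run in a pool, a worker can only safely commit up to the first offset that is not yet done. Below is a minimal sketch of that bookkeeping for a single partition; `OffsetTracker` is a hypothetical helper, not a Kafka class. It follows the Kafka convention that the committed offset is the *next* offset to read.

```python
class OffsetTracker:
    """Tracks fetched vs. completed offsets for one partition (hypothetical helper)."""

    def __init__(self, committed: int = 0) -> None:
        self.committed = committed  # next offset to read after a restart
        self.fetched = committed    # next offset to fetch (the consumer's position)
        self.completed: set[int] = set()  # finished, but not yet committable

    def on_fetch(self, count: int) -> None:
        # A poll handed us `count` more records.
        self.fetched += count

    def on_done(self, offset: int) -> int:
        """Mark one offset done; advance the commit over the contiguous finished prefix."""
        self.completed.add(offset)
        while self.committed in self.completed:
            self.completed.remove(self.committed)
            self.committed += 1
        return self.committed

    @property
    def in_flight(self) -> int:
        # The gap between what was picked up and what is safely committed.
        return self.fetched - self.committed


if __name__ == "__main__":
    t = OffsetTracker()
    t.on_fetch(5)           # offsets 0..4 picked up
    print(t.on_done(1))     # 0: offset 0 is still pending, so nothing can be committed
    print(t.on_done(0))     # 2: offsets 0 and 1 are now both done
    print(t.in_flight)      # 3
```

If the worker dies, everything past `committed` is fetched again after a restart, so the in-flight items are reprocessed rather than silently dropped.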

It gets a bit trickier if you:
- can't afford to process some messages/work twice (at the extreme end, that might actually be a showstopper for using Kafka);
- need automatic retry on error/failure, and have to decide how quickly or slowly to retry, how many times to retry, etc.;
- can't afford to temporarily "lose" some pending items (picked up from Kafka, but with the offset not yet committed as done) for random reasons (a worker gets OOMKilled, a solar flare hits the network cable...).
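For the retry point, here is one common shape of the "how quickly, how many times" policy: bounded attempts with exponential backoff. This is a generic sketch, not anything Kafka provides; `with_retry` and its parameters are hypothetical, and `sleep` is injectable only so the delays can be observed without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    work: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `work`, retrying on any exception with delays base_delay, 2*base_delay, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: the caller decides what happens to the message
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient")
        return "ok"

    delays: list[float] = []
    print(with_retry(flaky, sleep=delays.append))  # ok
    print(delays)                                  # [0.1, 0.2]
```

Retrying in-process keeps the offset uncommitted while it waits, so a long backoff also widens the in-flight gap; that trade-off is one reason to push retries out of the hot path.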

We've actually solved some of these by simply having another worker (or set of workers) consume the same topic with a delay (imagine a cron job that runs every 5 minutes). If there's no record of a task being done, it puts the message back into the same topic for retry, etc.
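A rough sketch of that delayed "reconciler" pass, under stated assumptions: the topic is modeled as an in-memory list, `done_ids` stands in for whatever record of completed tasks you keep (e.g. a database table), and `Message`/`reconcile` are hypothetical names. The 300-second default mirrors the 5-minute cron example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    task_id: str
    produced_at: float  # seconds since epoch


def reconcile(
    messages: list[Message],
    done_ids: set[str],
    now: float,
    delay: float = 300.0,
) -> list[Message]:
    """Return messages at least `delay` seconds old that have no 'done' record."""
    return [
        m for m in messages
        if now - m.produced_at >= delay and m.task_id not in done_ids
    ]


if __name__ == "__main__":
    topic = [Message("a", 0.0), Message("b", 0.0), Message("c", 250.0)]
    done = {"a"}
    retry = reconcile(topic, done, now=300.0)
    # "b" is old enough and not done; "c" is only 50 s old, so it is left alone.
    print([m.task_id for m in retry])  # ['b']
    # Re-publish to the same topic, stamped with the current time.
    topic.extend(Message(m.task_id, 300.0) for m in retry)
```

The delay gives the regular workers time to finish before the reconciler treats a task as lost, and because retries go back through the same topic, the normal workers handle them with no special code path.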