Hacker News new | ask | show | jobs
by techcode 487 days ago
What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.

With that (and sharding based on that ID/value) - all your consumers/workers will get equal amount of messages/tasks.

Both post and seemingly general theme of comments here is trashing choice of Kafka for low volume.

Interestingly both are ignoring other valid reasons/requirements making Kafka perfectly good choice despite low volume - e.g.:

- multiple different consumers/workers consuming same messages at their own pace

- needing to rewind/replay messages

- guarantee that all messages related to specific user (think bank transactions in book example of CQRS) will be handled by one pod/consumer, and in consistent order

- needing to chain async processing

And I'm probably forgetting bunch of other use cases.

And yes, even with good sharding - if you have some tasks/work being small/quick while others being big/long can still lead to non-optimal situations where small/quick is waiting for bigger one to be done.

However - if you have other valid reasons to use Kafka, and it's just this mix of small and big tasks that's making you hesitant... IMHO it's still worth trying Kafka.

Between using bigger buckets (so instead of 1 fetch more items/messages and handle work async/threads/etc), and Kafka automatically redistributing shards/partitions if some workers are slow ... You might be surprised it just works.

And sure - you might need to create more than one topic (e.g. light, medium, heavy) so your light work doesn't need to wait for heavier one.

Finally - I still didn't see anyone mention actual real deal breakers for Kafka.

From the top of my head I recall a big one is no guarantee of item/message being processed only once - even without you manually rewinding/reprocessing it.

It's possible/common to have situations where worker picks up a message from Kafka, processes (wrote/materialized/updated) it and when it's about to commit the kafka offset (effectively mark it as really done) it realizes Kafka already re-partitioned shards and now another pod owns particular partition.

So if you can't model items/messages or the rest of system in a way that can handle such things ... Say with versioning you might be able to just ignore/skip work if you know underlying materialized data/storage already incorporates it, or maybe whole thing is fine with INSERT ON DUPLICATE KEY UPDATE) - then Kafka is probably not the right solution.

2 comments

(Author here)

You say: > What that post describes (all work going to one/few workers) in practice doesn't really happen if you properly randomize (e.g. just use random UUID) ID of the item/task when inserting it into Kafka.

I would love to be wrong about this, but I don't _think_ this changes things. When you have few enough messages, you can still get unlucky and randomly choose the "wrong" partitions. To me, it's a fundamental probability thing - if you roll the dice enough times, it all evens out (high enough message volume), but this article is about what happens when you _don't_ roll the dice enough times.

If it's a fundamental probability thing with randomized partition selection, put the actual probability of what you're describing in the article.

.25^20 is not a "somewhat unlucky sequence of events"

(Author here)

Fair enough. I agree .25^20 is basically infinitesimal, and even with a smaller exponent (like .25^3) the odds are not great, so I appreciate you calling this out.

Flipping this around, though, if you have 4 workers total and 3 are busy with jobs (1 idle), your next job has only a 25% chance of hitting the idle worker. This is what I see the most in practice; there is a backlog, and not all workers are busy even though there is a backlog.

With Kafka you normally don't pick a worker - Kafka does that. IIRC with some sort of consistent hashing - but for simplicity sake lets say it's just modulo 'messageID % numberOfShards'.

You control/configure numberOfShards - and its usually set to something order of magnitude bigger than your expected number of workers (to be precise - that's number of docker pods or hardware boxes/servers) - e.g. 32, 64 or 128.

So in practice - Kafka assigns multiple shards to each of your "workers" (if you have more workers than shards then some workers don't do any work).

And while each of your workers is limited to one thread for consuming Kafka messages. Each worker can still process multiple messages at the same time - in different async/threads.

To me it seems like your underlying assumptions is "1 worker can only work on one message/item at a time", right?

While you could also use Kafka like that - and it might even work for your use case, as long as you configure option (sorry forgot the name) that makes Kafka redistribute shards because particular workers/consumers are too slow.

AFAIK the usual way is for each worker to get more than one message/item at a time, and do the actual item/work in/through separate thread/work pool (or another async mechanism).

Kafka then keeps track of which messages were picked up by each worker/consumer, and how big is the gap between that and committed offset (marked as done).

It gets a bit more tricky if you: - can't afford to process some messages/work again (well at extreme end it might actually be a show stopper for using Kafka) - need to have automatic retry on error/fail, how quickly/slowly you want to retry, how many times to retry...etc. - can you afford to temporarily "lose" some pending (picked up from Kafka but offset not marked as done) items for random things (worker OOMKILLED, solar flare hit network cable ...)

We've actually solved some of these with simply having another (set of) worker(s) that consume same topic with a delay (imagine cron job that runs every 5 minutes). And doing things in case there's no record of task being done, putting it into same topic again for retry ...etc.

The other thing that's PITA with Kafka is fail/retry.

If you want to continue processing other/newer items/messages (and usually you do), you need to commit Kafka topic offset - leaving you to figure out what to do with failed item/message.

One simple thing is just re-inserting it again into the same topic (at the end). If it was temps transient error that could be enough

Instead of same topic, you can also insert it into another failedX Kafka topic (and have topic processed by cron like scheduled task).

And if you need things like progressive backing off before attempting reprocessing - you liekly want to push failed items into something else.

While it could be another tasks system/setup where you can specify how many reprocessing attempts to make, how much time to wait before next attempt ...etc. Often it's enough to have a simple DB/table.