Hacker News new | ask | show | jobs
by morelisp 1610 days ago
You need to pre-commit to the amount of parallelism you want, and work-stealing or any other out-of-order processing is nearly impossible. Any Kafka client bug tracker is full of people who are confused (or worse) about this.

We use Kafka as a queue because we understand it very well, but it has a lot of limitations compared to "purpose-built" queuing services.

1 comments

You mean pre-commit to the maximum parallelism by setting the number of partitions or am I missing something?

In that case, sure, theoretically with JMS you can spawn as many parallel consumers as you want and with kafka you're limited to the number of partitions you've configured your topic with, as you can only have one partition per consumer.

But you get so much more out of kafka with consumer groups and partitions when you do complex message processing that would be a lot harder with traditional queues (if you partition based on the same key).

> You mean pre-commit to the maximum parallelism by setting the number of partitions?

Yes. Then depending on the data it can be difficult-to-impossible to scale past that. Scaling down, of course you can always start fewer consumers, but unless you have many more partitions than consumers or partitions as a multiple of consumers, the load will be unbalanced.

> But you get so much more out of kafka with consumer groups and partitions when you do complex message processing

Well, what's "complex message processing"? If you mean the stream topology is complex or the operations benefit from key sharding I agree. If you mean that some atomic operation is complex, no, it's irrelevant or even bad e.g. the complexity means you need a DLQ/OOO retries, or load per item is so unpredictable you want to work-steal.

How difficult it is to scale depends on your requirements, but if those aren't strict it's as easy as adjusting a parameter. In my experience it's not hard at all.

If you can live with not strictly ordered messages around scale time you just rescale. If you can live with some latency you stop producing, wait until lag is zero, scale, and then start producing again.

Plus if you pick a partition number with nice divisors like 6, 12, 20, 24, 40 or 60 you can have balanced consumption with different number of consumers.

I really like your number of partitions with nice divisors, I thought of it myself and enthusiastically raved about it to an architect (which I really respect for a lot of reasons) but he insisted on some formula about throughput and I dropped the idea. Truth be told, I work for a bank with fixed number of servers.