Hacker News new | ask | show | jobs
by morelisp 1610 days ago
> You mean pre-commit to the maximum parallelism by setting the number of partitions?

Yes. Then depending on the data it can be difficult-to-impossible to scale past that. Scaling down, of course you can always start fewer consumers, but unless you have many more partitions than consumers or partitions as a multiple of consumers, the load will be unbalanced.

> But you get so much more out of kafka with consumer groups and partitions when you do complex message processing

Well, what's "complex message processing"? If you mean the stream topology is complex or the operations benefit from key sharding I agree. If you mean that some atomic operation is complex, no, it's irrelevant or even bad e.g. the complexity means you need a DLQ/OOO retries, or load per item is so unpredictable you want to work-steal.

1 comments

How difficult it is to scale depends on your requirements, but if those aren't strict it's as easy as adjusting a parameter. In my experience it's not hard at all.

If you can live with not strictly ordered messages around scale time you just rescale. If you can live with some latency you stop producing, wait until lag is zero, scale, and then start producing again.

Plus if you pick a partition number with nice divisors like 6, 12, 20, 24, 40 or 60 you can have balanced consumption with different number of consumers.

I really like your number of partitions with nice divisors, I thought of it myself and enthusiastically raved about it to an architect (which I really respect for a lot of reasons) but he insisted on some formula about throughput and I dropped the idea. Truth be told, I work for a bank with fixed number of servers.