Hacker News
by ckdarby 2170 days ago
Maybe we're both mixed up.

Let's say we have a cluster with a sort of "global" topic called "incoming-events" and it has 10 partitions.

We'll most likely end up with a "hot" write partition eventually, because I rarely see a perfectly even distribution in event streams.
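A minimal sketch of how a hot partition arises, assuming key-hash partitioning and a hypothetical skewed stream in which one key produces 90% of the events. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner hashes keys with murmur2), so the partition ids won't match a real cluster, but the skew will: whichever partition the dominant key hashes to becomes the hot one.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 10  # the "incoming-events" topic from the example

def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash (crc32), NOT Kafka's murmur2-based default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_load(keys, num_partitions=NUM_PARTITIONS):
    """Count how many events land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts[p] for p in range(num_partitions)]

# Hypothetical skewed stream: one key dominates (900 of 1000 events).
events = ["customer-42"] * 900 + [f"customer-{i}" for i in range(100)]
load = partition_load(events)
hot = max(range(NUM_PARTITIONS), key=lambda p: load[p])
print("events per partition:", load, "hot partition:", hot)
```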

I'd like to seamlessly add capacity to remove this hot spot.

With Pulsar you're using BookKeeper (BK), which means you just spin up more bookies: they take on new segments, and rebalancing then moves some existing segments off the old bookies.
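A rough model of why adding bookies helps: treat a topic-partition as a chain of segments (ledgers), each written to an ensemble of bookies chosen when that segment is opened. `choose_ensemble` is a hypothetical round-robin stand-in (BookKeeper's real placement policies, such as rack-aware placement, are more involved), and the sketch does not model moving existing segments.

```python
from collections import Counter

def choose_ensemble(bookies, ensemble_size, start):
    """Hypothetical round-robin placement over the *current* set of bookies."""
    return [bookies[(start + i) % len(bookies)] for i in range(ensemble_size)]

def roll_segments(bookies, count, ensemble_size, start=0):
    """Open `count` segments one after another, each on a freshly chosen ensemble."""
    return [choose_ensemble(bookies, ensemble_size, start + n) for n in range(count)]

bookies = ["bookie-0", "bookie-1", "bookie-2"]
old_segments = roll_segments(bookies, count=3, ensemble_size=2)

bookies = bookies + ["bookie-3", "bookie-4"]  # scale out by two bookies
new_segments = roll_segments(bookies, count=5, ensemble_size=2, start=3)

# Count how many of the new segments each bookie stores.
usage = Counter(b for ensemble in new_segments for b in ensemble)
print(new_segments)
print(usage)
```

In this toy run the old segments never touch `bookie-3` or `bookie-4`, while the five new segments spread evenly over all five bookies (two segments each).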

With Kafka I don't know of an option other than running ever-larger brokers and rebalancing onto the bigger box. What I typically see at companies is that they repartition from 10 to 20 partitions to avoid expensive one-off boxes.
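A minimal sketch of what going from 10 to 20 partitions does under hash-mod-N partitioning, again with `crc32` standing in for Kafka's murmur2 and a hypothetical hot partition id of 3. Because 20 is a multiple of 10, `h % 20 % 10 == h % 10` for every hash `h`, so the keys that shared old partition 3 split across new partitions 3 and 13.

```python
import zlib

def partition_for(key, num_partitions):
    # Stand-in hash (crc32), not Kafka's murmur2; the mod-N argument is the same.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

keys = [f"customer-{i}" for i in range(1000)]
HOT_BEFORE = 3  # hypothetical id of the hot partition in the 10-partition topic

# Where do the keys that used to share the hot partition go after 10 -> 20?
moved_to = {partition_for(k, 20) for k in keys if partition_for(k, 10) == HOT_BEFORE}
print(sorted(moved_to))  # always a subset of [3, 13]
```

Two caveats follow from the same arithmetic: a single hot key still maps to exactly one partition whatever the count, so doubling only splits load that comes from many keys; and roughly half of the existing keys change partition, so per-key ordering across the resize isn't preserved.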

Nobody likes non-uniform resources because they are a nightmare to manage. Imagine a k8s deployment with replicas: 10 where you have to special-case the resource allocation of one pod so that it differs from the other 9 brokers.

(I'm assuming 10 partitions mapped to 10 pods.)

In your scenario (one "hot" write partition), Kafka and Pulsar both run into the same kind of bottleneck.

If a single Kafka broker or BookKeeper bookie cannot keep up with the write load (e.g. network or disk too slow, CPU utilization too high), you must add more partitions: for Kafka, for the reasons you already mentioned; for BookKeeper (and thus Pulsar), because only one ledger per topic-partition is open for writes at any given time.
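To see why one open ledger is the limit: as I understand BookKeeper's default behaviour, the entries of a ledger are striped round-robin across the ensemble chosen when the ledger is opened (ensemble size E, write quorum Qw), so entry e goes to ensemble members e, e+1, ..., e+Qw-1 (mod E). This is a minimal sketch assuming those semantics; `write_set` and `per_bookie_writes` are illustrative helpers, not the BookKeeper client API.

```python
def write_set(entry_id, ensemble_size, write_quorum):
    """Ensemble indices of the bookies that store entry_id (round-robin striping)."""
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]

def per_bookie_writes(num_entries, ensemble_size, write_quorum):
    """Count entry writes landing on each ensemble member of one open ledger."""
    counts = [0] * ensemble_size
    for e in range(num_entries):
        for b in write_set(e, ensemble_size, write_quorum):
            counts[b] += 1
    return counts

# One open ledger: ensemble of 3 bookies, each entry replicated to 2 of them.
print(per_bookie_writes(num_entries=900, ensemble_size=3, write_quorum=2))
# -> [600, 600, 600]
```

Each ensemble member takes Qw/E of the entry writes (600 of 900 here). The ensemble is picked when the ledger is opened (and, as far as I know, only changes to replace failed bookies), so bookies added later take none of this load until the topic-partition rolls over to a new ledger.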