by ckdarby
2173 days ago
Brokers in Pulsar can be seamlessly added or removed. Bookies can seamlessly be added. Kafka unless there has been a KIP since I stopped paying attention to Kafka doesn't do this. I remember adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning which if I recall correctly breaks the golden ordering contract that most devs bank on. The data written to the partition will always be in the order for that partition itself.
Kafka brokers can seamlessly be added, too.
> I remember adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning which if I recall correctly breaks the golden ordering contract that most devs bank on.
Adding brokers to Kafka does not require repartitioning. It requires data rebalancing (i.e., migrating some existing partitions to the new brokers), which does not break any ordering contract because the number of partitions, and therefore the key-to-partition mapping, stays the same. I suppose the two words sound similar enough that they are easy to mix up. :)
(For what it's worth, BookKeeper requires the same data rebalancing process.)
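To make the distinction concrete, here is a minimal sketch. It is a toy model, not Kafka's actual code: the `partition_for` helper, the broker names, and the use of CRC32 are my own stand-ins (Kafka's default partitioner hashes keys with murmur2). It compares moving partitions to a new broker against increasing the partition count.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Hypothetical stand-in for a keyed partitioner: hash(key) mod partition count.
    # Kafka's default partitioner uses murmur2; CRC32 is used here only because
    # it is deterministic and in the standard library.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]

# Hypothetical starting point: 4 partitions hosted on 2 brokers.
assignment = {0: "broker-1", 1: "broker-2", 2: "broker-1", 3: "broker-2"}
before = {k: partition_for(k, len(assignment)) for k in keys}

# Rebalancing: a third broker joins and takes over partitions 2 and 3.
# Only the partition -> broker mapping changes; the partition count does not.
rebalanced = dict(assignment)
rebalanced[2] = "broker-3"
rebalanced[3] = "broker-3"
after_rebalance = {k: partition_for(k, len(rebalanced)) for k in keys}
moved_by_rebalance = sum(before[k] != after_rebalance[k] for k in keys)

# Repartitioning: the topic grows from 4 to 6 partitions.
# Now the key -> partition mapping itself changes for many keys.
after_repartition = {k: partition_for(k, 6) for k in keys}
moved_by_repartition = sum(before[k] != after_repartition[k] for k in keys)

print("keys that changed partition after rebalancing:   ", moved_by_rebalance)
print("keys that changed partition after repartitioning:", moved_by_repartition)
```

In this model, rebalancing leaves every key in the partition it was already in, so per-partition ordering is untouched. Only changing the partition count sends some keys to a different partition, and that is the case where per-key ordering across old and new data can no longer be assumed.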
> The data written to the partition will always be in the order for that partition itself.
Yes, for Kafka, all log segments that make up a topic-partition are always stored -- or, when rebalancing, moved -- in unison on the same broker (or brokers, plural, once we factor in replication). Kafka's approach has downsides but also upsides: data is always stored contiguously, and can thus also be read by consumers contiguously, which is very fast.
In comparison, BookKeeper has segmented storage, too. But here the segments -- called ledgers -- of the same topic-partition are spread across different BK bookies. Also, because of how BookKeeper's protocol works (https://bookkeeper.apache.org/docs/4.10.0/development/protoc...), what bookies actually store is not contiguous 'ledgers' but 'fragments of ledgers' (see link). As mentioned elsewhere in this discussion, one downside of this approach is that BK tends to suffer from data fragmentation. (Remember Windows 95 disk fragmentation? Quite similar.)
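A rough sketch of why an individual bookie ends up with non-contiguous data, assuming the round-robin striping I understand the BookKeeper protocol to use (entry i of a ledger goes to `write_quorum` consecutive bookies of the ensemble, starting at index `i mod ensemble_size`). The bookie names and the `write_set` helper are made up for illustration, and real ledgers additionally change ensembles when bookies fail, which this ignores.

```python
def write_set(entry_id: int, ensemble_size: int, write_quorum: int) -> list[int]:
    # Indices (into the ensemble) of the bookies that receive this entry,
    # assuming simple round-robin striping.
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


# Hypothetical ledger settings: ensemble of 3 bookies, each entry written to 2.
ensemble = ["bookie-A", "bookie-B", "bookie-C"]
write_quorum = 2

# Record which entries of one ledger each bookie ends up holding.
stored = {bookie: [] for bookie in ensemble}
for entry_id in range(6):
    for idx in write_set(entry_id, len(ensemble), write_quorum):
        stored[ensemble[idx]].append(entry_id)

for bookie, entries in stored.items():
    print(bookie, entries)
```

With these settings no single bookie holds entries 0 through 5 as one contiguous run; each holds a subset with gaps. A Kafka replica, by contrast, holds every record of the partition in order.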
Neither approach is universally better than the other. As is often the case, the design decisions were made to achieve different trade-offs.