Hacker News new | ask | show | jobs
by miguno 2161 days ago
> The broker being stateless is great. I've had issues with moving partitions around in Kafka.

Well, the Pulsar broker is (kinda) stateless, because they are essentially a caching layer in front of BookKeeper. But where's your data actually stored then? In BookKeeper bookies, which are stateful. Killing and replacing/restarting a Bookkeeper node requires the same redistribution of data as required in Kafka’s case. (Additionally, BookKeeper needs a separate data recovery daemon to be run and operated, https://bookkeeper.apache.org/archives/docs/r4.4.0/bookieRec...)

So the comparison of 'Pulsar broker' vs. 'Kafka broker' is very misleading because, despite identical names, the respective brokers provide very different functionality. It's an apples-to-oranges comparison, like if you'd compare memcached (Pulsar broker) vs. Postgres (Kafka broker).

1 comments

Brokers in Pulsar can be seamlessly added or removed.

Bookies can seamlessly be added.

Kafka unless there has been a KIP since I stopped paying attention to Kafka doesn't do this.

I remember adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning which if I recall correctly breaks the golden ordering contract that most devs bank on. The data written to the partition will always be in the order for that partition itself.

> Bookies can seamlessly be added.

Kafka brokers can seamlessly be added, too.

> I remember adding brokers to Kafka and taking advantage of them on existing topics meant repartitioning which if I recall correctly breaks the golden ordering contract that most devs bank on.

Adding brokers to Kafka does not require repartitioning. It requires data rebalancing ('migrate' some data to the new brokers), which does not break any ordering contract. I suppose the words sound sufficiently similar that they are easy to be mixed up. :)

(For what it's worth, BookKeeper requires the same data rebalancing process.)

> The data written to the partition will always be in the order for that partition itself.

Yes, for Kafka, all log segments that make up a topic-partition are always stored -- or, when rebalancing, moved -- in unison on the same broker. Or, brokers (plural) when we factor in replication. Kafka's approach has downsides but also upsides: data is always stored in a contiguous manner, and can thus also be read by consumers in a contiguous manner, which is very fast.

In comparison, BookKeeper has segmented storage, too. But here the segments -- called ledgers -- of the same topic-partition are spread across different BK bookies. Also, because of how BookKeeper's protocol works (https://bookkeeper.apache.org/docs/4.10.0/development/protoc...), what bookies store are actually not contiguous 'ledgers', but in fact 'fragments of ledgers' (see link). As mentioned elsewhere in this discussion, one downside of this approach is that BK suffers proverbially from data fragmentation. (Remember Windows 95 disk fragmentation? Quite similar.)

No approach is universally better than the other one. As often, design decisions were made to achieve different trade-offs.

Maybe we're both mixed up.

Let's say we have a cluster with a sort of "global" topic called "incoming-events" and it has 10 partitions.

We'll most likely eventually end up with a "hot" write partition because rarely do I see a perfect even distribution in event streams.

I'd like to seamlessly add capacity to remove this hot spot.

With Pulsar you're using BK which means just spinning up more bookies which will take on new segments and then the rebalancement will move some segments off.

With Kafka I don't know what the option is aside from spinning up a larger and larger broker & rebalancing to the larger box. What I typically see in companies are they repartition from 10 to 20 to avoid expensive one off boxes.

Nobody likes non-uniform resources because it is a nightmare to manage. Imagine a k8s deployment where you have replica:10 but have to handle a custom edge case for resource allocation on one pod different than the other 9 brokers.

(Just assumed 10 partitions to 10 pods)

Both Kafka and Pulsar have this kind of bottleneck in your scenario -- say, one "hot" write partition.

If one Kafka broker or BookKeeper bookie node cannot keep up with the write load (e.g. network or disk too slow, CPU util too high), you must add more partitions. For Kafka, for the reasons you already mentioned. For BookKeeper (and Pulsar), because only a single ledger of a topic-partition is open for writes at any given time.

> Kafka brokers can seamlessly be added, too.

You can add as man as you want, but they take no traffic.

How do bookkeeper nodes handle addition then. AFAIK, if you add new storage nodes, whether bookkeeper or Kafka, there has to be some repartitioning, or else how will the new node be useful at all?
See my previous answer further up in this sub-thread. Neither Kafka nor BookKeeper require data repartitioning when adding new nodes. Instead, both require data rebalancing, which moves some data from existing nodes to the newly added nodes.

Think: repartitioning changes the logical layout of the data, which can impact app semantics depending on your application; whereas data (re)balancing just shuffles around stored bytes "as is" behind the scenes, without changing the data itself. The confusion stems probably from the two words sounding very similar.

For Kafka, you use tools like Confluent's Auto Data Balancer (https://docs.confluent.io/current/kafka/rebalancer/index.htm...) or LinkedIn's Cruise Control (https://github.com/linkedin/cruise-control) that automatically rebalance the data in your Kafka cluster in the background. Pulsar has its own toolset to achieve the same.

https://jack-vanlightly.com/blog/2018/10/2/understanding-how... goes into this.

> The data of a given topic is spread across multiple Bookies. The topic has been split into Ledgers and the Ledgers into Fragments and with striping, into calculatable subsets of fragment ensembles. When you need to grow your cluster, just add more Bookies and they’ll start getting written to when new fragments are created. No more Kafka-style rebalancing required. However, reads and writes now have to jump around a bit between Bookies.

If consumers are keeping up, there will be no reads to the BookKeeper layer as the Pulsar broker will serve from memory.

When reads need to go to BookKeeper there are caches there too, with read-aheads to populate the cache to avoid going back to disk regularly.

Even when having to go to disk, there are further optimizations in how data is laid out on disk to ensure as much sequential reading as possible.

Also note that the fragments aren't necessarily that small either.

Ok so it will sacrifice some throughout when a new node is added, as reads and writes need to jump a bit.