by sciurus 2170 days ago
https://jack-vanlightly.com/blog/2018/10/2/understanding-how... goes into this.

> The data of a given topic is spread across multiple Bookies. The topic has been split into Ledgers and the Ledgers into Fragments and with striping, into calculatable subsets of fragment ensembles. When you need to grow your cluster, just add more Bookies and they’ll start getting written to when new fragments are created. No more Kafka-style rebalancing required. However, reads and writes now have to jump around a bit between Bookies.
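To make the "calculatable subsets" part concrete, here is a minimal sketch. It assumes round-robin striping, where each entry goes to write-quorum consecutive bookies in the fragment's ensemble, starting at entry id modulo ensemble size. The function name, bookie names, and the sizes used are made up for illustration; this is not BookKeeper's actual code.

```python
def write_set(entry_id, ensemble_size, write_quorum):
    """Return the ensemble indices of the bookies that store entry_id.

    Assumes round-robin striping: start at entry_id % ensemble_size and
    take write_quorum consecutive bookies, wrapping around the ensemble.
    """
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


# Hypothetical fragment: an ensemble of 5 bookies, each entry written to 2.
ensemble = ["bookie-a", "bookie-b", "bookie-c", "bookie-d", "bookie-e"]

for entry_id in range(6):
    replicas = [ensemble[i] for i in write_set(entry_id, len(ensemble), 2)]
    print(entry_id, replicas)
```

Because the placement is a pure function of the entry id and the fragment's ensemble, a reader can work out which bookies hold an entry without a lookup table, which is also why consecutive entries land on different bookies.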

2 comments

If consumers are keeping up, there will be no reads to the BookKeeper layer as the Pulsar broker will serve from memory.

When reads need to go to BookKeeper there are caches there too, with read-aheads to populate the cache to avoid going back to disk regularly.

Even when having to go to disk, there are further optimizations in how data is laid out on disk to ensure as much sequential reading as possible.

Also note that the fragments aren't necessarily that small either.

Ok, so it will sacrifice some throughput when a new node is added, as reads and writes need to jump around a bit.