Hacker News
by SkyRocknRoll 2356 days ago
Most of Kafka's flaws have been carefully studied and fixed in Apache Pulsar. I have written a blog post about why we went ahead with Pulsar: https://medium.com/@yuvarajl/why-nutanix-beam-went-ahead-wit...

> when consumers are lagging behind, producer throughput falls off a cliff because lagging consumers introduce random reads

I am confused by this. Kafka's log file format is designed so that batches of messages can be read sequentially from disk and sent to clients directly with sendfile: http://kafka.apache.org/documentation/#maximizingefficiency
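For illustration, here is a minimal Python sketch of the zero-copy path the linked documentation describes, assuming a Linux-like OS where `os.sendfile` is available: bytes at a given position in a log file are handed from the page cache to a socket without being copied through user space. The function name `send_log_range` and the demo data are hypothetical; Kafka itself does this from Java via `FileChannel.transferTo`.

```python
import os
import socket
import tempfile


def send_log_range(sock: socket.socket, log_path: str, offset: int, count: int) -> int:
    """Send `count` bytes of `log_path`, starting at byte `offset`, over `sock`.

    Uses sendfile, so the kernel moves the bytes from the page cache to the
    socket directly; they never pass through a Python buffer.
    Returns the number of bytes actually sent.
    """
    sent_total = 0
    with open(log_path, "rb") as f:
        while sent_total < count:
            n = os.sendfile(sock.fileno(), f.fileno(), offset + sent_total, count - sent_total)
            if n == 0:  # hit end of file before sending `count` bytes
                break
            sent_total += n
    return sent_total


# Demo: serve a slice of a tiny hypothetical "log file" over a connected socket pair.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"offset0|offset1|offset2")
    log_path = tmp.name

server_sock, client_sock = socket.socketpair()
sent = send_log_range(server_sock, log_path, offset=8, count=7)
received = client_sock.recv(7)
os.remove(log_path)
```

The zero-copy path only helps once the bytes are in the page cache; if they are not, the kernel first has to read them from disk, which is the lagging-consumer case quoted above.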

Kafka brokers handle both consumer connections and data storage. This creates contention, because the leader of each partition has to serve all of that partition's client traffic while also doing its disk IO. Consumers that aren't tailing the stream cause slowdowns because the broker has to read from their offset in log files that aren't cached in RAM, so those reads go to disk.
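To make "read from their offset" concrete: a partition's log is split into segment files named after the first offset each one contains, so serving a fetch starts by finding the segment with the largest base offset that does not exceed the requested offset. The minimal sketch below (function names and segment offsets are hypothetical) shows only that floor lookup; a real broker then uses a sparse offset index to find the byte position inside the segment. A tailing consumer always lands in the newest, recently written segment, while a lagging one lands in an older file that is much less likely to still be cached.

```python
import bisect


def segment_file_name(base_offset: int) -> str:
    # Segment files are named by their base offset, zero-padded to 20 digits.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: list[int], target_offset: int) -> str:
    """Return the segment file that should contain `target_offset`.

    `base_offsets` must be sorted ascending; the matching segment is the one
    with the largest base offset <= target_offset (a floor lookup).
    """
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} is older than the oldest retained segment")
    return segment_file_name(base_offsets[i])


segments = [0, 1_000_000, 2_000_000, 3_000_000]  # hypothetical segment base offsets
print(find_segment(segments, 3_000_450))  # tailing consumer → 00000000000003000000.log
print(find_segment(segments, 1_250_000))  # lagging consumer → 00000000000001000000.log
```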

Pulsar moves storage into a separate layer (powered by Apache BookKeeper), so a topic's data is spread across multiple storage nodes and reads can be served from many of them at once. That leaves much more IO throughput available for consumers picking up anywhere in the stream.
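A simplified model of why this spreads the read load: BookKeeper stores a ledger's entries striped across an ensemble of bookies, with each entry written to a write quorum of them. Assuming plain round-robin striping (the parameter names mirror BookKeeper's ensemble size and write quorum; the functions themselves are hypothetical), consecutive entries land on different bookies, so catching up over a long range of old entries pulls from every bookie instead of one broker's disk:

```python
def write_set(entry_id: int, ensemble_size: int, write_quorum: int) -> list[int]:
    """Indexes of the bookies that store `entry_id` under round-robin striping.

    Entry i goes to bookies i, i+1, ..., i+write_quorum-1 (mod ensemble_size),
    so consecutive entries land on different, overlapping sets of bookies.
    """
    if not 1 <= write_quorum <= ensemble_size:
        raise ValueError("write_quorum must be between 1 and ensemble_size")
    return [(entry_id + k) % ensemble_size for k in range(write_quorum)]


def read_load(first: int, last: int, ensemble_size: int, write_quorum: int) -> dict[int, int]:
    """Reads per bookie when entries first..last are each read from the first replica in their write set."""
    load = {bookie: 0 for bookie in range(ensemble_size)}
    for entry_id in range(first, last + 1):
        load[write_set(entry_id, ensemble_size, write_quorum)[0]] += 1
    return load


# Hypothetical ledger: an ensemble of 5 bookies, each entry stored on 3 of them.
print(write_set(7, ensemble_size=5, write_quorum=3))      # → [2, 3, 4]
print(read_load(0, 99, ensemble_size=5, write_quorum=3))  # → {0: 20, 1: 20, 2: 20, 3: 20, 4: 20}
```

In Pulsar the consumer still talks to a broker, which fetches these entries from the bookies on its behalf.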

Kafka works best when the data it is returning to consumers is in the page cache.

When consumers fall behind, they start to request data that might not be in the page cache, which forces disk reads and slows things down.
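A deliberately crude way to see this effect is to model the page cache as a plain LRU cache (the real Linux page cache uses a more elaborate replacement policy) into which produced pages are inserted as they are written. In this hypothetical simulation, a consumer reading right at the head always hits, while one lagging further behind than the cache can hold misses on every read, and each of those misses would be a disk read competing with the producer's writes:

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU page cache holding at most `capacity` log pages."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, page: int) -> None:
        self.pages[page] = None
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

    def write(self, page: int) -> None:
        # Newly appended data lands in the page cache.
        self._insert(page)

    def read(self, page: int) -> None:
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)
        else:
            self.misses += 1  # on a real broker this would be a disk read
            self._insert(page)


def simulate(lag_pages: int, total_pages: int = 1000, capacity: int = 100) -> float:
    """Producer appends one page per step; the consumer reads `lag_pages` behind. Returns the hit ratio."""
    cache = PageCache(capacity)
    for head in range(total_pages):
        cache.write(head)
        if head >= lag_pages:
            cache.read(head - lag_pages)
    reads = cache.hits + cache.misses
    return cache.hits / reads if reads else 0.0


print(simulate(lag_pages=0))    # tailing consumer → 1.0
print(simulate(lag_pages=500))  # consumer 500 pages behind → 0.0
```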

> However, we can’t use Kafka as a queuing system because the maximum number of workers you can have is limited by the number of partitions, and because there is no way to acknowledge messages at the individual message level. You would need to manually commit offsets in Kafka by maintaining a record of individual message acknowledgments in your own data store, which adds a lot of extra overhead — too much overhead in my opinion.
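For a sense of the bookkeeping the quoted post is describing, here is a minimal sketch (the class name `AckTracker` is hypothetical, not a Kafka API). Because a Kafka committed offset means "the next offset to consume", individual acks can only move it forward across a contiguous run of acknowledged messages, so out-of-order acks above that point have to be remembered somewhere:

```python
class AckTracker:
    """Tracks per-message acks for one partition and computes the offset that is safe to commit."""

    def __init__(self, start_offset: int) -> None:
        self.committable = start_offset  # next offset a restarted consumer would read
        self.acked: set[int] = set()     # acknowledged offsets above `committable`

    def ack(self, offset: int) -> int:
        """Record an ack and return the (possibly advanced) committable offset."""
        if offset >= self.committable:
            self.acked.add(offset)
        # Advance over the contiguous prefix of acknowledged offsets.
        while self.committable in self.acked:
            self.acked.remove(self.committable)
            self.committable += 1
        return self.committable


tracker = AckTracker(start_offset=100)
print(tracker.ack(101))  # → 100 (offset 100 is still outstanding)
print(tracker.ack(102))  # → 100
print(tracker.ack(100))  # → 103 (100..102 are now all acknowledged)
```

To survive a consumer restart, the `acked` set would have to live in your own data store, which is the overhead the post objects to. One slow message also holds the committed offset back, so unless that set is persisted, everything acknowledged beyond it is redelivered after a crash.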

I just want to clarify this: within a single consumer group, you're limited to N concurrent consumers for N partitions.
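A simplified sketch of a range-style assignment, modeled loosely on how Kafka's range assignor splits one topic's partitions among the members of a group (the function `range_assign` is hypothetical), shows the limit: each partition goes to exactly one member, so members beyond the partition count get nothing. A second consumer group would receive its own full assignment of the same partitions.

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Assign partitions 0..num_partitions-1 of one topic to the members of a single group.

    Members are sorted, each gets num_partitions // len(members) partitions,
    and the first num_partitions % len(members) members get one extra.
    """
    if not consumers:
        return {}
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(3, ["c1", "c2", "c3", "c4", "c5"]))
# → {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
```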