Kafka brokers handle both consumer connections and data storage. This creates contention, because the leader for each partition has to serve the traffic and handle the disk IO as well. Consumers that aren't tailing the stream cause slowdowns, because Kafka has to read from their offset in files that aren't cached in RAM.
Pulsar separates storage into a different layer (powered by Apache BookKeeper), which allows consumers to read directly from multiple nodes. That leaves much more IO throughput available for consumers picking up anywhere in the stream.
> However, we can’t use Kafka as a queuing system because the maximum number of workers you can have is limited by the number of partitions, and because there is no way to acknowledge messages at the individual message level. You would need to manually commit offsets in Kafka by maintaining a record of individual message acknowledgments in your own data store, which adds a lot of extra overhead — too much overhead in my opinion.
I just want to clarify this: the limit is per consumer group. A topic with N partitions supports at most N concurrently active consumers in each consumer group.
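To make the per-group limit concrete, here is a minimal sketch in Python. It is a simplified model, not Kafka's actual assignor: `assign_partitions` and `active_consumers` are hypothetical helpers that hand each partition to exactly one consumer in a group, round-robin. The point is only that consumers beyond the partition count sit idle within their group, while a second group gets its own full set of partitions.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin).

    Simplified model for illustration; real Kafka assignors are more involved.
    """
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


def active_consumers(assignment):
    """Consumers that own at least one partition; the rest are idle."""
    return [c for c, parts in assignment.items() if parts]


# One topic with 3 partitions, read by two independent consumer groups.
group_a = assign_partitions(3, ["a1", "a2", "a3", "a4", "a5"])
group_b = assign_partitions(3, ["b1", "b2"])

print(active_consumers(group_a))  # ['a1', 'a2', 'a3']; a4 and a5 are idle
print(group_b)                    # {'b1': [0, 2], 'b2': [1]}
```

In this model group A tops out at 3 active consumers no matter how many join, yet the topic as a whole has 5 active consumers across both groups.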
I am confused by this. The format of Kafka's log files is designed to allow reading and sending to clients directly using sendfile, in sequential reads of batches of messages. http://kafka.apache.org/documentation/#maximizingefficiency
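For anyone who hasn't seen the pattern, here is a rough sketch of the idea in Python. The length-prefixed record format and the helpers `write_segment`, `serve_from`, and `read_records` are assumptions made up for illustration, not Kafka's on-disk format or code (Kafka's docs describe using Java's `FileChannel.transferTo` for this). Python's `socket.sendfile` uses `os.sendfile` where the platform supports it and falls back to plain `send` otherwise.

```python
import os
import socket
import struct
import tempfile


def write_segment(path, messages):
    """Append length-prefixed records; return the byte position of each one."""
    positions = []
    with open(path, "wb") as f:
        for msg in messages:
            positions.append(f.tell())
            f.write(struct.pack(">I", len(msg)) + msg)
    return positions


def serve_from(path, sock, byte_pos):
    """Send the file from byte_pos to its end without parsing any records."""
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=byte_pos)


def read_records(data):
    """Consumer side: split the raw bytes back into individual messages."""
    records, i = [], 0
    while i < len(data):
        (n,) = struct.unpack_from(">I", data, i)
        records.append(data[i + 4 : i + 4 + n])
        i += 4 + n
    return records


messages = [b"m0", b"m1", b"m2", b"m3"]
with tempfile.TemporaryDirectory() as d:
    segment = os.path.join(d, "segment.log")
    positions = write_segment(segment, messages)
    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        # A consumer asks for everything starting at message offset 2.
        sent = serve_from(segment, broker_side, positions[2])
        received = b""
        while len(received) < sent:
            received += consumer_side.recv(sent - len(received))

fetched = read_records(received)
print(sent, fetched)  # 12 [b'm2', b'm3']
```

The broker side never decodes the records: it looks up a byte position and streams the rest of the file sequentially, which is why the stored bytes and the wire format need to match.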