| I'm one of the Kafka authors, so admittedly my view might be slightly biased Here is a quick comparison of Kafka and Pulsar: - Kafka is a complete streaming platform vs a messaging system which is what Pulsar is. Through Kafka Connect (http://www.confluent.io/blog/announcing-kafka-connect-buildi...), it has support for connectors to stream data between various sources and systems. Through Kafka Streams (http://www.confluent.io/blog/introducing-kafka-streams-strea...), it has support to do stream processing and transformations over Kafka topics. - Broad adoption base: Kafka is very widely adopted across thousands of companies worldwide. https://cwiki.apache.org/confluence/display/KAFKA/Powered+By - Tunable durability and consistency knobs on the producer: The Kafka producer API allows the application to either wait until a message is fully committed across all replicas or just the leader. This allows applications to make the right tradeoffs for throughput vs durability. One size does not fit all. - Performance and efficiency: Kafka supports zero-copy consumption allowing the consumers to read large amounts of data at high throughput. To the extent that I understand, Pulsar with its legder-broker model does not support zero-copy consumption. - A lot of the reasons quoted for creating Pulsar are features that exist in Kafka and are used in production: -- Kafka has multi-tenancy support through user-defined quotas (See this http://www.confluent.io/blog/sharing-is-caring-multi-tenancy...) -- Kafka has support for authentication, authorization, user-defined ACLs (See this http://www.confluent.io/blog/apache-kafka-security-authoriza...) -- Kafka has support for geo replication. In fact, that is the most common use case for Kafka in several companies. (See this https://engineering.linkedin.com/kafka/running-kafka-scale) -- Latency: The end-to-end latency from publish to consume can be very low in Kafka (<10ms). - Support for millions of topics: To the extent that I understand, both Pulsar and Kafka use ZooKeeper for metadata management. That is the main bottleneck for supporting a large number of topics and likely the same tradeoffs apply to both Kafka and Pulsar as a result. - Storage model: The length of a partition in BookKeeper and
hence in Pulsar is not bounded by the capacity of a server. So you have the ability to add servers to accommodate a workload spike. This is merely a quick overview. There might be more aspects of this comparison that I'm missing. |