|
|
|
|
|
by majidazimi
2167 days ago
|
|
There are real differences among them. Here is some painful aspects of Kafka: 1. A single partition is stored in one node (replicas on another nodes). With this, introducing new nodes takes very long time to replicate large partitions, because it can replicate one partition from only one node (leader of the partition). On Pulsar each segment of partition is stored in a different bookkeeper node. 2. Because of 1, if two consumers read different parts of a partition that are far from each other, they will compete over disk bandwidth. In Kafka consumer can not read from replica node. If a topic is really popular and many consumers try to read from it (from different parts of the file which makes OS page cache useless), total consumption rate is limited to disk bandwidth of a single node. But in Pulsar each consumer can read from different brokers. Catch up consumers won't trash streaming consumers in Pulsar. These are not problems that can be fixed easily. Additionally, in the realm of streaming the difference between Flink and Spark is day and night. The low watermark feature that Flink offers makes them behave fundamentally different. |
|
2. Kafka can read from a replica node. It's relatively new but it's there.