Hacker News
by colin_mccabe 1903 days ago
I personally think Kafka has the edge in many ways. It will soon be possible to run a single-process Kafka cluster, which will unlock a lot of applications that people previously used older systems for, simply because that was easier than standing up a full ZK cluster + Kafka cluster. The broader Kafka ecosystem has features like exactly-once support, KSQL, Kafka Connect, Cluster Linking, and excellent client support that are very valuable.

The Kafka community is huge and the velocity of development is very high. It's easy to forget now, but in the beginning, Kafka didn't even have replication. That's a good reminder that things that seem like permanent advantages of system X over Kafka (for various values of X) may very well prove to be temporary. For example, in this very thread, I see people talking about how various system X'es have the advantage over Kafka because they can run without ZK. Those discussions are almost out of date.

Finally, I work at Confluent and I think the company has always been a positive force in the open source community. I respect the Pulsar people as well, but I think they have a difficult challenge to overcome.

2 comments

What challenges do you see Pulsar having (and potentially not overcoming)?
Pulsar is fighting a massive up-hill battle. Kafka is everywhere (and for good reason).
> The broader Kafka ecosystem has features like exactly-once support

No. No it doesn’t. It has at-least-once delivery with client-side deduplication. That’s not new, it’s what TCP does FFS. Why would you lie to people about supporting something long established at best and demonstrably impossible at worst?

> Finally, I work at Confluent....

Oh, that’s why. Never mind then. Continue selling digital snake oil.
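
For readers following along, the "at-least-once delivery with deduplication" scheme this comment describes can be sketched roughly as below. This is a toy in-memory simulation, not Kafka's or TCP's actual implementation; `Receiver`, `send_at_least_once`, and `ack_loss_rate` are hypothetical names made up for illustration, and the sender is assumed to send one message at a time, in order.

```python
import random

class Receiver:
    """Applies each (sender_id, seq) at most once; repeats are ignored."""
    def __init__(self):
        self.last_seq = {}   # sender_id -> highest sequence number applied
        self.applied = []    # payloads applied, in order

    def deliver(self, sender_id, seq, payload):
        if seq <= self.last_seq.get(sender_id, -1):
            return "duplicate"   # already applied: acknowledge, change nothing
        self.last_seq[sender_id] = seq
        self.applied.append(payload)
        return "ack"

def send_at_least_once(receiver, sender_id, messages, ack_loss_rate=0.5, seed=0):
    """Resend each message until an ack gets through; returns total attempts."""
    rng = random.Random(seed)
    attempts = 0
    for seq, payload in enumerate(messages):
        while True:
            attempts += 1
            receiver.deliver(sender_id, seq, payload)
            if rng.random() >= ack_loss_rate:
                break   # ack arrived, move on to the next message
            # Ack was lost (simulated). The sender cannot tell whether the
            # message was applied, so it resends the same sequence number.
    return attempts
```

The sender may transmit a message several times, but the receiver applies it once, because a repeated sequence number is recognised and dropped.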

This is one of the differences between Kafka (Confluent) and Pulsar.

Confluent make big bold claims "Exactly once delivery" and have aggressive marketing.

Pulsar, on the other hand, would say it has "effectively-once". Reading the Pulsar docs vs Kafka's, the Pulsar people are very modest about functionality and have no commercial marketing at all.

These days I have noticed that Confluent does use "effectively once" in blog posts, but the marketing is as aggressive as ever.

Credit where credit is due: Confluent's marketing and big bold claims are why almost everyone is using Kafka and not Pulsar, and may not even have heard of Pulsar. I do find Pulsar's architecture more interesting. Since Splunk bought them, though, it has remained in the background like it always has, with no huge push to sell it.

> No. No it doesn’t. It has at-least-once delivery with client-side deduplication. That’s not new, it’s what TCP does FFS.

It's not just deduplication: you can atomically commit a consumer's offsets on one topic together with the produce of the records resulting from that consumption. That is exactly the same exactly-once guarantee that you get from e.g. an SQL database in linearizable mode (a lot of SQL databases will do the same thing internally: optimistically execute transactions and then re-run them in the case of a conflict).
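
A rough sketch of why committing the consumed offset and the produced records in one atomic step matters. This is a toy in-memory model, not the Kafka client API; `Store`, `run`, `run_with_one_crash`, and `crash_at` are hypothetical names, and the single assignment in `commit_atomically` merely stands in for a real transactional commit.

```python
class Crash(Exception):
    """Simulated process failure."""

class Store:
    def __init__(self):
        self.offset = 0    # committed position in the input log
        self.output = []   # records produced so far

    def commit_atomically(self, new_offset, records):
        # Stand-in for a transaction: offset and output change together.
        self.offset, self.output = new_offset, self.output + records

def run(store, input_log, transform, atomic, crash_at=None):
    pos = store.offset   # resume from the last committed offset
    while pos < len(input_log):
        records = [transform(input_log[pos])]
        if atomic:
            if pos == crash_at:
                raise Crash()   # crash before commit: nothing was written
            store.commit_atomically(pos + 1, records)
        else:
            store.output = store.output + records
            if pos == crash_at:
                raise Crash()   # output written, offset not advanced
            store.offset = pos + 1
        pos += 1

def run_with_one_crash(input_log, atomic, crash_at):
    store = Store()
    try:
        run(store, input_log, str.upper, atomic, crash_at)
    except Crash:
        pass
    run(store, input_log, str.upper, atomic)   # restart after the crash
    return store.output

print(run_with_one_crash(["a", "b", "c"], atomic=True, crash_at=1))
# prints ['A', 'B', 'C']
print(run_with_one_crash(["a", "b", "c"], atomic=False, crash_at=1))
# prints ['A', 'B', 'B', 'C']
```

In the non-atomic run, the record for "b" is produced twice because the crash lands between writing the output and advancing the offset. The atomic version reprocesses "b" after the restart too, but its first attempt left no trace.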