by cpx86 2363 days ago
> You'll spend more time & money on the OpEx cost with Kafka than picking up the client library for Pulsar.

Could you elaborate why this would be the case?


Not the OP, but I think they were exaggerating a bit. In practice, operating kafka is a major PITA, because it means you have to

(1) choose a "flavor" wrapper (confluent seems to be a popular one), because the base project isn't easy to develop against

(2) write your own wrappers of those wrappers, to keep your developers from shooting themselves in the foot with wacky defaults

(3) suffer the immense pain that is authenticating topic writes/reads, if that's even possible???

(4) stand up zookeeper... and probably lose some data along the way.

(5) suffer zookeeper outages due to buggy code in kafka/zk (I've experienced lost production data due to unpredictable bugs in kafka/zk, but obviously YMMV).

Based on my naive assessment, the kafka/zookeeper ecosystem is maybe 10x as complicated as the problem it's solving, and that shows up in the OpEx. I personally doubt that Pulsar is that much better, but it might be.

These are also valid. I wrote the reply explaining some of the OpEx here: https://news.ycombinator.com/item?id=21938463

What do you mean by 1 and 2? I'm guessing you're referring to the kafka-clients API? The defaults for producer and consumer conf are quite sensible these days.

I wasn’t around to make those decisions at my company, but I imagine that the “these days” component was the cause? There are a lot of configurations, new ones appear and old ones disappear or change names, etc.

In this churny environment, where you want to stay on the latest versions (necessitated by the bugs mentioned above), you need abstractions to protect you somewhat from the churn.

Confluent also seems to have a fair amount of churn, so you need wrappers for that too, ones that you can update all at once for your developers.
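
To make the "wrapper against churn" idea concrete, here is a minimal sketch of a shim that translates old config key names to new ones in a single place, so application code keeps working across a rename. The key names in the table are made up for illustration and are not real Kafka or Confluent settings; `normalize_config` is a hypothetical helper, not part of any client library.

```python
# Hypothetical rename table: old key name -> new key name.
# These names are invented for illustration only; they are NOT real
# Kafka or Confluent configuration keys.
RENAMED_KEYS = {
    "legacy.timeout.ms": "request.deadline.ms",
    "legacy.buffer.kb": "buffer.size.kb",
}


def normalize_config(user_config, renamed=RENAMED_KEYS):
    """Return a copy of user_config with old key names translated.

    Raises ValueError if the old and new spelling of the same setting
    are both supplied with different values.
    """
    normalized = {}
    for key, value in user_config.items():
        new_key = renamed.get(key, key)
        if new_key in normalized and normalized[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        normalized[new_key] = value
    return normalized
```

The point of the design is that when a key changes name upstream, only the rename table is updated, not every service that sets the key.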

Sorry, when I say "these days", I mean Kafka >= 1.0. Back in the 0.8 days, things like the auto-commit interval for offsets defaulted to something like 1 minute, as opposed to 5 seconds from then on, max fetch bytes was set significantly higher, etc.

My biggest problems with it were when developers who didn't really understand Kafka started setting properties that had promising names to bad values to "ensure throughput" - let's set max.poll.records to 1 to ensure we always get a record as soon as one is available!
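
For context, `max.poll.records` only caps how many records a single `poll()` call hands back; as far as I know it does not make records arrive any sooner, so setting it to 1 mostly adds per-call overhead. A rough sketch of the kind of guard rail a wrapper could apply is below. The default values are what I understand the documented consumer defaults to be in recent Kafka versions; the `LIMITS` table and `build_consumer_config` are hypothetical house rules, not anything the client library provides.

```python
# Consumer defaults as I understand them from the Kafka docs for recent
# client versions (treat these as assumptions and check your version).
DEFAULTS = {
    "enable.auto.commit": True,
    "auto.commit.interval.ms": 5000,
    "max.poll.records": 500,
}

# Hypothetical house rules: allowed (minimum, maximum) per setting.
LIMITS = {
    "max.poll.records": (50, 5000),
    "auto.commit.interval.ms": (1000, 60000),
}


def build_consumer_config(overrides=None):
    """Merge overrides onto the defaults, rejecting out-of-range values."""
    config = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key in LIMITS:
            low, high = LIMITS[key]
            if not low <= value <= high:
                raise ValueError(
                    f"{key}={value} is outside the allowed range [{low}, {high}]"
                )
        config[key] = value
    return config
```

With this in place, `build_consumer_config({"max.poll.records": 1})` fails loudly at startup instead of quietly hurting throughput in production. The flip side is the limitation mentioned elsewhere in the thread: a use case that legitimately needs an unusual value now has to fight the wrapper.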

That might be my biggest issue with Kafka - it requires a decent amount of knowledge of Kafka to use it well as a developer. I'm not sure if Pulsar removes that cognitive burden for devs or not, but I'm interested in finding out.

And yeah, the wrappers to remove that burden were written in our company too - but then proved quite limiting for the varying use cases for a Kafka client in our system. sigh