by mattboyle 2363 days ago
We tried to adopt this but found the documentation very lacking, along with a severe lack of quality client libraries for our language of choice (Go). The "official" one had race conditions in the code as well as "todo"s for key pieces littered throughout. There is another from Comcast, which is abandoned. We had a serious discussion about picking up ownership of the library or writing our own, but as a small startup we didn't feel we could do it and still develop the product. I'll continue to keep an eye on Pulsar, but for now Kafka is the clear go-to imo. It's well documented, has great SaaS offerings (Confluent), and there are tons of books and training courses for it.
4 comments

We're close to releasing a new "officially supported" native Go client library: https://github.com/apache/pulsar-client-go
We provide a SaaS offering of Apache Pulsar in AWS, Azure, and GCP: https://kafkaesque.io/
Cool name. That's one of those company names that almost seems like someone thought it would make a good company name first and thought it was so fitting, they should build a company around it.
Thanks!
I didn't find this when looking. Thanks, I will take a deeper look.
> found the documentation very lacking

Really? It is one of the few open source projects that we've felt has had modern documentation. How long ago was this?

> As a small startup

You'll spend more time & money on the OpEx cost with Kafka than picking up the client library for Pulsar.

It was about 6 months ago.

I completely disagree with the opex of picking up kafka vs developing a whole client library. Please could you try and explain how you came to this conclusion?

> Please could you try and explain how you came to this conclusion?

1. Stateless brokers

With Kafka any time a broker goes down you need to be aware of the kafka broker id. Yes, this can be fixed by creating your entire infrastructure as code and keeping track of state.

This is a big OpEx cost. I've seen few people successfully automate it; Netflix is one of the few. The rest rely on a manual process with tooling around it: a pager, Kafka tooling to spawn a replacement node with the looked-up broker id, etc.

2. Kafka MirrorMaker

Granted, I have not used v2, which recently came out in ~2.4, but dear gosh, v1 was so bad that Uber wrote their own replacement from the ground up, called uReplicator. The amount of time wasted on broken cross-region replication is disgusting.

3. Optimization & Scaling

Kafka bundles compute & storage. There's no way that I know of to split them (though maybe there's an upcoming KIP). This means you'll waste time on the Ops side deciding on tradeoffs between your broker throughput and your broker disk space.

Worse yet time & money will be wasted here. I'd just rather hire more people than waste time on silly things like this. This is where I justify taking on the expense of client libs.
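
To make the coupling concrete, here is a rough sizing sketch. All numbers and the `brokers_needed` helper are hypothetical, and replication is ignored; the only point is that when each broker supplies both throughput and disk, the larger of the two requirements sets the broker count.

```python
import math

def brokers_needed(throughput_mb_s, storage_tb, per_broker_mb_s, per_broker_tb):
    """Brokers required when every broker supplies both compute and storage."""
    for_throughput = math.ceil(throughput_mb_s / per_broker_mb_s)
    for_storage = math.ceil(storage_tb / per_broker_tb)
    # The cluster must satisfy both needs, so the larger one wins.
    return max(for_throughput, for_storage), for_throughput, for_storage

# Hypothetical workload: modest traffic, large retained data set.
total, by_throughput, by_storage = brokers_needed(
    throughput_mb_s=200, storage_tb=120, per_broker_mb_s=100, per_broker_tb=4
)
print(f"throughput needs {by_throughput}, storage needs {by_storage}, you run {total}")
```

With these made-up inputs the storage requirement dominates, so most of the brokers exist only to hold disks.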

4. Segments vs Partitions

The major time wasters are situations where the cluster gets utterly destroyed. It will happen; it isn't a question of if but of when, unless the company goes belly up first and nobody cares.

It's 3 AM, the producer is getting back pressure, you get a page and now have to deal with adding write capacity to avoid a hot spot. Don't forget you can't simply rebalance in Kafka or you'll break the contract with every developer who has developed under the golden rule of, "Your partition order will always be the same".

You'll successfully pay the cost of upgrading the entire cluster and then spend 3 days coming up with a way to rebalance without making all your devs riot against you when you break that golden contract.
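
One way to read the "golden rule" is per-key ordering: records with the same key always land in the same partition. The sketch below assumes keyed records are placed by hash modulo partition count. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2), and the key names and partition counts are made up.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: crc32, not the murmur2 used by Kafka's default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

keys = [f"user-{i}" for i in range(1000)]

# Compare where each key lands before and after growing from 6 to 8 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key that moves can have its older records in one partition and its newer records in another, which is exactly the ordering assumption consumers were relying on.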

RIP Kafka

Having spent a couple of years dealing with Kafka, I'm sorry to burst people's bubbles, but it is dead. Even Confluent doesn't have a good enough story these days to not switch to Pulsar; they're going to sell you on the same consulting bs: "We're more mature", "We've got better tooling", "Better support"...

Yes, of course, it has been in the open source community 5 years longer, and the company has been around longer too. Kafka is dead, long live Pulsar.

I think what is dead is confluent cloud b/c Amazon MSK and Azure HDInsight will be close to feature parity at much less cost.
Damn, I got lazy on my reply & just hoped nobody went further, but well played on digging deeper.

5. Kafka is silly expensive

Pulsar supports message ack with subscription groups. The worst case with Pulsar is you're storing the entire retention period.

Let's say you have a 4 day retention window, to cover an outage happening on Friday and not having to deal with it until Monday. This is pretty typical with what I see in the Kafka world for small-mid size companies who don't want to pay the 1.5x OT on call.

So, with Pulsar you're at worst storing the 4 days of data but at best you're only storing the messages within the lag period of all consumer groups acknowledging the message.

Now, without getting too deep into Pulsar's feature set, even that is a lie, because Pulsar has tiered storage as a first-class citizen. The messages could be shipped off to S3 after the four days if we wanted, or even within 1 day depending on our use case, and this is all built into Pulsar, no OpEx tooling required. Even accessing the messages from S3 through Pulsar is abstracted; there's no tooling required to pull them back in if you wanted.

Now with Kafka our worst case is simply 4 days of retention data. This can get very expensive because compute & storage are tied together: it means scaling up all the brokers (even though we don't need the throughput) for the storage increase. Now, yes, MSK basically abstracts all this from you, but you're paying for it.
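
A back-of-the-envelope version of the retention argument, as a sketch only: the ingest rate and consumer lag are hypothetical, and replication, compression, and tiered storage are all ignored. The four-day window is the one from the comment above.

```python
def retained_gb(ingest_gb_per_hour: float, hours_kept: float) -> float:
    """Data held on disk if everything from the last `hours_kept` hours is kept."""
    return ingest_gb_per_hour * hours_kept

INGEST_GB_PER_HOUR = 50            # hypothetical ingest rate
RETENTION_HOURS = 4 * 24           # the four-day retention window
SLOWEST_CONSUMER_LAG_HOURS = 0.5   # hypothetical lag of the slowest subscription

# Time-based retention: the full window is always stored.
worst_case = retained_gb(INGEST_GB_PER_HOUR, RETENTION_HOURS)
# Ack-based cleanup: only what the slowest subscription has not acknowledged.
best_case = retained_gb(INGEST_GB_PER_HOUR, SLOWEST_CONSUMER_LAG_HOURS)

print(f"full window: {worst_case} GB, unacknowledged backlog only: {best_case} GB")
```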

6. AWS managed services are not equal citizens to standalone EC2

Managed services right now don't fall under the new Savings Plans: https://aws.amazon.com/blogs/aws/new-savings-plans-for-aws-c...

This will cost you a 30-60% discount on your entire Kafka bill.

7. Excel Life

If I look at the numbers for what I'm doing, it would have cost ~$4M for Kafka vs ~$1M for Pulsar.

While bare metal Kafka really does come bundled with lots of OpEx trouble, have you ever tried using an orchestrator to manage it?

The DC/OS implementation easily rules out 1. and 2.

3. and 4. are valid points, but I think in real life these scenarios are usually related to cloud service cost optimization, and I would never recommend anyone run Kafka in a cloud for these reasons.

There's one more reason, which was not cited, but which is a real killer for the cloud Kafka dream AFM: clouds, being prone to all kinds of network interruptions, are not well suited for running Zookeeper ensembles with decent uptime.

Disclaimer: I have never tried or used Apache Pulsar, and am just examining its documentation after spotting this thread.

> "using an orchestrator to manage it"

These can be just as fragile, and now you have to learn how to manage the orchestrator. Even Confluent's own Kubernetes operator has issues. There are just too many issues with Kafka's design that hinder easy operations.

> "I would never recommend anyone running Kafka in a cloud"

That's a major problem considering that's where most computing is heading. At this point, running in noisy, overloaded cloud environments is a good test of the reliability and durability of a software system. Kafka fails massively here.

I recently did a talk covering a lot of what I wrote: https://www.youtube.com/watch?v=jLruEmh3ve0
> You'll spend more time & money on the OpEx cost with Kafka than picking up the client library for Pulsar.

Could you elaborate why this would be the case?

Not the OP, but I think they were exaggerating a bit. In practice, operating kafka is a major PITA, because it means you have to

(1) choose a "flavor" wrapper (confluent seems to be a popular one), because the base project isn't easy to develop against

(2) write your own wrappers of those wrappers, to keep your developers from shooting themselves in the foot with wacky defaults

(3) suffer the immense pain that is authenticating topic write/reads, if that's even possible???

(4) stand up zookeeper... and probably lose some data along the way.

(5) suffer zookeeper outages due to buggy code in kafka/zk (I've experienced lost production data due to unpredictable bugs in kafka/zk, but obviously YMMV).

Based on my naive assessment, the kafka/zookeeper ecosystem is maybe 10x as complicated as the problem it's solving, and that shows up in the OpEx. I personally doubt that Pulsar is that much better, but it might be.

These are also valid. I wrote the reply explaining some of the OpEx here: https://news.ycombinator.com/item?id=21938463
What do you mean by 1 and 2? I'm guessing you're referring to the kafka-clients API? The defaults for producer and consumer conf are quite sensible these days.
I wasn’t around to make those decisions at my company, but I imagine that the “these days” component was the cause? There are a lot of configurations, new ones appear and old ones disappear or change names, etc.

In this churny environment, where you want to keep on the latest versions (necessitated by the bugs mentioned above), you need abstractions to protect you somewhat from the churn.

Confluent also seems to have a fair amount of churn, so you need wrappers for that, that you can update all at once for your developers.

Sorry, when I say "these days", I mean Kafka >= 1.0. In the 0.8 days, things like the auto-commit offset interval defaulted to something like 1 minute, as opposed to 5 seconds from then onwards, max fetch bytes was set significantly higher, etc.

My biggest problems with it were when developers who didn't really understand Kafka started setting properties that had promising names to bad values to "ensure throughput" - let's set max.poll.records to 1 to ensure we always get a record as soon as one is available!

That might be my biggest issue with Kafka - it requires a decent amount of knowledge of Kafka to use it well as a developer. I'm not sure if Pulsar removes that cognitive burden for devs or not, but I'm interested in finding out.

And yeah, the wrappers to remove that burden were written in our company too - but then proved quite limiting for the varying use cases for a Kafka client in our system. sigh
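
For illustration, a minimal sketch of a lighter-weight guardrail than a full wrapper: a check that flags suspicious consumer settings and leaves the client API alone. `max.poll.records` and `auto.commit.interval.ms` are real Kafka consumer properties; the floor values and the `check_consumer_config` helper are hypothetical policy choices, not anything Kafka ships.

```python
# Property names are real Kafka consumer settings; the floors are
# hypothetical team policy, chosen only for this example.
MINIMUMS = {"max.poll.records": 100, "auto.commit.interval.ms": 1000}

def check_consumer_config(config: dict) -> list:
    """Return a warning for each setting that falls below the team's floor."""
    warnings = []
    for name, floor in MINIMUMS.items():
        value = config.get(name)
        if value is not None and int(value) < floor:
            warnings.append(f"{name}={value} is below the floor of {floor}")
    return warnings

# The "ensure throughput" setting from the comment above.
risky = {"group.id": "billing", "max.poll.records": 1}
print(check_consumer_config(risky))
```

Returning warnings instead of rejecting the config keeps unusual but legitimate use cases possible, which is where the full wrappers reportedly became limiting.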

If you're a Go shop, Gazette is worth a look (https://gazette.dev).