by haolez 1910 days ago
Kafka is a pretty cool technology, but for every project that I work on, it's never used because it feels like it's overkill (costly and operation heavy). Maybe I should start looking for bigger projects :D

Part of the reason we are removing Kafka's ZooKeeper dependency is to get rid of that "heaviness."

Going forward, you will no longer need to configure and run a separate ZooKeeper service just to run Kafka. For proof-of-concept projects, a single-process Docker image will be available when running in KRaft mode (non-ZK mode).

For bigger projects, you may want to use a managed cloud service. Or if you do choose to manage it yourself, it will be easier running one service than two.

Disclosure: I work for Confluent.

Oh, it most certainly simplifies things. I am looking at half the number of boxes needed to run, which is not insignificant in my cost structure.

What is the migration strategy here? Is it documented yet? I am having flashbacks to a recent migration for follower partitions, which required a decent amount of pre-planning of the partition layout.

Also, as it pulls ZK's duties into Kafka, what sort of CPU/memory changes are you seeing? Is it 'meh', or all the way to 'you may want to add a couple of CPUs and a few more GB'? Also, is it working OK with stretched clusters?

Also, if you want to hit an interesting market, you may want to look at 'does it run OK on a Raspberry Pi?'

A nit regarding the disclosure: I prefer it at the top of the message, and I think that's "best practice", but I don't know for sure.
Your clarification made me wonder:

Is the single-process deployment only doable via a container? Or will we also be able to run it as an OS-native process?

Yes, you can run a single OS-native process in KRaft mode without using Docker. Docker just avoids the need to install a JVM; it is not required.
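To make "single process" concrete, here is a minimal sketch, assuming Kafka 2.8+ in KRaft combined mode, that renders a `server.properties` in which one process acts as both broker and controller. The function name and its defaults are hypothetical; the property keys follow the sample config shipped under `config/kraft/`, but check them against your Kafka version. Before the first start, the log directory has to be formatted (`bin/kafka-storage.sh random-uuid`, then `bin/kafka-storage.sh format -t <uuid> -c server.properties`), after which `bin/kafka-server-start.sh server.properties` starts the node.

```python
def kraft_combined_config(node_id: int = 1, host: str = "localhost",
                          broker_port: int = 9092, controller_port: int = 9093,
                          log_dir: str = "/tmp/kraft-combined-logs") -> str:
    """Render a single-node server.properties (hypothetical helper, sketch only)."""
    props = {
        # The key KRaft setting: this one JVM is both broker and controller; no ZooKeeper.
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        # A single-member controller quorum: id@host:port of this same process.
        "controller.quorum.voters": f"{node_id}@{host}:{controller_port}",
        "listeners": f"PLAINTEXT://{host}:{broker_port},CONTROLLER://{host}:{controller_port}",
        "inter.broker.listener.name": "PLAINTEXT",
        "controller.listener.names": "CONTROLLER",
        "listener.security.protocol.map": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
        "log.dirs": log_dir,
        # With only one broker, internal topics cannot use the default replication factor of 3.
        "offsets.topic.replication.factor": "1",
        "transaction.state.log.replication.factor": "1",
        "transaction.state.log.min.isr": "1",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(kraft_combined_config(), end="")
```

The replication-factor overrides matter only because this is a one-node setup; a multi-node cluster would keep the defaults.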
You can tune Kafka down fairly well if you know what you're doing, but it's not optimised for that OOTB. Or just use Confluent Cloud, which is fully managed and scales down as low as you want (costs cents per GB). Disclosure: I work for Confluent.
This is great advice IMO: let someone else manage your Kafka at scale. I feel compelled to mention that other managed Apache Kafka services are available, but I agree that it makes sense to offload the complexity if possible! Disclosure: I work at Aiven, which offers managed Apache Kafka on whatever cloud you are using.
Thank you for disclosing and not disclaiming.
Why would someone choose Confluent Cloud over the Kafka offerings of Azure/AWS/GCP?
Confluent Cloud is a truly 'fully managed' service, with a serverless-like experience for Kafka. For example, you have zero infra to deploy, upgrade, or manage. The Kafka service scales in and out automatically during live operations, you have infinite storage if you want to (via transparent tiered storage), etc. As the user, you just create topics and then read/write your data. Similar to a service like AWS S3, pricing is pay-as-you-go, including the ability to scale to zero.

Kafka cloud offerings like AWS MSK are quite different, as you still have to do much of the Kafka management yourself. It's not a fully managed service. This is also reflected in the pricing model, as you pay per instance-hours (= infra), not by usage (= data). Compare to AWS S3—you don't pay for instance-hours of S3 storage servers here, nor do you have to upgrade or scale in/out your S3 servers (you don't even see 'servers' as an S3 user, just like you don't see Kafka brokers as a Confluent Cloud user).

Secondly, Confluent is available on all three major clouds: AWS, GCP, and Azure. And we also support streaming data across clouds with 'cluster linking'. The other Kafka offerings are "their cloud only".

Thirdly, Confluent includes many additional components of the Kafka ecosystem as (again) fully managed services. This includes e.g. managed connectors, managed schema registry, and managed ksqlDB.

There's a more detailed list at https://www.confluent.io/confluent-cloud/ if you are interested. I am somewhat afraid this comment is coming across as too much marketing already. ;-)

Disclaimer: I work at Confluent.

Confluent Cloud has some nice point-and-click UI for creating associated Kafka resources like Schema Registries and Connect Clusters.

My preference is MSK but I'm very comfortable with vanilla Kafka in AWS at a good price with auto-updates.

One nice thing about Confluent Cloud vs. MSK is that the minimum cost of a Confluent Cloud cluster is far, far cheaper than the minimum cost of an MSK cluster.
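The difference between the two pricing models discussed above (pay per usage vs. pay per instance-hour) comes down to a fixed floor versus a cost that tracks volume. Here is a minimal sketch of that arithmetic; all rates below are hypothetical placeholders, not real Confluent Cloud or MSK prices, and the model ignores storage, partitions, and the need for more brokers as traffic grows.

```python
from dataclasses import dataclass


@dataclass
class UsagePricing:
    """Pay-as-you-go: the bill tracks data volume and can drop to zero."""
    price_per_gb: float  # hypothetical rate

    def monthly_cost(self, gb_per_month: float) -> float:
        return self.price_per_gb * gb_per_month


@dataclass
class InstancePricing:
    """Pay per instance-hour: the bill is set by provisioned brokers, not traffic."""
    price_per_instance_hour: float  # hypothetical rate
    instances: int
    hours_per_month: float = 730.0  # roughly 24 * 365 / 12

    def monthly_cost(self, gb_per_month: float) -> float:
        # Traffic is deliberately ignored: idle brokers cost the same as busy ones.
        return self.price_per_instance_hour * self.instances * self.hours_per_month


def break_even_gb(usage: UsagePricing, instance: InstancePricing) -> float:
    """Monthly volume above which the instance-hour model becomes the cheaper one."""
    return instance.monthly_cost(0.0) / usage.price_per_gb


if __name__ == "__main__":
    usage = UsagePricing(price_per_gb=0.10)
    brokers = InstancePricing(price_per_instance_hour=0.20, instances=3)
    for gb in (0, 100, 10_000):
        print(gb, usage.monthly_cost(gb), brokers.monthly_cost(gb))
    print("break-even GB/month:", break_even_gb(usage, brokers))
```

At zero traffic the usage model costs nothing while the instance model still pays for every broker-hour, which is the "minimum cost" gap described in the comment.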
Is there a GCP offering that isn't just Confluent Cloud billed via Google?
You can use Pub/Sub Lite: https://cloud.google.com/pubsub/lite/docs

With a Kafka compatibility shim: https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work on GCP.

You can get managed Kafka on Aiven (disclaimer: I work there) on GCP, either through the marketplace or directly through Aiven.
I haven't used it myself, but I've heard of it often enough to remember it: Redpanda[1] aims to be a Kafka replacement without having to worry about ZooKeeper or the JVM.

[1] https://vectorized.io/

https://vectorized.io/redpanda/ is a more useful link, since the main domain appears to have some JS errors right now.
Oh, odd. What setup reproduces the JS errors? I'll fix it.
For people who just need a queue, Kafka is a bit like using Kubernetes to run a single Docker container.

We run a number of Kafka clusters, most of them relatively low traffic, and the management overhead is pretty low. Earlier versions did require a bit more attention, but mostly it's pretty simple to deal with.

This is huge news.

Kafka is awesome, but using it in local environments is a pain in the ass. Even if this never becomes production-ready, being able to run Kafka locally with less complexity and overhead is already an immense achievement.

I think that's one of the main points. Now you can run it as a single process more like a traditional broker (although it's obviously still a log).
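The "still a log" distinction is what separates Kafka from a traditional queue. Here is a minimal in-memory sketch of the two semantics, assuming a single partition and no retention limits; the class names are hypothetical, and real Kafka adds partitions, retention, and committed offsets on top of this idea.

```python
from collections import deque


class WorkQueue:
    """Traditional queue: each message is handed out once, then it is gone."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def consume(self):
        return self._items.popleft() if self._items else None


class AppendOnlyLog:
    """Log: messages are retained; each consumer group tracks its own read position."""

    def __init__(self):
        self._entries = []
        self._offsets = {}  # consumer group -> offset of the next entry to read

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1  # offset assigned to the new entry

    def poll(self, group):
        offset = self._offsets.get(group, 0)
        if offset >= len(self._entries):
            return None
        self._offsets[group] = offset + 1
        return self._entries[offset]

    def seek(self, group, offset):
        # Rewinding is possible because reading never deletes anything.
        self._offsets[group] = offset
```

With the queue, two consumers split the messages between them; with the log, two groups each see every message, and either one can rewind and replay.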
Yeah, it really needs a use case that justifies it. I have a particular IoT backend where I made it pluggable between Kafka and RabbitMQ. I ended up just using RabbitMQ, as it is simpler to work with and manage, and I'm still not really pushing it in terms of performance with thousands of devices.
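Making the backend pluggable usually means hiding the broker behind a small interface. Here is a minimal sketch of that idea; `MessageBus`, `InMemoryBus`, and `make_bus` are hypothetical names, and only an in-memory backend is implemented. Real Kafka or RabbitMQ adapters would wrap their respective client libraries behind the same two methods and are not shown.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[bytes], None]


class MessageBus(ABC):
    """The only surface application code sees; backends are interchangeable."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryBus(MessageBus):
    """Synchronous in-process backend, handy for tests and local development."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        for handler in self._handlers.get(topic, []):
            handler(payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)


# Hypothetical "kafka" and "rabbitmq" entries would map to adapter classes (not shown).
BACKENDS: Dict[str, Callable[[], MessageBus]] = {"memory": InMemoryBus}


def make_bus(backend: str) -> MessageBus:
    """Pick a backend by name, e.g. from a config file or environment variable."""
    try:
        return BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}") from None
```

Because device-facing code only calls `publish` and `subscribe`, switching brokers is a config change rather than a rewrite, which is what makes it cheap to try both and keep the simpler one.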
What do you use instead?
Cheap managed cloud services, like AWS SQS and Azure Storage Queue (I usually want some kind of persistence for my queues).
Pub/Sub Lite is a cheap managed cloud service on GCP: https://cloud.google.com/pubsub/lite/docs

With a Kafka compatibility shim: https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work for GCP.

Confluent Cloud Basic/Standard is a cheap managed Kafka, if the objection is to the deployment rather than to the Kafka clients.
We might have different definitions of cheap.
GCP offers Pub/Sub Lite, an inexpensive messaging product with Kafka-compatible client libraries.

https://cloud.google.com/pubsub/lite/docs https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work on this product.

NATS
What kind of projects do you work on? I am genuinely curious about use cases. Feel free to obfuscate.
E-commerce

We are in the process of splitting everything up into modules.

(Microservices would be overkill.)