	by haolez 1910 days ago
	Kafka is a pretty cool technology, but for every project that I work on, it's never used because it feels like it's overkill (costly and operation heavy). Maybe I should start looking for bigger projects :D

8 comments

colin_mccabe 1910 days ago

Part of the reason we are removing Kafka's ZooKeeper dependency is to get rid of that "heaviness."

Going forward, you will no longer need to configure and run a separate ZooKeeper service just to run Kafka. For proof-of-concept projects, a single-process Docker image will be available when running in KRaft mode (non-ZK mode).

For bigger projects, you may want to use a managed cloud service. Or if you do choose to manage it yourself, it will be easier running one service than two.

Disclosure: I work for Confluent.

sumtechguy 1910 days ago

Oh it most certainly simplifies things. I am looking at half the number of boxes needed to run. Which is not insignificant in my cost structure.

What is the migration strategy here? Is it doc'd up yet? I am having flashbacks to migration for follower partitions recently which required a decent amount of pre planning of partition layout.

Also as it is pulling in the duties of ZK into kafka what sort of CPU/memory changes are you seeing? Is it 'meh' or all the way to 'you may want to add a couple of CPUs and a few more GB'? Also is it working ok with the stretched cluster?

Also, if you want to hit an interesting market, you may want to look at 'does it run OK on a Raspberry Pi'.

jwandborg 1910 days ago

A nit regarding the disclosure: I prefer it at the top of the message, and I think that's "best practice", but I don't know for sure.

pdimitar 1910 days ago

Your clarification made me wonder:

Is the single process deployment only doable via a container? Or will we actually have OS native process as well?

colin_mccabe 1910 days ago

Yes, you can run a single OS native process in KRaft mode, without using Docker. Docker just avoids the need to install a JVM, but it is not required.
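
For anyone wondering what the single-process setup looks like in practice, here is a minimal sketch that renders a `server.properties` for one node acting as both broker and controller. The key names are meant to follow the sample KRaft configuration shipped with early KRaft-capable Kafka releases, so check them against your version. The helper name `kraft_single_node_properties`, the ports, and the log directory are assumptions made for illustration only.

```python
def kraft_single_node_properties(
    node_id: int = 1,
    broker_port: int = 9092,        # assumed client-facing port
    controller_port: int = 9093,    # assumed controller quorum port
    log_dir: str = "/tmp/kraft-combined-logs",  # assumed data directory
) -> str:
    """Render a minimal properties file for one combined broker+controller node."""
    settings = {
        # One JVM process plays both roles, so no separate ZooKeeper service.
        "process.roles": "broker,controller",
        "node.id": str(node_id),
        # A quorum of exactly one voter: this node itself.
        "controller.quorum.voters": f"{node_id}@localhost:{controller_port}",
        "listeners": f"PLAINTEXT://:{broker_port},CONTROLLER://:{controller_port}",
        "advertised.listeners": f"PLAINTEXT://localhost:{broker_port}",
        "controller.listener.names": "CONTROLLER",
        "inter.broker.listener.name": "PLAINTEXT",
        "listener.security.protocol.map": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
        "log.dirs": log_dir,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(kraft_single_node_properties(), end="")
```

The storage directory also has to be formatted with Kafka's `kafka-storage` tool before the first start. This sketch only covers the configuration file.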

math 1910 days ago

You can tune Kafka down fairly well if you know what you're doing, but it's not optimised for that OOTB. Or just use Confluent Cloud, which is fully managed and scales down as low as you want (costs cents per GB). Disclosure: work for Confluent.

lornajane 1910 days ago

This is great advice IMO, let someone else manage your Kafka at scale. I feel compelled to mention that other Apache Kafka managed services are available, but agree that it makes sense to offload the complexity if possible! Disclosure: work at Aiven, who offer managed Apache Kafka on whatever cloud you are using.

unixhero 1910 days ago

Thank you for disclosing and not disclaiming.

alex_anglin 1910 days ago

Why would someone choose Confluent Cloud over the Kafka offerings of Azure/AWS/GCP?

miguno 1910 days ago

Confluent Cloud is a truly 'fully managed' service, with a serverless-like experience for Kafka. For example, you have zero infra to deploy, upgrade, or manage. The Kafka service scales in and out automatically during live operations, you have infinite storage if you want to (via transparent tiered storage), etc. As the user, you just create topics and then read/write your data. Similar to a service like AWS S3, pricing is pay-as-you-go, including the ability to scale to zero.

Kafka cloud offerings like AWS MSK are quite different, as you still have to do much of the Kafka management yourself. It's not a fully managed service. This is also reflected in the pricing model, as you pay per instance-hours (= infra), not by usage (= data). Compare to AWS S3—you don't pay for instance-hours of S3 storage servers here, nor do you have to upgrade or scale in/out your S3 servers (you don't even see 'servers' as an S3 user, just like you don't see Kafka brokers as a Confluent Cloud user).

Secondly, Confluent is available on all three major clouds: AWS, GCP, and Azure. And we also support streaming data across clouds with 'cluster linking'. The other Kafka offerings are "their cloud only".

Thirdly, Confluent includes many additional components of the Kafka ecosystem as (again) fully managed services. This includes e.g. managed connectors, managed schema registry, and managed ksqlDB.

There's a more detailed list at https://www.confluent.io/confluent-cloud/ if you are interested. I am somewhat afraid this comment is coming across as too much marketing already. ;-)

Disclaimer: I work at Confluent.

d_t_w 1910 days ago

Confluent Cloud has some nice point-and-click UI for creating associated Kafka resources like Schema Registries and Connect Clusters.

My preference is MSK but I'm very comfortable with vanilla Kafka in AWS at a good price with auto-updates.

tamale 1910 days ago

One nice thing about Confluent Cloud vs MSK is that the minimum cost of a Confluent Cloud cluster is far, far lower than the minimum cost of an MSK cluster.

dividedbyzero 1910 days ago

Is there a GCP offering that isn't just Confluent Cloud billed via Google?

jganetsk 1909 days ago

You can use Pub/Sub Lite: https://cloud.google.com/pubsub/lite/docs

With a Kafka compatibility shim: https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work on GCP.

lornajane 1910 days ago

You can get managed Kafka on Aiven (disclaimer: I work there) on GCP, either through the marketplace or directly through Aiven.

dmlittle 1910 days ago

Haven't used it myself, but I've heard of it often enough to remember it. Redpanda[1] aims to be a Kafka replacement without having to worry about ZooKeeper or the JVM.

[1] https://vectorized.io/

timdorr 1910 days ago

https://vectorized.io/redpanda/ is a more useful link, since the main domain appears to have some JS errors right now.

agallego 1910 days ago

oh odd. what setup to repro the js errors. i'll fix.

mrweasel 1910 days ago

For people who just need a queue, Kafka is a bit like using Kubernetes to run a single Docker container.

We run a number of Kafka clusters, most are relatively low traffic, and the management overhead is pretty low. Earlier versions did require a bit more attention, but mostly it’s pretty simple to deal with.

cfontes 1910 days ago

This is huge news.

Kafka is awesome, but using it in local envs is a pain in the ass. Even if this never becomes PROD ready, it is already an immense achievement to be able to run Kafka locally with less complexity and overhead.

toomanybits 1910 days ago

I think that's one of the main points. Now you can run it as a single process more like a traditional broker (although it's obviously still a log).

keithnz 1910 days ago

yeah, really needs a use case that justifies it, I have a particular IoT backend where I made it pluggable between kafka and rabbitmq, ended up just using rabbitmq as it is simpler to work with / manage, and still not really pushing it in terms of performance with thousands of devices.
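
A "pluggable" backend like the one described above could be sketched roughly as follows. This is not the commenter's actual code: `MessageBackend`, `InMemoryBackend`, and `DeviceIngest` are hypothetical names, and the in-memory class only stands in for where a Kafka or RabbitMQ adapter (built on whichever client library you use) would go.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Optional


class MessageBackend(ABC):
    """Hypothetical interface the application codes against."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None:
        ...

    @abstractmethod
    def poll(self, topic: str) -> Optional[bytes]:
        ...


class InMemoryBackend(MessageBackend):
    """Stand-in for a Kafka or RabbitMQ adapter; keeps one FIFO queue per topic."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)

    def publish(self, topic: str, payload: bytes) -> None:
        self._queues[topic].append(payload)

    def poll(self, topic: str) -> Optional[bytes]:
        queue = self._queues[topic]
        # Return the oldest message, or None when the topic is empty.
        return queue.popleft() if queue else None


class DeviceIngest:
    """Application code depends only on the interface, not on a specific broker."""

    def __init__(self, backend: MessageBackend) -> None:
        self._backend = backend

    def report(self, device_id: str, reading: float) -> None:
        self._backend.publish("readings", f"{device_id}:{reading}".encode())
```

Swapping brokers then means writing another `MessageBackend` subclass and passing it to `DeviceIngest`. The rest of the application stays unchanged.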

cmason 1910 days ago

What do you use instead?

haolez 1910 days ago

Cheap managed cloud services, like AWS SQS and Azure Storage Queue (I usually want some kind of persistence for my queues).

jganetsk 1909 days ago

Pub/Sub Lite is a cheap managed cloud service on GCP: https://cloud.google.com/pubsub/lite/docs

With a Kafka compatibility shim: https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work for GCP.

gwenshap 1910 days ago

Confluent Cloud Basic/Standard is a cheap managed Kafka, if the objection is to the deployment and not to the Kafka clients.

KptMarchewa 1910 days ago

We might have different definitions of cheap.

menzella-g 1909 days ago

GCP offers Pub/Sub Lite, an inexpensive messaging product with Kafka-compatible client libraries.

https://cloud.google.com/pubsub/lite/docs https://github.com/googleapis/java-pubsublite-kafka

Disclaimer: I work on this product.

NicoJuicy 1910 days ago

Nats

unixhero 1910 days ago

What kind of projects do you work on? I am genuinely curious about use cases. Feel free to obfuscate.

NicoJuicy 1910 days ago

E-commerce

In the process of splitting up everything in modules.

(Microservices would be overkill)