by htn 3566 days ago
FWIW, you can get Kafka packaged as a fully managed and HA service from https://aiven.io on AWS and also Azure, GCE and DigitalOcean.

But if Auth0 runs its entire operation on AWS, maybe Kinesis would have been a more natural transition.

3 comments

Eh, Kinesis has some pretty significant trade-offs to know about if you are comparing it with Kafka (e.g. data retention time and write latency).
We need both an on-premise and a cloud story, so cloud-only solutions did not cut it for us.
The article is a little old. How has the system run since you deployed it? Do you have any interesting figures?
It continues to run beautifully. Since we rolled it out back in 2015 we have had zero issues with real-time logging. I have particularly fond memories of the first week after rollout; it felt like a vacation. I could finally get some sleep.
I'm in a similar boat. I'm hoping to propose Kafka to help with some data replication and consolidation tasks, but it has to be both on-premise and as low maintenance as possible (low maintenance in the sense of the work local developers would do).

To anyone reading this with Kafka experience, do you have any tips/advice when it comes to maintaining a Kafka service?

Use 5 zookeepers, on a separate set of servers.

Use configuration management such as Chef so you can quickly build new nodes and roll out changes across the cluster. You will need to make tweaks. The Chef Kafka cookbook that is the top result on Google has a means of coordinating broker restarts across the cluster. Use Consul as the locking mechanism for this. You could use ZooKeeper, but Consul also works well for automatic DNS registration and service discovery.

Use Yahoo's kafka-manager app to manage the cluster and to see what is going on.

Don't use the Kafka default of storing data in /tmp/. Your OS will periodically clean it.
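
For what it's worth, this is easy to catch before a broker ever starts. A rough sketch, assuming your broker config is a plain `server.properties` file and that the stock default for `log.dir` is `/tmp/kafka-logs` (true of the Kafka versions I have seen, but check yours). The function name `risky_log_dirs` is made up for this example:

```python
from pathlib import PurePosixPath
from typing import List

# Assumed stock default for Kafka's log.dir; verify against your version.
KAFKA_DEFAULT_LOG_DIR = "/tmp/kafka-logs"


def risky_log_dirs(properties_text: str) -> List[str]:
    """Return the configured Kafka data directories that live under /tmp."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    # log.dirs takes precedence over log.dir; with neither set, the
    # broker falls back to the default.
    raw = props.get("log.dirs") or props.get("log.dir") or KAFKA_DEFAULT_LOG_DIR
    dirs = [d.strip() for d in raw.split(",") if d.strip()]
    return [d for d in dirs if PurePosixPath(d).parts[:2] == ("/", "tmp")]
```

Run it over the rendered config in your config-management pipeline and fail the build if the list is non-empty.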

> Don't use the Kafka default of storing data in /tmp/.

That seems like a MySQL level of bad defaults.

Five kind of kills performance compared to three, and doesn't map well onto AWS, where you generally have 3 or 4 AZs. I tend to go with three, but make sure you've got fully automated responses to failures.
Five zookeepers? Seems like a lot. Why five? Is it hard to keep them active?

Thanks for the tips.

Zab [the distributed consensus algorithm that powers ZK] shares some similarities with Paxos, and requires a quorum of nodes to be online.

If you want highly available ZK, your choices are 3, 5, 7... nodes, for which you can have 1, 2, or 3 nodes offline at any one time.

If you have one node fully down on a 3 node cluster, and there is even a tiny network blip or partition (as often happens in cloud environments) then you are down.
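
The arithmetic behind those numbers, as a minimal sketch. It assumes the ensemble needs a strict majority of nodes online, which is the quorum rule described above; `tolerated_failures` is just an illustrative name:

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can be offline while a strict majority is still up."""
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


for n in (3, 4, 5, 7):
    print(f"{n} nodes: quorum {n // 2 + 1}, can lose {tolerated_failures(n)}")
```

Note that 4 nodes tolerate only one failure, the same as 3, which is why even-sized ensembles buy you nothing. With 5 you can take one node down for maintenance and still survive an unplanned failure.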

Kinesis is very poor