by TheHydroImpulse 3307 days ago
We started out deploying our Kafka cluster as a set of N EC2 instances, but we ran into a bunch of issues (rolling the cluster, rolling an instance without moving partitions around, moving partitions around, etc.).

Now we run Kafka through ECS and have written some tooling to manage rolling the cluster and replacing brokers. krollout(1) (currently private) basically prevents partitions from becoming unavailable while rolling.
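
krollout isn't public, so the following is not its implementation, just a rough sketch of the kind of pre-restart check a rolling tool could make: only stop a broker if every partition would still have enough in-sync replicas without it. Everything here is hypothetical (PartitionState, safe_to_restart, blocking_partitions, the min_isr parameter), and the ISR data is assumed to have been fetched from the cluster already by whatever means you prefer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition's in-sync replica set."""
    topic: str
    partition: int
    isr: frozenset  # ids of brokers currently in sync for this partition


def blocking_partitions(broker_id, partitions, min_isr=1):
    """List (topic, partition) pairs that would have fewer than min_isr
    in-sync replicas if broker_id were stopped.

    This is deliberately conservative: a partition that is already below
    min_isr blocks the restart even if broker_id does not host it.
    """
    return [
        (p.topic, p.partition)
        for p in partitions
        if len(p.isr - {broker_id}) < min_isr
    ]


def safe_to_restart(broker_id, partitions, min_isr=1):
    """True if stopping broker_id leaves every partition with enough ISRs."""
    return not blocking_partitions(broker_id, partitions, min_isr)


# Made-up example state: broker 2 is the only in-sync replica of events-1.
partitions = [
    PartitionState("events", 0, frozenset({1, 2, 3})),
    PartitionState("events", 1, frozenset({2})),
]

for broker in (1, 2, 3):
    if safe_to_restart(broker, partitions):
        print(f"broker {broker}: ok to roll")
    else:
        print(f"broker {broker}: wait for {blocking_partitions(broker, partitions)}")
```

A real tool would presumably re-check this after each broker comes back and rejoins the ISR, rather than computing it once for the whole roll.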

Now that multiple teams are using Kafka, we've started exploring how to scale up. Each team may have different requirements, and isolation can become an issue. We'll likely need to build more tooling around this.