Hacker News
by vitalus 2463 days ago
Definitely can sympathize with you on this, having spent plenty of time myself fighting some clusters that ended up in a broken state, and trying to get them going again.

I think this pain is sometimes more severe with automated provisioning tools and the trend towards immutable infrastructure: folks tend not to have the know-how to dig in and mutate that state if need be.

It's really important to have a story within teams, though, about either investing in the knowledge needed to make these fixes, or having the tooling in place to quickly rebuild everything from scratch and cut over to a new, working production cluster in a minimal amount of time.

3 comments

I'm just beginning my journey into the vanilla Kubernetes world.

As I build my knowledge I am also building Ansible playbooks and task files. After each iteration I shut down my cluster, do an automated rebuild and test, then delete the original cluster and start my next iteration.

I have an admin box with everything I need to persist between builds (Ansible, keys, configuration files, etc.) and can deploy whatever size and quantity of worker VMs I need.

It has been a good process so far. I haven't yet put things in an unrecoverable state, but if that happens I can rebuild the cluster to my most recent save and try again.
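For anyone wanting to try the same loop, here is a rough Python sketch of the rebuild, test, and delete cycle run from the admin box. It is only an illustration under assumptions: the playbook names and the `cluster_name` variable are made up, and it builds the new cluster before deleting the old one rather than shutting the old one down first.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical playbook names; nothing in this thread names the real ones.
BUILD_PLAYBOOK = "build_cluster.yml"
TEST_PLAYBOOK = "smoke_test.yml"
DESTROY_PLAYBOOK = "destroy_cluster.yml"


def playbook_cmd(playbook: str, inventory: str, cluster: str) -> List[str]:
    """Build an ansible-playbook command line targeting one cluster.

    `cluster_name` is an assumed extra variable that the playbooks would read.
    """
    return ["ansible-playbook", "-i", inventory, playbook,
            "-e", f"cluster_name={cluster}"]


def run(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(list(cmd)).returncode == 0


def iterate(old: str, new: str, inventory: str,
            runner: Callable[[Sequence[str]], bool] = run) -> str:
    """Rebuild from scratch, test, and only then delete the previous cluster.

    Returns the name of the cluster that should be treated as current.
    """
    if not runner(playbook_cmd(BUILD_PLAYBOOK, inventory, new)):
        return old  # build failed: keep the old cluster untouched
    if not runner(playbook_cmd(TEST_PLAYBOOK, inventory, new)):
        # Tests failed: throw away the new build and stay on the old cluster.
        runner(playbook_cmd(DESTROY_PLAYBOOK, inventory, new))
        return old
    # New cluster is good: the original is now disposable.
    runner(playbook_cmd(DESTROY_PLAYBOOK, inventory, old))
    return new
```

Passing the runner in as a parameter means the control flow can be dry-run with a fake runner before it is pointed at real VMs.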

I don't see it taking a lot of resources to have a proving ground. I would definitely not feel comfortable going to production without the ability to reproduce the production clusters' exact state.

I anticipate using exactly what you describe as a rollback mechanism. At all times I want to be able to automate the deployment of clusters to an exact known state.

I think building a cluster, walking away from it for a year, and then coming back to it for a break-fix, update, or new deployment is a huge gamble.

Hi, I'm not sure if you saw my comment below, but this is 100% the use case Sugarkube [1] was designed for. Depending on where you are in setting things up, it might save you time to give it a try. There are some non-trivial tutorials [2] and sample projects you can use to kickstart your development. It currently only works with EKS, Kops and Minikube, though, so it wouldn't be suitable if you're using something else to create your K8s cluster.

[1] https://www.sugarkube.io/

[2] https://docs.sugarkube.io/getting-started/tutorials/local-we...

This is my thinking too. Build a new cluster and push everything over to it. If you feel like understanding the old one (and can afford it), keep it around and try to figure it out.

Clusters are cattle, not pets.

Very much agree, but I never managed to reach this point. One reason is that the amount of hardware needed for this is pretty prohibitive. The second is that configuring a new cluster (last time I did it) was so much work, and I never managed to automate the process, so there was simply no way I could have created a new cluster in time to get our websites back up.
I've heard estimates along the lines of two person-years of work and $500,000.