by hutrdvnj 1818 days ago
> Gone are the days, when one could easily ssh into remote prod systems and fix the issue. With shell less Docker...

I think we finally realized that fixing something in prod via ssh is not a good solution and might introduce new bugs of its own. Instead, build an infrastructure that lets you roll back fast. It is also not worth fixing individual machines in a big cluster; just throw them away and bootstrap them from zero. This way you make sure that you don't have accumulated patches and workarounds on your nodes that might lead to future failures. In some companies we have reached the point where you don't even fix a cluster in a multi-cluster setup, but throw away the whole damn thing and bootstrap it from zero.