by dilyevsky
869 days ago
> During our self-managed time on AWS, we experienced a massive cluster crash that resulted in the majority of our systems and products going down. The Root CA certificate, etcd certificate, and API server certificate expired, which caused the cluster to stop working and prevented our management of it. The support to resolve this, at that time, in kube-aws was limited. We brought in an expert, but in the end, we had to rebuild the entire cluster from scratch.

That's crazy, I've personally recovered 1.11-ish kops clusters from this exact fault and it's not that hard when you really understand how it works. Sounds like a case of bad "expert" advice.