by fizlebit
760 days ago
Even if it was operator error, some sort of public COE would help others avoid the pitfall by design, e.g. by restricting terraform's permissions so that it can only affect resources for the system and availability zone (or better still, cell) under deployment. If you're running a deployment to system X, you shouldn't be able to destroy your backup buckets. Essentially, minimize the blast radius of any configuration operation. I guess you'd also want to one-box the terraform change after testing it in preprod, ideally through a pipeline with monitoring. "The power to modify is the power to destroy." Finally, I wonder if there is some way to tell terraform: don't delete more than x resources, start very slowly, and only delete leaf resources, not the top-level resource. At the end of the day terraform can have a bug, so you really want to control blast radius with permissions. Makes me wonder if the GCP VMware integration is a boundary that doesn't expose granular permissions. If it was operator error with terraform, that should set off alarm bells through the industry. Who else is one fat finger away from total annihilation?
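On the "don't delete more than x resources" idea: I'm not aware of a built-in terraform flag that caps deletions, but a pipeline step can approximate one by inspecting the JSON form of a saved plan before apply. A rough sketch in Python, assuming the plan was exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json`. The function names, the threshold of 5, and the "backup" marker are all made up for illustration, not anything terraform provides.

```python
import json


def load_plan(path):
    """Read a plan exported with `terraform show -json tfplan > plan.json`."""
    with open(path) as f:
        return json.load(f)


def deleted_addresses(plan):
    """Addresses of resources the plan would delete or replace."""
    doomed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as ["delete", "create"] or ["create", "delete"].
        if "delete" in actions:
            doomed.append(rc["address"])
    return doomed


def check_plan(plan, max_deletes=5, protected_markers=("backup",)):
    """Return a list of reasons to refuse the apply (empty list means OK).

    max_deletes and protected_markers are hypothetical policy knobs.
    """
    doomed = deleted_addresses(plan)
    problems = []
    if len(doomed) > max_deletes:
        problems.append(
            f"plan deletes {len(doomed)} resources, limit is {max_deletes}"
        )
    for addr in doomed:
        # Substring match so module-prefixed addresses are caught too.
        if any(marker in addr for marker in protected_markers):
            problems.append(f"plan deletes protected resource {addr}")
    return problems
```

A pipeline would call `check_plan(load_plan("plan.json"))` and fail the stage if the list is non-empty. This only catches what the plan admits to, so it complements restricted permissions rather than replacing them.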
"Hey terraform just output a wall of text, it wants to know whether or not to proceed."
"That's what it does mate. Let it do its thing, she'll be right."