|
|
|
|
|
by jsnell
2536 days ago
|
|
Yep, that's the exact bullet point I was writing a response on. Security and abuse are of course special little snowflakes, with configs that need to be pushed very fast, contrary to all best practices for safe deployments of globally distributed systems. An anti-abuse rule that takes three days to roll out might as well not exist. The only way this makes sense is if they mean that there'll be a staged rollout of some sort, but it won't be the same process as for the rest of their software. I.e. for this purpose you need much faster staging just due to the problem domain, but even a 10 minute canary should provide meaningful push safety against this kind of catastrophic meltdown. And the emergency process is something you'll use once every five years. |
|
They want to have a rapid response path (little to no delay using staging envs) to respond to emergencies. The old SOP allowed all releases to use the emergency path. By not using it in the SOP anymore, I'd be concerned that it would break silently from some other refactor or change.
Your notion is to maintain the emergency rollout as a relaxation of the new SOP such that the time in staging is reduced to almost nothing. That sounds like a good idea since it avoids maintaining two processes and having greater risk of breakage. So, same logic but using different thresholds versus two independent processes.