by travmatt
2006 days ago
A good example was the AWS S3 outage that occurred when a single engineer mistyped a command[0]. It's true the outage wouldn't have occurred had the engineer not mistyped the command, but stopping at that conclusion misses the real issue: the system should have some level of resiliency against simple typos. In their case, that meant checking that an action wouldn't take a subsystem below its minimum required capacity.
[0] https://aws.amazon.com/message/41926/
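To make that kind of check concrete, here is a minimal sketch of a capacity floor. It is not how AWS implemented it; `remove_servers`, `CapacityError`, and the numbers are all hypothetical names and values chosen for illustration.

```python
class CapacityError(Exception):
    """Raised when a removal would drop a subsystem below its floor."""


def remove_servers(current: int, to_remove: int, minimum: int) -> int:
    """Return the new server count, refusing removals that breach the floor.

    All three parameters are illustrative; a real system would look the
    minimum up per subsystem rather than take it as an argument.
    """
    if to_remove < 0:
        raise ValueError("to_remove must be non-negative")
    remaining = current - to_remove
    if remaining < minimum:
        # A typo such as an extra digit lands here instead of taking
        # the subsystem down.
        raise CapacityError(
            f"removing {to_remove} leaves {remaining}, below minimum {minimum}"
        )
    return remaining
```

With a floor of 5, `remove_servers(10, 2, 5)` goes through, while `remove_servers(10, 8, 5)` is rejected before anything is touched.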
For example, let's say your service depends on another service that raises its price from free to $100/hour, and you call it 1000 times per hour.
Even though you may not have a fallback, and your service may fail without it, you need to be able to disable the dependency. Suppose no admin is available: your only recourse is to lower the capacity to 0, since that is the one control you have.
That doesn't negate the benefit of validation, but don't make validation too heavy-handed as a knee-jerk reaction to a failure without fully thinking it through.
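One way to keep the validation without locking yourself out is an explicit override. This is only a sketch under my own assumptions: `set_capacity`, `force`, and `reason` are hypothetical names, and requiring a written reason is just one possible policy.

```python
import logging

log = logging.getLogger("capacity")


def set_capacity(requested: int, minimum: int,
                 force: bool = False, reason: str = "") -> int:
    """Return the capacity to apply.

    Requests below the minimum are rejected unless the caller passes
    force=True together with a non-empty reason (hypothetical policy).
    """
    if requested < 0:
        raise ValueError("capacity cannot be negative")
    if requested >= minimum:
        return requested
    if force and reason.strip():
        # The override is deliberate and leaves a trail for later review.
        log.warning("capacity forced to %d (minimum %d): %s",
                    requested, minimum, reason)
        return requested
    raise ValueError(
        f"capacity {requested} is below minimum {minimum}; "
        "pass force=True and a reason to override"
    )
```

So `set_capacity(0, 5)` is refused as a probable typo, but `set_capacity(0, 5, force=True, reason="dependency now costs $100/hour")` still lets you shut things off when you really mean it.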