Hacker News
by chacham15 2531 days ago
Unexpected things are bound to happen. But one thing that stuck out to me is that you don't seem to have a safe way to test changes (which would have prevented the second failure). Are there no other environments to test changes on? Is there no way to roll out incrementally? Is there no other environment that can take over for a failing one while you investigate? These seem like fairly common industry practices that help you deal with unexpected failures, but I don't see any mention of whether or why these practices failed, or whether and how that is being remediated.