Sounds like someone else’s fault, unless he owned and designed the system. No single engineer should be able to cause that kind of damage. Quorum rules, etc.
It was a team of five senior-level Linux sysadmins who oversaw about 2k servers. Lots of arguments for and against pushing blame here or there, but at the end of the day, he fucked up big time and should've known better.
I’m sure that’s true, but my point is that if you have that much money riding on a system, you should have to figuratively (if not literally!) put two keys in and turn the lock at the same time to break shit. There should be systematically enforced mandatory reviews, a two-person (or more) rule for issuing commands, etc.
You have to expect people to make mistakes. I’m not saying he didn’t fuck up, but if a company is down a billion dollars, the story should be one of multiple people making multiple mistakes.
Hm. I mean yes, but still, the guy made a legit rookie mistake by not checking the hostname of the box he was on before typing "reboot". Kinda 101 stuff there. :/
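For what it's worth, the "check the hostname first" habit can be enforced instead of just remembered. Debian's molly-guard package does something along these lines for shutdown/reboot over SSH. Below is a rough Python sketch of the same idea. It makes you type the machine's hostname and refuses to continue on a mismatch. The function names and the `--really` flag are made up for illustration. By default it only prints the command it would run (`systemctl reboot`, assuming a systemd box) rather than actually rebooting.

```python
import socket
import subprocess
import sys


def hostname_matches(typed: str, actual: str) -> bool:
    """True only if the operator typed this machine's hostname.
    Surrounding whitespace is ignored; hostnames are case-insensitive."""
    return typed.strip().lower() == actual.strip().lower()


def guarded_reboot(dry_run: bool = True) -> int:
    """Hypothetical wrapper: ask for the hostname (without showing it),
    refuse on mismatch, otherwise reboot (or just print, if dry_run)."""
    actual = socket.gethostname()
    typed = input("Type the hostname of the machine you want to reboot: ")
    if not hostname_matches(typed, actual):
        print(f"Refusing: you typed {typed!r}, but this is {actual!r}.", file=sys.stderr)
        return 1
    cmd = ["systemctl", "reboot"]  # assumption: systemd-based host
    if dry_run:
        print("Would run:", " ".join(cmd))
        return 0
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Only actually reboots when invoked with the made-up --really flag.
    sys.exit(guarded_reboot(dry_run="--really" not in sys.argv))
```

It won't save you from a bad process, but it does turn "oops, wrong terminal" into a refused command instead of an outage.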