by wyldfire 3795 days ago
The only way to build fault-resilient systems is to test fault-injection scenarios frequently. Netflix is pretty mature in this regard; perhaps GitHub can learn from their example.
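To make "test fault-injection scenarios" concrete, here's a minimal Python sketch. It is not how Netflix's tooling works: Chaos Monkey, for instance, kills whole instances, whereas this injects faults at the call level. `FlakyStore`, `get_with_retry`, and `run_fault_drill` are hypothetical names. The sketch assumes a dependency that fails with a configurable probability and a client that retries a bounded number of times. The drill checks how many reads survive the injected faults.

```python
import random


class FlakyStore:
    """Hypothetical key-value dependency that fails on purpose.

    Each read raises ConnectionError with probability `failure_rate`,
    simulating a flaky network link or backend.
    """

    def __init__(self, data, failure_rate, seed=0):
        self.data = dict(data)
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so drills are reproducible
        self.injected = 0  # number of faults injected so far

    def get(self, key):
        # Inject a fault before touching the real data.
        if self.rng.random() < self.failure_rate:
            self.injected += 1
            raise ConnectionError("injected fault")
        return self.data.get(key)


def get_with_retry(store, key, attempts=5):
    """Retry a read up to `attempts` times, re-raising the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return store.get(key)
        except ConnectionError as err:
            last_error = err
    raise last_error


def run_fault_drill(store, keys, attempts=5):
    """Issue one retried read per key; return (succeeded, failed) counts."""
    ok = failed = 0
    for key in keys:
        try:
            get_with_retry(store, key, attempts)
            ok += 1
        except ConnectionError:
            failed += 1
    return ok, failed


if __name__ == "__main__":
    store = FlakyStore({"repo": "ok"}, failure_rate=0.5, seed=42)
    ok, failed = run_fault_drill(store, ["repo"] * 200)
    print(f"succeeded={ok} failed={failed} faults_injected={store.injected}")
```

Running a drill like this on a schedule, rather than once, is the point. It catches regressions where someone removes the retry or makes it unbounded.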

That said, it's possible that GitHub considered this particular style of outage rare enough that they chose not to design for it. If that were the case, though, I'd wager they'll re-evaluate the cost/benefit right around now. :)