|
|
|
|
|
by pedalpete
3787 days ago
|
|
Does Github run anything like Netflix Simbian Army against it's services? As a company by engineers for engineers with the scale that github has reached, I'm a bit surprised they are lacking a bit more redundancy. Though they may not need the uptime of netflix, an outage of more than a few minutes on github could affect businesses that rely on the service. |
|
Complex systems fail. Period. All the time. Things like the Simian Army are fantastic tools that help you identify a host of problems and remediate them in advance, but they cannot test every combinatorial possibility in a complex distributed system.
At the end of the day, the best defense is to have skilled people who are practiced at responding to problems. GitHub has those in spades, which is why they could respond to a widespread failure of their physical layer in just over 2 hours.
The biggest win with the Simian Army isn't that it improves your redundancy. It's that it gives your people opportunities to _practice_ responses.