|
|
|
|
|
by JetSetWilly
3793 days ago
|
|
The problem is most environments are very heteregenous. I evaluated chaos monkey approach for a big bank, the issue is that netflix has whole data centres full of loads of machines doing pretty much the same thing, streaming and serving. And the worst that can happen is a customer's stream stops and they have to restart it. But in most big companies you have thousands of apps that are all doing very different things. Perhaps a critical app might run on 4 hosts spread across two data centres - you're not going to convince people to have chaos monkey regularly and randomly bringing down these hosts, it would cause real impact and is risky. Yeh in theory it should be able to cope but in reality the scales in most orgs are quite different. That said github sounds a lot more like the netflix end of the scale, doing one specific thing at large scale. |
|
Chaos Monkey fits when people build and deploy their services with the notion that any particular instance (or dependency) could fail at any given time. It's a tough road to evolve out of a legacy, monolithic stack without much redundancy baked in.