by LinuxBender
3799 days ago
I am not at all surprised. There are 'best practices' and then there is what really happens based on business processes and needs. In reality, even the most cloudy of cloud providers will run into this problem at some point. Folks often come up with the idea of implementing something like Chaos Monkey in their data-center, then realize the actual impact it will have and find it is almost impossible to get the rest of the business to agree to it. It isn't as easy as it sounds. I only know of two businesses that have actually implemented Chaos Monkey, one being the company that coined the term. Even regular reboots won't catch these problems, and if folks were honest, you would see uptimes of over a year on most servers in most places. That is just based on my experience, and I am sure some of you have seen differently.
And the worst that can happen is a customer's stream stops and they have to restart it.
But in most big companies you have thousands of apps that are all doing very different things. Perhaps a critical app might run on 4 hosts spread across two data centres. You're not going to convince people to have Chaos Monkey regularly and randomly bringing down these hosts; it would cause real impact and is risky. Yeah, in theory it should be able to cope, but in reality the scales in most orgs are quite different.
That said, GitHub sounds a lot more like the Netflix end of the scale, doing one specific thing at large scale.