by JetSetWilly 3793 days ago
That's true, but if an app, say, is running on 4 hosts doing some boutique thing for a small unit of 20 traders, then the practical reality is that they might not want Chaos Monkey bringing down 25% of the throughput randomly, and interrupting whatever actual cash money requests are in progress on a host.

It's a lot easier to promote that if it is thousands of servers doing something fairly mundane where, worst-case, it not working means a tiny, tiny proportion of your customers have to restart their video stream. So what?

But for a small heterogeneous business where what's happening has a much higher cash density, the actual practicalities of randomly killing things in production, and the risk that represents, rather get in the way. Even though in theory you should be able to kill anything in production with minimal impact, you are much less inclined to take that risk when the stakes are higher.

1 comment

I think you're missing the point. The point of something like Chaos Monkey is to force you to build a system that won't lose money by "bringing down 25% of the throughput".
My point is that no matter how well engineered your system is, whether you actually have Chaos Monkey running in production really depends on the risk profile and scale of your business.

As soon as Chaos Monkey causes a service interruption for, say, traders - it would get turned off and whoever had such a bright idea fired. But if it causes a service interruption for a tiny proportion of people watching streaming videos - no big deal.

Its proponents just ignore this practical reality and seem politically unaware.