by hueving 3510 days ago
But their system doesn't thrive on Chaos Monkey. It's just resilient to it.

In this case, the anti-fragile system is the entire system over time, including the Netflix engineers and the cloud. The cloud is stressed, and may even go down, but in response it becomes stronger and more reliable because engineers make changes.

Point-in-time human-engineered systems still can't really be anti-fragile, except perhaps in some weird corner cases, but the system as a whole with the humans included, over time, can be.

It should also be pointed out that "anti-fragile" was always intended to be a name for things that already exist, a word we can use as a cognitive handle for thinking about these matters, not a "discovery" of a "new system" or something. There are many anti-fragile systems; in fact it's pretty hard to have a long-term successful production project of any kind in software without some anti-fragility in the system. (But I've seen some fragile projects clunk along for quite a while before someone finally came in and did whatever was managerially necessary to get someone to address root causes.)

Ah, true when you include the engineers in the loop, I suppose. But then it becomes a vague term for any system where engineers fix problems after some load or failure testing.

When I think of anti-fragile systems, what comes to mind is truly adaptive algorithms that learn from failures. For example, a global leader election algorithm that changes the leader based on the time of day, because one geographic region of the network is always busier at certain hours and latency to the leader impacts performance.
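
To make the leader election example a bit more concrete, here is a rough Python sketch of what such an adaptive picker might look like. Everything in it is hypothetical: the class name `AdaptiveLeaderPicker`, the region names, the smoothing factor, and the idea of feeding it per-hour latency measurements are all assumptions for illustration, and it only chooses a preferred region rather than running a real consensus protocol.

```python
from collections import defaultdict


class AdaptiveLeaderPicker:
    """Hypothetical sketch: learn which region makes the best leader per hour.

    This is not a real election protocol. It only tracks a smoothed latency
    per (hour, region) pair and suggests the region with the lowest value.
    """

    def __init__(self, regions, alpha=0.3):
        self.regions = list(regions)
        self.alpha = alpha  # weight given to the newest observation (assumed)
        # ewma[hour][region] -> smoothed request latency in milliseconds
        self.ewma = defaultdict(dict)

    def observe(self, hour, region, latency_ms):
        """Record the latency seen while `region` was leader during `hour`."""
        prev = self.ewma[hour].get(region)
        if prev is None:
            self.ewma[hour][region] = float(latency_ms)
        else:
            # Exponentially weighted moving average of past observations.
            self.ewma[hour][region] = (1 - self.alpha) * prev + self.alpha * latency_ms

    def pick_leader(self, hour):
        """Return the region with the lowest learned latency for this hour."""
        seen = self.ewma.get(hour, {})
        if not seen:
            # Nothing learned yet for this hour: fall back to the first region.
            return self.regions[0]
        return min(seen, key=seen.get)


if __name__ == "__main__":
    picker = AdaptiveLeaderPicker(["us-east", "eu-west"])
    # Made-up measurements: each region is slow as leader at a different hour.
    picker.observe(9, "us-east", 120)
    picker.observe(9, "eu-west", 40)
    picker.observe(21, "us-east", 35)
    picker.observe(21, "eu-west", 110)
    for hour in (9, 21, 3):
        print(hour, picker.pick_leader(hour))
```

The point of the sketch is that the preferred leader changes by hour purely from observed behavior, with no engineer in the loop. A real system would still need to hand leadership over safely through whatever election mechanism it already uses.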

Yes. It is stronger because it is attacked by it.