Hacker News new | ask | show | jobs
by closeparen 172 days ago
Killing instances of load-balanced stateless services is not that interesting anymore in the context of a mature service mesh. What is interesting is injecting failures or latency on specific edges of the call graph to ensure that “fail open” dependencies really are. This is accomplished with context propagation, baggage, middleware, and L7 proxies rather than killing anything at the VM/container level. Even iptables rules turned out to not be a very good approach since most destinations would have many, constantly cycling IPs and ports.

In the stateful world, chaos testing is useful, but you really want to be treating every possible combination of failures at every possible application state, theoretically with something like TLA or experimentally with something like Antithesis. The scenarios that you can enumerate and configure manually are just scratching the surface.

1 comments

At Netflix when this article was written, Cloud Engineering accomplishing failure injection with circuit breakers which essentially were L7 proxies. Chaos engineering was more than killing instances. There was a whole simian army after all. They would inject latency, error codes, etc and simulate tiers of the application failing. It’s not nearly as unsophisticated as your making it seem.