|
|
|
|
|
by lcnPylGDnU4H9OF
599 days ago
|
|
> Redundancy, scalability, decoupling, resilience, best possible handling of errors, cost optimization, etc. may be more important at the scale Netflix operates at. So much that they built a tool to intentionally make things difficult (read: it arbitrarily stops production system processes/containers/etc.) and help inform what decisions to make in favor of fault tolerance. > Exposing engineers to failures more frequently incentivizes them to build resilient services. https://github.com/Netflix/chaosmonkey https://en.wikipedia.org/wiki/Chaos_engineering |
|