by guslees 3006 days ago
That's not quite true. More parts == more things that can fail, but whether those failures result in the entire system failing depends on how you've combined the parts.

If you make each of the pieces a required part of the whole, then yes - adding more of them will increase the chance that the whole system fails. But in Kubernetes, the additional pieces (nodes) are all redundant parts of the whole, and can fail without affecting the availability of the whole system. The more nodes you add, the more redundancy you're adding, and the lower the chance that the system as a whole will be affected.

Mathematically:

If each component fails independently with probability F (a fraction between 0 and 1), then combining N of them "in series" (all of them need to work) means your whole system fails with probability 1-(1-F)^N. In other words, as N goes up, (1-F)^N shrinks toward 0 and the system approaches a 1-0 = 100% chance of failure.

On the other hand, if you combine the parts "in parallel", and you only need any one[1] of the components to work in order for the whole system to work, then the system fails only when all N components fail at once, which happens with probability F^N. As N goes up, this system approaches a 0% chance of failure (assuming the failures are independent and F < 1).
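
To put numbers on the two formulas above, here's a minimal Python sketch. The function names and the example failure probability of 0.01 are my own choices for illustration, and it assumes every component fails independently with the same probability f.

```python
def series_failure(f: float, n: int) -> float:
    """Chance the system fails when all n components must work."""
    # The system works only if every component works: (1 - f) ** n.
    return 1 - (1 - f) ** n


def parallel_failure(f: float, n: int) -> float:
    """Chance the system fails when any one working component is enough."""
    # The system fails only if every component fails at the same time.
    return f ** n


# Hypothetical per-component failure probability, for illustration only.
f = 0.01
for n in (1, 3, 5, 10):
    print(n, series_failure(f, n), parallel_failure(f, n))
```

The series column climbs toward 1 as n grows, while the parallel column drops toward 0.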

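[1] Kubernetes (etcd) isn't quite this redundant, since etcd needs a majority quorum to be functional, not just any single node. But the principle is similar: as long as each node is up more often than not, the cluster still gets more reliable as you add nodes (in odd-sized steps, since going from an odd to an even member count raises the quorum size without tolerating any more failures).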
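
The majority-quorum case in [1] can be sketched the same way. This is a rough model, not a description of how etcd actually behaves: it assumes each member fails independently with the same probability f, and quorum_failure is a hypothetical helper name.

```python
from math import comb


def quorum_failure(f: float, n: int) -> float:
    """Chance that fewer than a majority of n members are working."""
    quorum = n // 2 + 1           # members needed for a majority
    max_failures = n - quorum     # failures the cluster can tolerate
    # Binomial probability of at most max_failures members being down.
    p_ok = sum(
        comb(n, k) * f**k * (1 - f) ** (n - k)
        for k in range(max_failures + 1)
    )
    return 1 - p_ok


# Hypothetical per-member failure probability, for illustration only.
f = 0.1
for n in (1, 3, 4, 5, 7):
    print(n, quorum_failure(f, n))
```

Under these assumptions the failure chance drops going from 1 to 3 to 5 to 7 members, but 4 members come out slightly worse than 3, because the quorum grows from 2 to 3 while the number of tolerable failures stays at 1.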