by p-e-w 903 days ago
I don't understand this. If two or more computers fail in the same way simultaneously, isn't it much more likely that there is a systemic design problem/bug rather than some random error? But if there is a design problem, how does having more systems voting help?
5 comments

A random error can affect two computers simultaneously. If they come off the same assembly line, they may fail in exactly the same way, especially if they share the same wires.

That's the reason I sometimes see the recommendation, for RAID systems, to avoid buying identical disks all at the same time: because they will be used in the same way in the same environment, there is a good chance they will fail at the same time, defeating the point of a redundant system.

Also, to guard against bugs and design problems, critical software is sometimes developed twice or more by separate teams using different methods. So you may have several combinations of software and hardware. You may also have redundant boards in the same box, and also redundant boxes.

They are not going to fail the same way simultaneously. This is protecting against cosmic-ray-induced signal errors within the logic elements, not logic errors due to bad software.
The multiprocessor voting approach seeks to solve issues introduced by bit flips caused by radiation, not programming issues.
Having at least 3 computers gives you the option to disable a malfunctioning computer while still keeping redundancy against random bit flips or other environmental issues.
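
To make the voting idea concrete, here is a minimal sketch of a majority voter in Python. It is only an illustration of the principle, not how any real flight computer does it; the function name `majority_vote` and the sample values are made up for this example.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (winning value, indices of units that disagreed).

    Hypothetical helper: raises ValueError when no value is held by a
    strict majority of the units, since the voter then cannot tell
    which unit is right.
    """
    counts = Counter(outputs)
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise ValueError("no strict majority among outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Three units compute the same result; unit 1 suffers a single bit flip.
good = 0b10110010
flipped = good ^ (1 << 3)  # bit 3 inverted, e.g. by radiation

value, dissenters = majority_vote([good, flipped, good])
print(value == good, dissenters)  # True [1]
```

With three units, the flipped result is outvoted and unit 1 can be flagged or disabled. With only two units left, a disagreement can still be detected (the function raises) but not resolved. And if all units share the same design bug, they agree on the wrong answer and the vote passes it through, which is the case the original question is worried about.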
Which is why different sets of computers will run software developed by independent groups on different principles, so that they are very unlikely to fail simultaneously.