by AshamedCaptain 1527 days ago
I believe people use "cosmic rays" as a catch-all phrase for all of these very-low-probability error causes (just because cosmic rays are cool), but in practice _any_ other cause is much more common than cosmic rays.

Even at the processor level, every single transistor has a rated mean time between failures (MTBF). Sure, it may be astronomical, but there are a lot of transistors, so in practice a random bit flip is not such a rare event. Designers actually explore MTBF vs. power usage trade-offs here, and there is even a fascinating research area of "fault-resilient computing".
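
To make the "astronomical MTBF, but a lot of transistors" point concrete, here is a rough back-of-the-envelope sketch. It assumes independent components with constant failure rates, so the rates add and the aggregate MTBF is just the per-component MTBF divided by the count. The function name and both numbers are made up for illustration; they are not real ratings for any process or part.

```python
def system_mtbf(component_mtbf_years: float, n_components: int) -> float:
    """Aggregate MTBF when a failure of any one component counts as a failure.

    Assumes independent components with constant failure rates, so the
    rates add and the combined MTBF is the per-component MTBF / count.
    """
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    return component_mtbf_years / n_components


# Hypothetical figures, chosen only to show the scaling.
PER_TRANSISTOR_MTBF_YEARS = 1e15
TRANSISTOR_COUNT = 10_000_000_000

chip_years = system_mtbf(PER_TRANSISTOR_MTBF_YEARS, TRANSISTOR_COUNT)
print(f"aggregate MTBF: {chip_years:.3g} years")
```

Under these assumed figures, ten orders of magnitude of per-transistor margin disappear purely from the transistor count, which is the whole argument.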

Every single clock domain crossing has its own MTBF as well (google "metastability"). Again, these are very high (billions of years if done properly), but you will have plenty of such crossings (and the number keeps growing with modern, more asynchronous designs).
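
For the clock domain crossing case, the usual textbook estimate for a flip-flop synchronizer is MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). The sketch below just evaluates that formula and then divides by the number of crossings, assuming they fail independently. All parameter values are hypothetical placeholders, not data for any real cell library, so treat the printed numbers as a way to see the scaling rather than as real figures.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchronizer_mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    """Textbook synchronizer estimate.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       resolution time constant of the flip-flop (s)
    t_window:  metastability window of the flip-flop (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


def chip_mtbf_seconds(per_crossing_mtbf, n_crossings):
    """Independent crossings: failure rates add, so MTBF divides by the count."""
    return per_crossing_mtbf / n_crossings


# Hypothetical values, for illustration only.
one = synchronizer_mtbf_seconds(
    t_resolve=1e-9, tau=20e-12, t_window=20e-12, f_clk=1e9, f_data=100e6
)
print(f"one crossing:   {one / SECONDS_PER_YEAR:.3g} years")
print(f"1000 crossings: {chip_mtbf_seconds(one, 1000) / SECONDS_PER_YEAR:.3g} years")
```

The exponential term is why "done properly" matters so much: every extra tau of resolution time multiplies the MTBF by e, while every extra crossing only divides it linearly.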

Processors are quite unreliable things.

1 comment

Ironically, even though the more modern "asynchronous" CPU designs (really just asynchronous communication between fully synchronous clock domains) create more chances for metastability, a fully asynchronous, self-timed design wouldn't need to have any likelihood of metastability at all!