by hinkley
949 days ago
People genuinely don’t understand how statistics work. Which is part of why Vegas still makes so much money. When you put together a bunch of equipment with a small error rate, the time between errors drops very fast. Build a RAID array with 33 disks and you’d better have a vendor picked out for replacements because you’ll be doing replacements fairly often, instead of every four to ten years with a single disk. And they don’t understand dependent statistics either. Every failing rocket engine can potentially damage its neighbors. Every failing hard drive requires a stressful operation on the remaining disks (resilvering) that may push the next drive to failure.
Yes, naturally 33 > 1, so you might expect 33 times as many failures of individual components.
But your analogy between arrays of rocket engines and disks is apt, because both have redundancy to survive the failure of individual components.
For example, in the first integrated flight test of Starship in April 2023, three of the 33 Raptor engines powering the first stage failed shortly after liftoff. The vehicle still managed to continue flying, reaching an altitude of about 40 kilometers before failing due to a variety of causes.
> People genuinely don’t understand how statistics work
Indeed.
If an individual disk has an MTBF of 2 million hours, the probability of it failing in the first year (8,760 hours, assuming a constant failure rate) is 0.437%.
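But put 33 of those disks in a RAID 6 array, which tolerates up to two disk failures, and the array is lost only if three or more disks fail. Assuming independent failures and no replacements at all during the year, the probability of the entire array failing in the first year drops by a factor of ten to 0.0413%.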
The statistics say the array is even more reliable than a single component by itself.
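For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It is not a reliability model: it assumes a constant failure rate (exponential lifetimes), fully independent failures, and no replacements during the year, which is exactly the independence assumption the parent comment questions. The function names `p_disk_fails` and `p_array_fails` are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def p_disk_fails(mtbf_hours, hours=HOURS_PER_YEAR):
    """Probability that one disk fails within `hours`.

    Assumes a constant failure rate of 1 / MTBF (exponential lifetime).
    """
    return 1.0 - math.exp(-hours / mtbf_hours)


def p_array_fails(n_disks, tolerated, p):
    """Probability that more than `tolerated` of `n_disks` fail.

    Assumes independent failures with per-disk probability `p`
    and no replacements (a plain binomial tail).
    """
    p_survive = sum(
        math.comb(n_disks, k) * p**k * (1 - p) ** (n_disks - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - p_survive


p = p_disk_fails(2_000_000)
print(f"single disk:    {p:.3%}")
print(f"33-disk RAID 6: {p_array_fails(33, 2, p):.4%}")
```

Under those assumptions this reproduces the 0.437% and 0.0413% figures. Correlated failures, such as a rebuild stressing the surviving disks, would push the array number up.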