by dangrossman 4615 days ago
> wildly improbable catastrophe

Or an extremely probable one like a hard disk failure. They only last a few years; most data centers see an annual replacement rate in the 2-13% range. The failure rate is a known quantity, and their limited 1-3 year warranties reflect that expectation.
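To put that 2-13% range in perspective, here is a rough sketch of how an annual replacement rate compounds over several drives and years. It assumes a constant rate and independent failures, which real drives don't strictly follow, and the function name is made up for illustration.

```python
def p_any_replacement(annual_rate, years, drives=1):
    """Chance that at least one of `drives` is replaced within `years`.

    Assumes a constant annual replacement rate and independent drives
    (a simplification; real failure rates vary with age and batch).
    """
    return 1 - (1 - annual_rate) ** (years * drives)

# The low and high ends of the range quoted above, over a 3-year span.
for rate in (0.02, 0.13):
    print(f"rate {rate:.0%}: one drive {p_any_replacement(rate, 3):.1%}, "
          f"four drives {p_any_replacement(rate, 3, drives=4):.1%}")
```

Even at the low end, a handful of drives over a few years makes at least one replacement fairly likely under these assumptions.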

There isn't a host I've used more than a few years where I haven't seen hard drives (and power supplies) fail. I don't know if my experience is typical, but hardware RAID controllers seem to go bad on me not infrequently too, losing the whole array at once. Hosts don't pay you when it happens; they just replace the hardware. DO was extremely generous here.

2 comments

Was going to say the same thing. Dual drive failure on a RAID5 system with five 2TB drives is 1 in 12. With 3TB drives that goes up to 1 in 7.

The underlying issue is that the uncorrectable read error rate is 1 in 10^15 bits; this is just physics (thermal noise, read/write signal loss, etc.). But with 8b/10b encoding that is only about 90TB worth of data. Rebuilding a five-drive RAID group from the four "good" 2TB drives (8TB of data to be read), you will see a read failure in one of those 4 drives about 1 in 11.25 times (90/8). With 3TB drives it is about 1 in 7.5 times (90/12). Using simple mirroring, you won't be able to re-silver a mirror 1 time in 45 (90/2), or slightly more than 2% of the time, for 2TB drives.
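For anyone who wants to check the arithmetic, here is a minimal sketch of the same estimate. The error rate and the 8b/10b factor are taken straight from the comment above and are assumptions, not drive specs. The function names are made up, and "TB" is read as binary terabytes, which is what yields roughly 90TB. The Poisson figure is included because plain division slightly overstates the odds.

```python
import math

# Figures taken from the comment above; treat them as assumptions.
BITS_PER_URE = 1e15   # one uncorrectable read error per 10^15 bits read
BITS_PER_BYTE = 10    # 8b/10b encoding: 10 bits on the medium per data byte
TIB = 2 ** 40         # "TB" read as binary terabytes

def tb_per_error():
    """Data volume (in TB) read, on average, per uncorrectable read error."""
    return BITS_PER_URE / BITS_PER_BYTE / TIB

def rebuild_failure(drive_tb, drives_read):
    """Return (linear estimate, Poisson estimate) of a read error during rebuild."""
    expected_errors = drive_tb * drives_read / tb_per_error()
    return expected_errors, 1 - math.exp(-expected_errors)

cases = [("RAID5, 5 x 2TB", 2, 4), ("RAID5, 5 x 3TB", 3, 4), ("mirror, 2TB", 2, 1)]
for label, size_tb, drives_read in cases:
    linear, poisson = rebuild_failure(size_tb, drives_read)
    print(f"{label}: about 1 in {1 / linear:.1f} (linear), {poisson:.1%} (Poisson)")
```

Under these assumptions the results land close to the 1 in 11, 1 in 7.5, and 1 in 45 figures quoted above.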

Dual parity or triple mirrors (x3) are now the minimum bar for making storage reliable.

Well, it's just a bit unlucky to have both drives fail in a RAID (although not impossible).