|
|
|
|
|
by technion
3695 days ago
|
|
The problem with failures during rebuilds is overblown
I thought I was the only one who believed that. I've said this on reddit before and ended on like -20 votes with people blatantly arguing I'm falsifying an "impossibility".I've got roughly 30 arrays in production, between 4 and 12 disks in each. All are RAID5 + hotspare. If you believe the maths people keep quoting, the odds of seeing a total failure in a given year is close to 100%. I started using this configuration, across varying hardware, over 15 years ago and I've been growing in number since. I'm not pretending one example proves the rule, or that it's totally safe and I would run a highly critical environment this way (before anyone comments: these environments do not meet that definition), but people have tried to show maths that there's a six nine likelihood of failure, and I just don't for a second believe I'm that lucky. |
|
The misconfiguration is the drive's SCT ERC timeout is greater than the kernel's SCSI command timer. So what happens on a URE is, the drive does "deep recovery" if it's a consumer drive, and keeps trying to recover that bad sector well beyond the default command timer of the kernel, which is 30 seconds. At 30 seconds the kernel assumes something's wrong and does a link reset. On SATA drives this obliterates the command queue and any other state in the drive. The drive doesn't report a read error, doesn't report what sector had the problem, and so RAID can't do its job and fix the problem by reconstructing the missing data from parity and writing the data back to that bad sector.
So it's inevitable these bad sectors pop up here and there, and then if there's a single drive failure, in effect you get one or more full stripes with two or more missing strips, and now those whole stripes are lost just as if it were a 2-disk failure. It is possible to recover from this but it's really tedious and as far as I know there are no user space tools to make such recovery easy.
I wouldn't be surprised if lots of NAS's using Linux were configured this way, and the user didn't use recommended drives because, FU vendor those drives are expensive, etc.