by dannyperson
4401 days ago
What happens when a URE (unrecoverable read error) is encountered while all the disks are still online? It seems that UREs could be detected early and repaired before a rebuild is ever needed by running a weekly sweep of the entire array that reads every data block.
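That kind of sweep is roughly what Linux md's `check` action and a ZFS scrub do. Here is a minimal sketch of the idea, assuming a toy in-memory array of RAID 5-style stripes (data blocks plus one XOR parity block) where `None` stands in for an unreadable sector. `xor_blocks` and `scrub` are hypothetical names, not any real tool's API:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def scrub(stripes):
    """Read every block of every stripe and report problems.

    Each stripe is a list of blocks whose last block is XOR parity.
    A block of None simulates an unrecoverable read error (URE).
    Returns a list of (stripe_index, description) tuples.
    """
    report = []
    for i, stripe in enumerate(stripes):
        missing = [j for j, b in enumerate(stripe) if b is None]
        if len(missing) == 1:
            # One unreadable block: rebuild it from the other blocks
            # while every other disk in the stripe is still healthy.
            j = missing[0]
            others = [b for k, b in enumerate(stripe) if k != j]
            stripe[j] = xor_blocks(others)
            report.append((i, f"URE on disk {j}, repaired"))
        elif len(missing) > 1:
            report.append((i, "multiple UREs, unrecoverable"))
        elif any(xor_blocks(stripe)):
            # Data XOR parity should be all zero bytes on a consistent stripe.
            report.append((i, "parity mismatch"))
    return report
```

With single parity, a mismatch that isn't caused by a URE tells you the stripe is bad but not which disk is wrong.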
Checksumming filesystems let you find the faulty drive by reconstructing the data from each n-choose-(n-1) subset of drives and picking the subset whose result matches the stored checksum.
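A rough sketch of that search for one single-parity stripe, assuming a single SHA-256 digest of the stripe's data is stored somewhere else (real checksumming filesystems keep per-block checksums in their own metadata formats; `find_bad_drive` is a hypothetical name):

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def find_bad_drive(stripe, expected_digest):
    """stripe: data blocks followed by one XOR parity block.
    expected_digest: SHA-256 of the concatenated data blocks, stored separately.
    Returns (bad_index, repaired_data), or (None, data) if the data is good.
    """
    n_data = len(stripe) - 1
    data = b"".join(stripe[:n_data])
    # Subset that leaves out the parity drive: just the data as read.
    if hashlib.sha256(data).digest() == expected_digest:
        return None, data
    # Remaining subsets: leave out data drive j and rebuild it from the other n-1.
    for j in range(n_data):
        others = [b for k, b in enumerate(stripe) if k != j]
        candidate = list(stripe)
        candidate[j] = xor_blocks(others)
        attempt = b"".join(candidate[:n_data])
        if hashlib.sha256(attempt).digest() == expected_digest:
            return j, attempt
    raise ValueError("no single-drive reconstruction matches the checksum")
```

For n drives this costs up to n reconstructions and hash checks per bad stripe, which is fine since it only runs after a checksum mismatch.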
Filesystems that use FEC instead of RAID (5/Z1, 6/Z2, ...) can also correct data errors, but I'm not aware of any consumer-level filesystem that implements it, and I'm not sure why. Doesn't Amazon use it for S3? Laying out data blocks and FEC blocks across a disk array has to be a solved problem.
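For a sense of how a code with two redundancy blocks can locate and fix a silent error without any separate checksum, here is a sketch of the P+Q syndrome construction described for Linux md RAID-6 (GF(2^8) with polynomial 0x11d and generator 2). It handles only a single corrupted data disk and is not how ZFS, S3, or any particular product actually lays data out; `syndromes` and `locate_and_fix` are hypothetical names:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r


# Antilog/log tables for the generator g = 2.
EXP = [0] * 255
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul(x, 2)


def syndromes(data):
    """Per byte position: P = XOR of D_i, Q = XOR of g^i * D_i."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(EXP[i], byte)
    return bytes(p), bytes(q)


def locate_and_fix(data, p, q):
    """Repair one silently corrupted data disk in place using stored P and Q.
    Returns the bad disk's index, or None if the stripe is consistent."""
    p2, q2 = syndromes(data)
    bad = None
    for k in range(len(p)):
        ps, qs = p[k] ^ p2[k], q[k] ^ q2[k]
        if ps == 0 and qs == 0:
            continue
        if ps == 0 or qs == 0:
            raise ValueError("P or Q itself is corrupt, or several disks are bad")
        # An error e on disk z gives ps = e and qs = g^z * e, so z = log(qs) - log(ps).
        z = (LOG[qs] - LOG[ps]) % 255
        if z >= len(data) or (bad is not None and bad != z):
            raise ValueError("more than one disk is bad")
        bad = z
        block = bytearray(data[z])
        block[k] ^= ps  # ps is the error pattern e; XOR removes it
        data[z] = bytes(block)
    return bad
```

Single XOR parity alone can only say "something is wrong"; the second, differently weighted syndrome is what makes the bad disk's index recoverable.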