by baruch 4912 days ago
The disks themselves (new ones at least) have a background scan process (called BMS or BGMS), which helps considerably. The one thing the disk can't do by itself is correct unrecoverable errors, since by definition the disk can't recover from them :-)

The combination of BMS and disk scrubbing at the RAID level should handle almost all of the issues pointed out by the original post.

RAID scrubs can and do take a long time to complete, though. Depending on the performance impact you are willing to accept on a continuous basis, a proper scrub can take a week or two.

Proper scrubbing means not just reading the RAID chunk on one disk but also reading the associated chunks from the other disks and verifying that the parity is still intact. In RAID5, if the parity does not match you can detect the problem but not repair it, because you won't know which chunk has gone bad.
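
To make the parity point concrete, here is a minimal sketch of a RAID5-style stripe check in Python. It assumes plain XOR parity over equal-sized chunks; the function names (xor_chunks, stripe_is_consistent, rebuild_chunk) are made up for illustration and are not taken from any real RAID implementation.

```python
from functools import reduce

def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def stripe_is_consistent(data_chunks, parity_chunk):
    """A scrub pass: read every chunk in the stripe and recompute the parity."""
    return xor_chunks(data_chunks) == parity_chunk

def rebuild_chunk(surviving_chunks):
    """Rebuild one chunk, which only works if we KNOW which chunk is missing."""
    return xor_chunks(surviving_chunks)

# Hypothetical three-data-disk stripe.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_chunks(data)

# Silent corruption of a data chunk and of the parity chunk both show up
# as the same thing: a mismatch, with no hint about which chunk is wrong.
bad_data = [data[0], b"\x11\x20", data[2]]
bad_parity = b"\x00\x00"
data_mismatch = not stripe_is_consistent(bad_data, parity)
parity_mismatch = not stripe_is_consistent(data, bad_parity)

# When the disk itself reports the failure (an unrecoverable read error),
# the bad chunk is known and the rest of the stripe can rebuild it.
rebuilt = rebuild_chunk([data[0], data[2], parity])
```

The asymmetry is the whole point: a reported read error tells you which chunk to rebuild, while a silent mismatch only tells you that something in the stripe is wrong.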

I've been coding such systems for a while now and, as a shameless plug, would point to http://disksurvey.com/blog/. If there are topics of interest, I'd be happy to take requests and write about them as well.