Hacker News
by pheleven 2541 days ago
This actually isn't all that strange a failure mode. We have several large ZFS arrays in service and replace 1-2 failed disks every month. About 90% of the time the first warning you get is exactly this - a message in the syslog from the CAM controller saying a read failed. Neither ZFS nor SMART usually notices these until they get pretty bad/frequent. By the time they're bad enough for other software to notice, your pool is already performing pretty poorly.

We deal with this by watching for these errors, writing them to a separate log that Icinga watches and alerts on, and preemptively replacing the disks. It would be nice if the other software (ZFS, SMART) noticed these before they become severe.
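
A minimal sketch of that kind of watcher, not the actual script: it assumes CAM errors show up in syslog as lines shaped like `(da3:mps0:0:5:0): CAM status: SCSI Status Error`, and the function names, the `CAM_ERRORS ...` output format, and the threshold are all made up for illustration. Adjust the pattern to whatever your syslog really contains.

```python
import re
from collections import Counter

# ASSUMPTION: CAM error lines look roughly like
#   "(da3:mps0:0:5:0): CAM status: SCSI Status Error"
# The exact format depends on your OS version and driver.
CAM_ERROR = re.compile(
    r"\((?P<dev>[a-z]+\d+):[^)]*\):\s+CAM status:\s+(?P<status>.+)"
)


def count_cam_errors(lines):
    """Return a Counter mapping device name -> number of CAM error lines."""
    counts = Counter()
    for line in lines:
        match = CAM_ERROR.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


def alert_lines(counts, threshold=1):
    """Format one line per device with at least `threshold` errors.

    The output format is hypothetical; use whatever your monitoring
    check is set up to parse.
    """
    return [
        f"CAM_ERRORS device={dev} count={n}"
        for dev, n in sorted(counts.items())
        if n >= threshold
    ]
```

You'd feed it lines from the syslog (e.g. `count_cam_errors(open("/var/log/messages"))`) and append the output of `alert_lines` to the file your monitoring check reads.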