by beagle3
3352 days ago
Damaged files that can still be read from disk without error are more likely than not the result of bad memory. It is incredibly unlikely for bits to mutate on disk without triggering a CRC error: only about 1 in 4 billion (2^-32) random errors would go undetected by a CRC-32, and any such error has to span more than 32 bits. The fact that you had multiple damaged files all but guarantees that the culprit was faulty memory (or a bus or controller), not corruption of the actual magnetic media. ZFS can't really help you with that, as the data was likely damaged in transit rather than on disk; though with its own 256-bit hash, it is likely to detect those faulty system components sooner rather than later.
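To make the CRC-32 claim concrete, here is a rough Python sketch. It assumes zlib's CRC-32 and a 512-byte buffer as stand-ins for whatever check a real drive applies per sector (actual drives use their own ECC), and the helper names are made up for illustration. It XORs random damage confined to at most 4 consecutive bytes (32 bits) into the buffer and counts how often the CRC fails to change.

```python
import random
import zlib


def corrupt(data: bytes, offset: int, pattern: bytes) -> bytes:
    """Return a copy of data with pattern XORed in starting at offset."""
    damaged = bytearray(data)
    for i, mask in enumerate(pattern):
        damaged[offset + i] ^= mask
    return bytes(damaged)


def count_undetected(trials: int = 2000, seed: int = 0) -> int:
    """Count short burst errors that leave the CRC-32 unchanged."""
    rng = random.Random(seed)
    # Hypothetical 512-byte "sector" filled with random data.
    sector = bytes(rng.getrandbits(8) for _ in range(512))
    good_crc = zlib.crc32(sector)
    undetected = 0
    for _ in range(trials):
        width = rng.randint(1, 4)  # damage spans at most 32 bits
        pattern = bytes(rng.getrandbits(8) for _ in range(width))
        if not any(pattern):
            continue  # an all-zero pattern is not an error
        offset = rng.randrange(len(sector) - width + 1)
        if zlib.crc32(corrupt(sector, offset, pattern)) == good_crc:
            undetected += 1
    return undetected


if __name__ == "__main__":
    print("undetected short bursts:", count_undetected())
    print("chance a wider random error slips through: about", 1 / 2**32)
```

The count stays at zero because a 32-bit CRC catches every error confined to 32 consecutive bits; only wider damage gets the roughly 2^-32 chance of slipping through.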
|
http://i.imgur.com/uz2inSy.jpg
I encountered this going through a copy of my photos stored on a pair of WD Greens using NTFS 3-4 years ago. The original copy on a ZFS machine was fine. I found a few others, and promptly stopped using those drives.
Two years ago I had repeated bursts of ZFS checksum errors from a pair of SanDisk SSDs. Evidently TRIM didn't work correctly 100% of the time, and it caused data corruption. Luckily ZFS was always able to repair it, and because the corruption was detected I could do something about it early: I updated the firmware and the issue went away. Last year it came back after an OS update, and I just turned TRIM off completely (I guess it was sensitive to TRIM patterns and those changed).
Last year I also had a Toshiba HDD forget how to IO properly, and got a constant stream of ZFS checksum errors from it until I yanked it from the hot-swap bay and reinserted it. It resilvered and scrubbed fine.
These aren't the only times I've seen checksum errors and silent corruption; they're just the most recent. ZFS lost a file once, and was very noisy about it: the status message for the lost metadata stayed until I recreated the pool. NTFS, UFS2, and ext2 were all completely silent about the fact that they were showing me data that was clearly wrong.
I don't trust disks, or IO controllers, and I don't trust filesystems that do. Neither should you.
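On filesystems that stay silent, a comparison like the photo-copy check described above can be done by hashing both trees yourself. This is only a minimal sketch in Python: the function names and the choice of SHA-256 are my assumptions, not anything ZFS does internally, and it only tells you that two copies disagree, not which one is right.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original_root: Path, copy_root: Path) -> list[str]:
    """List files under original_root that are missing or differ in copy_root."""
    mismatches = []
    for original in sorted(original_root.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_root)
        duplicate = copy_root / relative
        if not duplicate.is_file():
            mismatches.append(f"missing: {relative.as_posix()}")
        elif sha256_of(original) != sha256_of(duplicate):
            mismatches.append(f"differs: {relative.as_posix()}")
    return mismatches
```

Usage would be something like `find_mismatches(Path("/tank/photos"), Path("/mnt/backup/photos"))`, where both paths are placeholders. A recently written file may be served from cache rather than from the disk, so a check like this is most meaningful on data that has been sitting for a while.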