by wtallis 4463 days ago
> because ZFS will try to fix the file but can never do it successfully since the memory is bad (and the original checksum was wrong)

It sounds like this applies to a stuck bit of memory, not just a one-off corruption. If so, then you're still not making a fair comparison: if your memory is broken, then any filesystem will keep writing bad data whenever it flushes cache residing on that broken chip, and if the broken piece of memory is used to store more critical data structures, then the potential damage is unbounded. But permanently broken memory is orders of magnitude less common than the transient memory errors that are the actual reason for having ECC.
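To make the stuck-bit versus one-off distinction concrete, here is a toy sketch. Everything in it is an assumption for illustration: `through_buffer` is a made-up stand-in for data passing through RAM, and CRC32 is only a convenient stdlib checksum, not what ZFS uses. It says nothing about real ZFS code paths; it only shows why retrying helps after a transient flip and never helps with a stuck bit.

```python
import zlib

def through_buffer(data: bytes, stuck_mask: int = 0) -> bytes:
    """Hypothetical RAM buffer: bits set in stuck_mask are stuck at 1 in byte 0."""
    buf = bytearray(data)
    buf[0] |= stuck_mask
    return bytes(buf)

def flip_once(data: bytes) -> bytes:
    """Model a single transient bit flip in byte 0."""
    buf = bytearray(data)
    buf[0] ^= 0x01
    return bytes(buf)

good = b"block contents"
expected = zlib.crc32(good)  # checksum recorded while the data was intact

# Transient error: the first read is corrupted, the retry passes through healthy memory.
transient_first_ok = zlib.crc32(flip_once(good)) == expected
transient_retry_ok = zlib.crc32(through_buffer(good)) == expected

# Stuck bit: every retry passes through the same broken buffer, so none can verify.
stuck_results = [
    zlib.crc32(through_buffer(good, stuck_mask=0x01)) == expected
    for _ in range(3)
]

print(transient_first_ok, transient_retry_ok, stuck_results)
```

In this model the retry fixes the transient case, while the stuck-bit case fails on every attempt no matter which filesystem is doing the retrying.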

So once again, assuming the only reasonable scenario of a bit being randomly corrupted in a fully functional memory module, how much more fragile is ZFS than other modern filesystems? Do other filesystems re-read and re-parse their on-disk data structures more often where ZFS caches them in memory? If ZFS writes a corrupted block to disk with a non-matching checksum (or a correct block with a corrupted checksum), is that corruption contagious and capable of spreading to other blocks or other files as a result of trying to read the corrupted block back from the disk?
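For clarity, these are the two write-time scenarios I'm asking about, as a toy model. `ToyStore`, `ChecksumMismatch` and the `corrupt` parameter are all hypothetical names of mine, and SHA-256 is used only because it's in the standard library. This is a sketch of a generic checksum-on-read store, not of ZFS, so it can't answer the question; it just pins down what "non-matching checksum" means in each case.

```python
import hashlib

class ChecksumMismatch(Exception):
    """Raised when a block's data does not match its stored checksum."""

def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def flip_bit(data: bytes) -> bytes:
    """Flip the lowest bit of the first byte."""
    buf = bytearray(data)
    buf[0] ^= 0x01
    return bytes(buf)

class ToyStore:
    """Hypothetical block store that keeps one checksum per block."""

    def __init__(self):
        self.blocks = {}  # block id -> (data, stored checksum)

    def write(self, block_id, data, corrupt=None):
        csum = checksum(data)
        if corrupt == "data":        # data damaged after the checksum was computed
            data = flip_bit(data)
        elif corrupt == "checksum":  # data intact, checksum damaged
            csum = flip_bit(csum)
        self.blocks[block_id] = (data, csum)

    def read(self, block_id):
        data, csum = self.blocks[block_id]
        if checksum(data) != csum:
            raise ChecksumMismatch(block_id)
        return data

store = ToyStore()
store.write(0, b"alpha")
store.write(1, b"bravo", corrupt="data")
store.write(2, b"charlie", corrupt="checksum")

results = {}
for block_id in (0, 1, 2):
    try:
        store.read(block_id)
        results[block_id] = "ok"
    except ChecksumMismatch:
        results[block_id] = "mismatch"

print(results)
```

In this toy both bad writes are caught on read and block 0 is untouched, but that is true by construction. Whether the real thing behaves the same way once repair logic and cached metadata are involved is exactly what I'd like to know.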