Hacker News
by Wowfunhappy 1409 days ago
Unless the filesystem is behaving in a way that is overwhelmingly stupid, the basic logic should still apply. I don't understand how error checking could ever cause data corruption. It might let you know about data corruption which would otherwise have gone unnoticed, but that's not the same thing.

If there is a filesystem that is dumb enough to cause corruption during the checksumming process, please let me know which one, so I can be sure to never ever ever go anywhere near it. :)


A lot of things in computing are overwhelmingly stupid or assume everything will work as expected. I have experienced several data corruption events related to parity data being read incorrectly, not in ZFS, but with hardware and software RAID controllers. In one case the hardware RAID controller even had ECC memory, but its memory was overheating and thus introducing bad data into calculations whenever multi-bit errors could not be corrected. A similarly horrific error condition saw a controller confuse disk IDs in memory and start mirroring one drive to every other drive in the system.
Those are not instances of error checking causing data corruption. As I said, "I don't understand how error checking could ever cause data corruption."

Error checking will only ever help you, not hurt you. It doesn’t matter how bad your memory or disk or RAID controller is. Error checking won't necessarily save you from those things, but it can in some cases, and it’ll never make things worse.

But they are, though. In that first example, the corrupted parity calculations caused data corruption during a scheduled array check while the system was under unusually heavy load. Error checking is good, and when things are working right it can only help. That is true, but it can't always be counted on if the hardware, software, etc. is untrustworthy for whatever reason.
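
To make the distinction concrete, here is a rough Python sketch of the difference between a check that only reads and a check that also "repairs". Everything in it is an assumption for illustration: the XOR parity scheme, the `suspect` argument (a stand-in for however a controller decides which disk to rebuild), and `flaky_parity`, which simulates faulty controller memory flipping one bit. It is not how any particular RAID controller works.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length blocks together byte by byte (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def flaky_parity(blocks):
    """Hypothetical fault: bad controller memory flips one bit of the result."""
    good = xor_parity(blocks)
    return bytes([good[0] ^ 0x01]) + good[1:]

def verify(data_blocks, stored_parity, compute=xor_parity):
    """Detection only: reads everything, writes nothing."""
    return compute(data_blocks) == stored_parity

def scrub_and_repair(data_blocks, stored_parity, suspect, compute=xor_parity):
    """Check and fix: on a mismatch, rebuild block `suspect` and write it back.

    The repair policy here is made up for the example.
    """
    if compute(data_blocks) == stored_parity:
        return list(data_blocks)
    others = [b for i, b in enumerate(data_blocks) if i != suspect]
    repaired = list(data_blocks)
    # The rebuild goes through the same (possibly faulty) compute path.
    repaired[suspect] = compute(others + [stored_parity])
    return repaired

disks = [b"AAAA", b"BBBB", b"CCCC"]   # perfectly good data
parity = xor_parity(disks)            # perfectly good parity

# Read-only check on bad hardware: a false alarm, but the data is untouched.
false_alarm = not verify(disks, parity, compute=flaky_parity)

# Check-and-repair on bad hardware: good data gets overwritten.
after_scrub = scrub_and_repair(disks, parity, suspect=0, compute=flaky_parity)
data_was_damaged = after_scrub != disks
```

In this sketch the read-only `verify` can at worst raise a false alarm, which matches the "it'll never make things worse" argument. The damage only happens once the check is coupled to an automatic write-back that trusts the same faulty calculation.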
Okay, well I am totally and utterly confused as to how that could ever be possible, regardless of the hardware. You're confident that if not for the data validation the problem wouldn't have occurred?