by ralferoo 4 days ago

Hmmm, it's been a long long time since I actually had a failed drive (and also I don't use zfs), but from what I remember of my last failing drive 20 years ago, the drive was able to detect that sectors had been corrupted, and then failed the read rather than just returning silently corrupted data. If my memory is correct, replacing random bytes on disk wouldn't actually reflect the typical way data corruption manifests itself.

I always thought that the reason zfs did its extensive CRC checks was primarily to detect data corruption while it was in RAM or over the network, with a side effect that in the rare cases that data on disk got corrupted without the drive detecting it because the CRC was still valid, it'd also be spotted.

But anyway, it might be worth testing by replacing some of the disk images with truncated ones, so that reads hit holes and return an actual read error rather than junk data.
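
A minimal sketch of the two kinds of fault injection being contrasted, assuming the pool's disks are backed by plain image files (the file name `disk0.img` and both helper names are hypothetical). Flipping bytes leaves the file readable but wrong. Truncating it makes reads past the cut come back short instead. This only shows the file-level effect, not how ZFS reacts to it.

```python
import os
import random
import tempfile


def inject_silent_corruption(path: str, count: int, seed: int = 0) -> list[int]:
    """Flip every bit of `count` distinct random bytes in place.

    The file keeps its size, so later reads succeed but return wrong data.
    """
    rng = random.Random(seed)
    offsets = rng.sample(range(os.path.getsize(path)), count)
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off)
            old = f.read(1)[0]
            f.seek(off)
            f.write(bytes([old ^ 0xFF]))  # XOR with 0xFF guarantees the byte changes
    return sorted(offsets)


def inject_truncation(path: str, keep_bytes: int) -> None:
    """Cut the image short: reads beyond `keep_bytes` now return fewer bytes
    (or none) instead of plausible-looking junk."""
    os.truncate(path, keep_bytes)


with tempfile.TemporaryDirectory() as d:
    img = os.path.join(d, "disk0.img")  # hypothetical image name
    with open(img, "wb") as f:
        f.write(bytes(4096))  # zero-filled stand-in for a disk image
    inject_silent_corruption(img, 3)
    with open(img, "rb") as f:
        data = f.read()
    print(len(data), sum(b != 0 for b in data))  # 4096 3: same size, 3 bad bytes
    inject_truncation(img, 1024)
    with open(img, "rb") as f:
        f.seek(2048)
        print(len(f.read(512)))  # 0: nothing left to read at that offset
```

The first helper models the case the original test exercised; the second is closer to a drive that refuses to return a sector at all.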

adrian_b 4 days ago

The error-correcting codes used by HDDs/SSDs correct or detect the most frequent errors, but sometimes, when there are too many erroneous bits in a sector, they can mis-correct the data and then the HDD/SSD returns a corrupted sector without signaling any error.
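
To make the mis-correction failure mode concrete, here is a toy Hamming(7,4) code. It is far weaker than the codes real drives use (chosen here only for readability), but it shows the principle: it repairs any single flipped bit, while two flipped bits make the decoder "fix" a third bit and hand back wrong data without signaling any error.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into positions 1..7, parity at positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (data bits, position the decoder 'corrected', or 0 if none).

    The decoder assumes at most one bit is wrong and cannot tell when
    that assumption is violated.
    """
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])

one_bad = list(stored)
one_bad[4] ^= 1  # flip position 5
print(hamming74_decode(one_bad))  # ([1, 0, 1, 1], 5): correctly repaired

two_bad = list(stored)
two_bad[0] ^= 1  # flip positions 1 and 2
two_bad[1] ^= 1
print(hamming74_decode(two_bad))  # ([0, 0, 1, 1], 3): wrong data, no error
```

Real drive ECC (Reed-Solomon in older HDDs, LDPC in newer drives) has a much larger correction capacity and usually detects when it has been exceeded, which is why this is rare rather than impossible.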

I have seen this a few times on HDDs that had been used for cold storage of archival data for several years (around 5 years or more). For each archive file, I had my own hash values, which allowed me to detect all such cases. I had duplicates of all such HDDs. Sometimes both HDD copies had a few silently corrupted sectors, but they were not in the same locations, so in all cases I could recover the corrupted files from their duplicates. If I had stored the archival data without redundancy, I would have lost it.

If you do not use hashes or other error-detecting codes for all your files, as I do, you may have had some HDD failures without recognizing them, and such errors are much more likely to occur in files that have been stored for many years.
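
A minimal sketch of that workflow, assuming each archive drive holds the same directory tree and the hashes were recorded when the archive was written. The function names are hypothetical and SHA-256 is an assumed choice of hash. Repair here is per file, so it only succeeds when the two copies are not both damaged in the same file; recovering from damage at different sectors within one file would need block-level hashes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Relative path -> SHA-256, recorded once when the archive is written."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_and_repair(copy_a: Path, copy_b: Path, manifest: dict[str, str]) -> list[str]:
    """Check both copies of every file against the manifest and overwrite a
    bad copy with the good one. Returns the paths that were rewritten."""
    repaired = []
    for rel, expected in manifest.items():
        a, b = copy_a / rel, copy_b / rel
        a_ok = sha256_of(a) == expected
        b_ok = sha256_of(b) == expected
        if a_ok and b_ok:
            continue
        if not (a_ok or b_ok):
            raise RuntimeError(f"{rel}: both copies fail the hash check")
        good, bad = (a, b) if a_ok else (b, a)
        shutil.copy2(good, bad)  # restore the damaged copy from the verified one
        repaired.append(str(bad))
    return repaired
```

In practice the manifest itself also needs to be stored redundantly, since it is the only thing that says which copy is correct.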

ramses0 4 days ago

And/Or: `*.par` files.

https://en.wikipedia.org/wiki/Parchive

adrian_b 2 days ago

Yes, for many years I have also used par2create/par2verify to add redundancy to archive files and to repair any corrupted files.

However, I use both par2create and duplicate storage media, because duplicates, preferably stored in different geographic locations, are the only solution that guards against incidents serious enough to partially or totally destroy the storage device.

By itself, when an adequate amount of added redundancy is chosen, par2create is sufficient to recover archive files that are affected only by a few sporadic corrupted sectors, as on an HDD that has been stored in good conditions for some years. It will not help if the entire HDD becomes unusable due to a mechanical or electrical defect, which can happen to HDDs kept in cold storage rather than in continuous use.
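
PAR2 uses Reed-Solomon coding, so N recovery blocks can rebuild any N damaged blocks. The sketch below swaps in the simplest possible stand-in, a single XOR parity block, which can rebuild exactly one damaged block. Per-block hashes locate the damage, turning silent corruption into a known erasure. The block size and all names are illustrative assumptions, not PAR2's actual format.

```python
import hashlib

BLOCK = 4  # tiny block size for readability; real tools use far larger blocks


def split(data: bytes, size: int = BLOCK) -> list[bytes]:
    """Zero-pad to a whole number of blocks and cut into equal pieces."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)


def make_recovery(data: bytes) -> tuple[list[str], bytes]:
    """Per-block hashes (to find bad blocks) plus one XOR parity block."""
    blocks = split(data)
    return [hashlib.sha256(b).hexdigest() for b in blocks], xor_blocks(blocks)


def repair(blocks: list[bytes], hashes: list[str], parity: bytes) -> list[bytes]:
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).hexdigest() != hashes[i]]
    if not bad:
        return blocks
    if len(bad) > 1:
        raise ValueError("one parity block can rebuild only one damaged block")
    i = bad[0]
    others = [b for j, b in enumerate(blocks) if j != i]
    fixed = list(blocks)
    fixed[i] = xor_blocks(others + [parity])  # XOR of all good blocks and parity = lost block
    return fixed


original = b"archive file contents!"
hashes, parity = make_recovery(original)
blocks = split(original)
blocks[2] = b"XXXX"  # simulate one silently corrupted block
restored = b"".join(repair(blocks, hashes, parity))[:len(original)]
print(restored == original)  # True
```

Choosing more redundancy in par2create corresponds to creating more recovery blocks; with plain XOR parity there is no way to go beyond one.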

wongarsu 2 days ago

Or RAR files with recovery records: the same concept, but in one self-contained file instead of a number of sidecar files.

throw0101c 2 days ago

> I always thought that the reason zfs did its extensive CRC checks was primarily to detect data corruption while it was in RAM or over the network, with a side effect that in the rare cases that data on disk got corrupted without the drive detecting it because the CRC was still valid, it'd also be spotted.

Nope, it's always been about on-disk bit rot.

First off: drive firmware has been known to return data from the wrong LBA. The OS asks for 123, the drive reads 234, verifies its drive-level CRC (which passes), and sends it up. The application gets a bundle of bits that's not correct. With ZFS, the filesystem expects a certain checksum for that part of the tree/file, so if the data from LBA 234 gets returned, it will not match the checksum recorded for 123.
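
A toy model of why this works, assuming ZFS-style block pointers in which the parent stores the child's checksum (SHA-256 stands in here for ZFS's checksum, which is fletcher4 by default). The `BuggyDisk` class and its `misdirect` table are hypothetical: the drive checks each sector against a checksum stored with that sector, so a read from the wrong LBA still passes, while the parent-held checksum catches it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockPointer:
    lba: int
    checksum: bytes  # checksum of the child block, held by the parent


@dataclass
class BuggyDisk:
    """Toy disk: each sector stores its own checksum (like drive-level ECC),
    and `misdirect` simulates firmware reading the wrong LBA."""
    sectors: dict[int, tuple[bytes, bytes]] = field(default_factory=dict)
    misdirect: dict[int, int] = field(default_factory=dict)  # requested -> actual

    def write(self, lba: int, data: bytes) -> None:
        self.sectors[lba] = (data, hashlib.sha256(data).digest())

    def read(self, lba: int) -> bytes:
        data, sector_sum = self.sectors[self.misdirect.get(lba, lba)]
        if hashlib.sha256(data).digest() != sector_sum:
            raise OSError("drive-level read error")  # never fires for a misdirected read
        return data


def checked_read(disk: BuggyDisk, ptr: BlockPointer) -> bytes:
    """Filesystem-level read: verify against the checksum held in the parent."""
    data = disk.read(ptr.lba)
    if hashlib.sha256(data).digest() != ptr.checksum:
        raise OSError(f"checksum mismatch reading LBA {ptr.lba}")
    return data


disk = BuggyDisk()
disk.write(123, b"the block you wanted")
disk.write(234, b"some other block")
ptr = BlockPointer(123, hashlib.sha256(b"the block you wanted").digest())
disk.misdirect[123] = 234  # firmware bug: asked for 123, reads 234
try:
    checked_read(disk, ptr)
except OSError as e:
    print(e)  # checksum mismatch reading LBA 123
```

The key design point is where the checksum lives: a checksum stored next to the data can only say the sector is self-consistent, while one stored in the parent says it is the sector that was asked for.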

Next, if you have RAID-1 and one drive returns corrupted data, how do you know which mirror has the correct data without higher-level FS checksums? The copies differ, but which one is correct? With ZFS you know which block has the correct checksum, so it returns that data to the application and then uses the correct data to repair the wrong copy.
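
A minimal sketch of that self-healing read, modelling each side of the mirror as a dict from LBA to bytes (a hypothetical simplification) and using SHA-256 as the stand-in checksum. Without the expected checksum, all a RAID-1 layer can see is that the two sides differ.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mirrored_read(mirrors: list[dict[int, bytes]], lba: int, expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, then rewrite any
    mirror side that disagrees (a toy version of self-healing)."""
    good = next((m[lba] for m in mirrors if sha(m[lba]) == expected), None)
    if good is None:
        raise OSError(f"LBA {lba}: no mirror copy matches the checksum")
    for m in mirrors:
        if sha(m[lba]) != expected:
            m[lba] = good  # repair the bad side from the verified copy
    return good


a = {7: b"correct data"}
b = {7: b"corrupt data"}  # silently rotted; the drive reports no error
print(mirrored_read([a, b], 7, sha(b"correct data")))  # b'correct data'
print(b[7])  # b'correct data'
```

ZFS does the equivalent on normal reads and across the whole pool during a scrub.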

BuildTheRobots 2 days ago

I don't know how much better modern drives (and SSDs) have gotten[1], but as someone who started digital hoarding in the mid '90s, I can say on-disk bitrot used to be a massive problem. The amount of my video, audio and pictures that suffered damage was palpable. ZFS offering to fix it was a massive selling point at the time and, based on personal experience, it delivered.

ZFS also lets you specify the number of copies to keep on a single disk. This sounds a bit weird, but because drives suffer block failures far more often than total failures, it's actually surprisingly useful in some situations.

[1] My suspicion is that they have improved significantly, as storage sizes are now multiple orders of magnitude larger and errors per MB can't have scaled up linearly to match.
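
A back-of-the-envelope model of why extra copies on one disk help, not a description of ZFS internals: assume each stored copy of a block is unreadable independently with probability `p_sector`, and that a whole-disk failure (probability `p_disk`) takes every copy with it. Independence is optimistic, since nearby sectors often fail together.

```python
def p_block_lost(p_sector: float, p_disk: float, copies: int) -> float:
    """Probability that a block has no readable copy on a single disk.

    Assumes sector failures are independent and that losing the whole disk
    loses every copy, so extra copies only shrink the sector-failure term.
    """
    return p_disk + (1 - p_disk) * p_sector ** copies


for k in (1, 2, 3):
    print(k, f"{p_block_lost(1e-4, 0.0, k):.1e}", f"{p_block_lost(1e-4, 0.01, k):.4f}")
# 1 1.0e-04 0.0101
# 2 1.0e-08 0.0100
# 3 1.0e-12 0.0100
```

The second column shows the limit of the approach: once the disk itself dies, more copies on that disk don't help.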

matja 4 days ago

You're right that the ECC validation is very robust, but it only validates one small part: that the drive is reading what it previously wrote, not that the data was correct when it came into the drive, was handled correctly by the firmware, or was even written to the correct place (LBA) on the drive.

There have been times when some features of entire drive models have been disabled in the Linux kernel because of buggy firmware that silently writes bad data (with correct ECC), so reading it back succeeds from the point of view of both the drive and the OS block driver.

I was hit by this myself with the queued TRIM firmware bug that affected all Samsung 840 EVO SSDs (Linux kernel commit 9a9324d3969678d44b330e1230ad2c8ae67acf81 if you want to look into the history): the drive didn't report any errors, but ZFS kept reporting corruption and kept fixing it in the background.
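
A toy model of the gap described in this comment: the drive computes its own CRC32 over whatever bytes reach the media, so damage introduced earlier (here a hypothetical firmware bug that zeroes the payload) reads back cleanly. Only a checksum computed by the host before the write, as ZFS does, exposes it. CRC32 and SHA-256 are stand-ins for the drive's ECC and the filesystem's checksum, and all names are hypothetical.

```python
import hashlib
import zlib
from typing import Callable, Optional


class Drive:
    """Toy drive: checksums whatever its firmware writes and verifies on read."""

    def __init__(self, firmware_bug: Optional[Callable[[bytes], bytes]] = None):
        self.media: dict[int, tuple[bytes, int]] = {}
        self.firmware_bug = firmware_bug or (lambda data: data)

    def write(self, lba: int, data: bytes) -> None:
        data = self.firmware_bug(data)  # damage happens *before* the drive's CRC
        self.media[lba] = (data, zlib.crc32(data))

    def read(self, lba: int) -> bytes:
        data, crc = self.media[lba]
        if zlib.crc32(data) != crc:
            raise OSError("drive-reported read error")
        return data


def host_write(drive: Drive, lba: int, data: bytes) -> bytes:
    drive.write(lba, data)
    return hashlib.sha256(data).digest()  # kept by the filesystem, not the drive


def host_read(drive: Drive, lba: int, expected: bytes) -> bytes:
    data = drive.read(lba)  # succeeds: the drive's own CRC matches
    if hashlib.sha256(data).digest() != expected:
        raise OSError(f"end-to-end checksum mismatch at LBA {lba}")
    return data


drive = Drive(firmware_bug=lambda d: bytes(len(d)))  # hypothetical bug: payload zeroed
digest = host_write(drive, 5, b"important")
print(drive.read(5) == bytes(9))  # True: the drive sees nothing wrong
try:
    host_read(drive, 5, digest)
except OSError as e:
    print(e)  # end-to-end checksum mismatch at LBA 5
```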

ssl-3 2 days ago

> Hmmm, it's been a long long time since I actually had a failed drive (and also I don't use zfs), but from what I remember of my last failing drive 20 years ago, the drive was able to detect that sectors had been corrupted, and then failed the read rather than just returning silently corrupted data.

That's the behavior that is desired, yes. And in a neat world of frictionless pulleys and ropes that don't stretch, perhaps that is what happens.

In reality, the root reasoning for filesystems to detect bitrot is simpler: It's irrational to expect that a device which is already failing is going to behave in a predictable way.
