HN Mirror

by notacoward 3243 days ago

Totally off base, on several points. A checksum computed on the disk only protects data once it has reached the disk. Filesystem-level CRCs can protect the entire data path. If you have a defect in your RAID card or HBA, or anywhere in the software stack below the filesystem, on-disk CRCs will happily "validate" the already-corrupted data, while filesystem-level CRCs are likely to detect the corruption. The author dismisses it as a "remotely likely scenario" but I've seen it happen for real many times. Maybe that's because I have about 3.5x as many years of experience as the author, across what's probably thousands of times as many machines or drives (I've worked on some big systems).
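To make the difference concrete, here is a minimal Python sketch of the two checks. Everything in it is a toy assumption for illustration: `faulty_hba` is a hypothetical stand-in for a buggy controller, `write_block` and `read_block` are made-up names, and `zlib.crc32` stands in for whatever checksum a real drive or filesystem would use. The only point is where each checksum gets computed relative to the corruption.

```python
import zlib

def faulty_hba(data):
    """Hypothetical buggy controller: flips one bit while the data is in transit."""
    corrupted = bytearray(data)
    corrupted[0] ^= 0x01
    return bytes(corrupted)

def write_block(data):
    fs_crc = zlib.crc32(data)        # filesystem checksums the data before it enters the path
    on_wire = faulty_hba(data)       # corruption happens below the filesystem
    disk_crc = zlib.crc32(on_wire)   # drive checksums whatever it actually received
    return {"payload": on_wire, "disk_crc": disk_crc, "fs_crc": fs_crc}

def read_block(stored):
    payload = stored["payload"]
    disk_ok = zlib.crc32(payload) == stored["disk_crc"]  # matches the corrupted bytes
    fs_ok = zlib.crc32(payload) == stored["fs_crc"]      # compared against the original bytes
    return disk_ok, fs_ok

stored = write_block(b"important data")
disk_ok, fs_ok = read_block(stored)
print(f"on-disk check passes: {disk_ok}, filesystem check passes: {fs_ok}")
```

In this toy model the on-disk check passes and the filesystem check fails, because the drive only ever saw the already-flipped bytes.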

The same "I've never seen it so it's not real" fallacy appears again in the discussion of RAID 5. He says that losing a second drive during a rebuild is "statistically very unlikely" but that's not so. Not only have I seen it many times, but the simple math of disk capacities and interface speeds shows that it's not really all that unlikely. I've even seen RAID 6 fail because rebuilds overlapped, which is what leads people to push for more powerful erasure-coding schemes. Over the lifetime of even a medium-sized system, concurrent failures on RAID 5 are likely enough to justify using something stronger.
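For a rough sense of that math, here is a back-of-envelope sketch in Python. All the numbers are assumptions picked for illustration, not measurements and not figures from the article: an 8 TB drive, 150 MB/s sustained rebuild throughput, four surviving drives, a spec-sheet unrecoverable read error (URE) rate of 1 per 10^14 bits, and a 3% annual failure rate. It also assumes failures are independent, which tends to be optimistic for drives from the same batch, and it treats the spec-sheet URE rate as the actual rate, which may be pessimistic.

```python
import math

def rebuild_hours(capacity_bytes, throughput_bytes_per_s):
    """Best-case time to rewrite one full drive at the given sustained rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600

def p_unrecoverable_read(bytes_read, ure_per_bit):
    """Chance of at least one URE while reading bytes_read: 1 - (1 - p)^bits."""
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

def p_another_drive_fails(surviving, afr, hours):
    """Chance that any surviving drive fails within the window (constant-rate model)."""
    rate_per_hour = -math.log1p(-afr) / (365 * 24)
    return -math.expm1(-surviving * rate_per_hour * hours)

# Assumed, illustrative parameters (see the caveats above).
CAPACITY = 8e12        # bytes per drive
THROUGHPUT = 150e6     # bytes per second during rebuild
SURVIVING = 4          # drives that must be read in full
URE_PER_BIT = 1e-14    # spec-sheet style error rate
AFR = 0.03             # annual failure rate per drive

hours = rebuild_hours(CAPACITY, THROUGHPUT)
p_ure = p_unrecoverable_read(SURVIVING * CAPACITY, URE_PER_BIT)
p_fail = p_another_drive_fails(SURVIVING, AFR, hours)

print(f"rebuild window: {hours:.1f} h")
print(f"P(at least one URE during rebuild): {p_ure:.2f}")
print(f"P(another whole-drive failure during rebuild): {p_fail:.5f}")
```

Under these assumptions the whole-drive term stays small for a single rebuild, but the read-error term does not, and both grow with capacity while interface speed lags behind. Multiply the per-rebuild odds across many arrays and many years and "statistically very unlikely" stops being a safe description.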

I was one of the earliest and most outspoken critics of ZFS hype and FUD when it came out. It was and is no panacea, but that doesn't justify more FUD in the other direction to sell backup products or services.