Hacker News
by AstralStorm 3243 days ago
This is better than RAID patrol reads only in that it also periodically verifies file system structure. And you do not necessarily have to bring down the filesystem to check it, even while data is in flight, as long as its driver supports online scan functionality. More than one FS does so. (XFS, btrfs and probably JFS. Not ext4 though.)

Not that online scanning makes too much sense anyway. The good filesystems verify the sanity of the structures they traverse, so might as well put in a full FS read in cron. Most kinds of damage cannot be repaired on a live filesystem anyway. Even in ZFS.

2 comments

> might as well put in a full FS read in a cron

ZFS scrub is not the same thing.

If you do a full-filesystem read on a RAID system at the OS level, the redundant blocks won't all be read: the RAID layer simply picks one copy of each block, typically from whichever disk is least heavily loaded at the moment. This is why reads from a 2-disk mirror can be up to twice as fast as reads from either one of the disks comprising the mirror.

During a ZFS scrub, all copies of every block are checked, and because the data is heavily checksummed, ZFS knows which copy is right if one of the 2+ redundant copies doesn't match its checksum.
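To make the difference concrete, here is a toy model in Python. It is only a sketch: `ToyMirror`, `plain_read` and `scrub` are made-up names, the "disks" are plain lists, and real ZFS keeps its checksums in parent block pointers rather than in a side table. It does show why an ordinary read can miss a bad copy while a scrub cannot.

```python
import hashlib
import random


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two-way mirror with checksums kept apart from the data (hypothetical model)."""

    def __init__(self, blocks):
        self.disks = [list(blocks), list(blocks)]
        self.sums = [checksum(b) for b in blocks]

    def plain_read(self, i):
        # A conventional mirror returns whichever copy it picks, unverified.
        return random.choice(self.disks)[i]

    def scrub(self):
        """Check every copy of every block; rewrite bad copies from a good one."""
        repaired, unrecoverable = [], []
        for i, expected in enumerate(self.sums):
            good = next((d[i] for d in self.disks if checksum(d[i]) == expected), None)
            if good is None:
                # Every copy fails its checksum: report it, don't guess.
                unrecoverable.append(i)
                continue
            for n, disk in enumerate(self.disks):
                if checksum(disk[i]) != expected:
                    disk[i] = good
                    repaired.append((n, i))
        return repaired, unrecoverable


mirror = ToyMirror([b"alpha", b"beta", b"gamma"])
mirror.disks[1][1] = b"\x00\x00\x00\x00"  # silent corruption on disk 1, block 1
repaired, lost = mirror.scrub()
print(repaired, lost)  # [(1, 1)] []
```

Before the scrub, `plain_read(1)` hands back the zeroed copy about half the time without noticing anything is wrong.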

Additionally, ZFS is structured as a Merkle tree (https://en.wikipedia.org/wiki/Merkle_tree) which avoids whole classes of ways traditional filesystems can become deranged at a structural level. ZFS always stores 3+ copies of certain types of filesystem metadata, even on a 1-disk ZFS pool, so that if one gets corrupted, it has 2+ others to choose from. When this same type of corruption happens on a traditional filesystem, well, let's just say that's why `/lost+found` exists.
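For anyone who hasn't met Merkle trees: the sketch below is a generic binary hash tree in Python, not ZFS's actual on-disk layout (the block names and the `build_root` helper are invented for illustration). The point is just that a change to any block changes the root hash, so damage anywhere below a trusted root is detectable.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_root(leaves):
    """Hash leaves pairwise upward until a single root hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"superblock", b"inode table", b"file data 1", b"file data 2"]
trusted_root = build_root(blocks)

blocks[2] = b"file data X"  # corrupt one block
root_matches = build_root(blocks) == trusted_root
print(root_matches)  # False
```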

> Most kinds of damage cannot be repaired on a live filesystem anyway.

See my post above, giving two anecdotes of ZFS actively repairing data on live filesystems. Both systems were in continuous use while these repairs proceeded, and no data were lost in either.

> Most kinds of damage cannot be repaired on a live filesystem anyway. Even in ZFS.

You're totally wrong.

The easiest way to demonstrate why is for you to set up a script to randomly write zeros/junk in any amount, at any time, anywhere over one of the block devices being used by ZFS, all day every day.

[Assuming you're using one of the available forms of redundancy, i.e. multiple copies, RAID-Z1/Z2, mirroring, etc.]
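A rough sketch of such a script in Python, assuming you point it at a file-backed vdev in a throwaway test pool and never at a disk you care about. The function name `scribble` and its parameters are made up for this example; the demo at the bottom only touches a temporary scratch file.

```python
import os
import random
import tempfile


def scribble(path, writes=10, max_len=4096, seed=None):
    """Overwrite random regions of `path` with junk; return the (offset, length) pairs."""
    rng = random.Random(seed)
    damage = []
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)  # works for regular files and block devices
        if size == 0:
            raise ValueError("target is empty")
        for _ in range(writes):
            length = rng.randint(1, min(max_len, size))
            offset = rng.randrange(0, size - length + 1)  # keep the write in bounds
            dev.seek(offset)
            dev.write(os.urandom(length))
            damage.append((offset, length))
    return damage


# Demo against a 1 MiB scratch file standing in for a vdev image.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(1 << 20))
    image = f.name

damage = scribble(image, writes=5, seed=1)
size_after = os.path.getsize(image)
os.remove(image)
print(len(damage), size_after)
```

Against a real test pool you'd run it in a loop and then kick off `zpool scrub` and check `zpool status -v` to see what was repaired, since reads served from cache won't necessarily touch the damaged blocks.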

Sit back and watch ZFS giving no fucks at all as it repairs all the damage passively.

You can even introduce such damage in moderate quantities across all of the block devices used by ZFS. Again, you'll see a goddamn incredible amount of self-healing going on and accurate reporting about where it's unable to recover files due to the damage across multiple volumes being too extensive.

It's unlikely that even in this extreme instance of willful, massive harm to the disks you'll see the filesystem itself being damaged, because a) filesystem metadata is checksummed too, b) the metadata blocks are automatically stored twice in different places, and c) you also have the redundancy of multiple devices, e.g. mirroring/RAID-Z.

Try it, prove me wrong.