by AstralStorm
3243 days ago
This is better than RAID patrol reads only in that it also verifies file system structure periodically.
And you do not necessarily have to bring down the filesystem to check, even when data is in flight, as long as its driver supports online scan functionality. More than one FS does so. (XFS, btrfs and probably JFS. Not ext4 though.) Not that online scanning makes too much sense anyway. The good filesystems verify the sanity of the structure they traverse, so you might as well put a full FS read in cron. Most kinds of damage cannot be repaired on a live filesystem anyway. Even in ZFS.
|
ZFS scrub is not the same thing.
If you do a full-filesystem read in a RAID system at the OS level, the redundant blocks won't be read: the RAID system will simply choose one of the copies to read based on which disk(s) is least heavily loaded at the moment. This is why reading from a 2-disk mirror can be up to twice as fast as reading from a single one of the disks comprising the mirror.
During a ZFS scrub, all copies of every block are checked, and because the data is heavily checksummed, ZFS knows which copy is right if one of the 2+ redundant copies doesn't match its checksum.
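To make the difference concrete, here is a minimal sketch of the idea in Python. It is not ZFS code: `checksum` and `scrub_block` are hypothetical names, SHA-256 stands in for whatever checksum the pool uses, and a "mirror" is just a list of byte strings. The point is only that every copy is read and compared against an independently stored checksum, so the bad copy can be identified and rewritten from a good one.

```python
import hashlib
from typing import List, Tuple


def checksum(data: bytes) -> str:
    # Stand-in for the block checksum; SHA-256 is an assumption of this sketch.
    return hashlib.sha256(data).hexdigest()


def scrub_block(copies: List[bytes], expected: str) -> Tuple[List[bytes], List[int]]:
    """Read every copy, find which ones fail the checksum, repair them.

    Returns the repaired copies and the indices that had to be rewritten.
    """
    # Unlike an ordinary mirror read, every copy is examined, not just one.
    bad = [i for i, c in enumerate(copies) if checksum(c) != expected]
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise ValueError("no copy matches the stored checksum; cannot repair")
    # Overwrite each bad copy with a known-good one.
    repaired = [good if i in bad else c for i, c in enumerate(copies)]
    return repaired, bad
```

A plain mirror without checksums can notice that two copies differ, but it has no way to tell which one is correct; the stored checksum is what breaks the tie.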
Additionally, ZFS is structured as a Merkle tree (https://en.wikipedia.org/wiki/Merkle_tree) which avoids whole classes of ways traditional filesystems can become deranged at a structural level. ZFS always stores 3+ copies of certain types of filesystem metadata, even on a 1-disk ZFS pool, so that if one gets corrupted, it has 2+ others to choose from. When this same type of corruption happens on a traditional filesystem, well, let's just say that's why `/lost+found` exists.
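For anyone unfamiliar with Merkle trees, a rough sketch of the general technique follows. This is a generic binary hash tree, not ZFS's actual on-disk layout (ZFS keeps each child's checksum in the parent's block pointer); `h` and `merkle_root` are names made up for this example. What it shows is that a change to any leaf changes the root hash, so corruption anywhere below the root is detectable from the top.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    # SHA-256 is an arbitrary choice for this sketch.
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Compute the root hash of a binary Merkle tree over the given blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            # Duplicate the last hash so every node has a pair.
            level.append(level[-1])
        # Each parent is the hash of its two children's hashes.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

If a single block is silently altered, recomputing the root gives a different value than the one stored, and walking down the tree narrows the mismatch to the damaged block.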
> Most kinds of damage cannot be repaired on a live filesystem anyway.
See my post above, giving two anecdotes of ZFS actively repairing data on live filesystems. Both systems were in continuous use while these repairs proceeded, and no data were lost in either.