by cmurf 1079 days ago
ZFS and Btrfs have demonstrated that devices have a variety of transient failures, including failing to maintain the write order implied by fsync or FUA.

This will thwart any filesystem.

SSDs do not reliably report UNC read errors when data can't be retrieved. Garbage or zeros are returned instead.

There's a reason why ext4 and XFS added journal and metadata checksumming. Storage devices just aren't reliable at informing the kernel when they suspect the data returned is bad.
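
To make the checksumming point concrete, here is a minimal sketch of how a checksum stored next to the data catches a device that returns zeros or garbage without reporting a read error. The 4-byte prefix layout and the names write_block and read_block are made up for illustration, and zlib.crc32 is only a stand-in for whatever checksum a real filesystem uses.

```python
import zlib

def write_block(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC-32 of its contents (hypothetical layout)."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload

def read_block(raw: bytes) -> bytes:
    """Return the payload, or raise if the stored checksum does not match it."""
    stored = int.from_bytes(raw[:4], "little")
    payload = raw[4:]
    if zlib.crc32(payload) != stored:
        # The device reported success, but the data is not what was written.
        raise IOError("checksum mismatch: device returned bad data without an error")
    return payload
```

A block that comes back as all zeros, or with a single flipped byte, fails the comparison even though the device never signalled a UNC error.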

1 comments

Incorrect write order shouldn't thwart a CoW filesystem. It can check at mount time whether the last few commits are fully there.
If the write order isn't guaranteed you can get a new super block in place without the updated trees being written. The super points to trees that don't exist.

Recent but no longer current trees can be partly overwritten once the kernel is informed that a super block write was successful. But if the super block write wasn't actually successful (the device lied), the stale super block on disk points to damaged metadata, and recoverability isn't certain.

You can tell if the metadata is correct by checking the hashes of everything committed by that superblock.

If it isn't correct, ignore it and move on to the previous superblock. Keep going until you can verify a contiguous 30 seconds of superblocks.
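
A rough sketch of that mount-time walk, under simplifying assumptions: each superblock carries a timestamp and a flat map of block address to expected hash (a real CoW filesystem would walk the trees instead), and the disk is modelled as a dict. Superblock, commit_is_intact and pick_superblock are hypothetical names, not any filesystem's actual API.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Superblock:
    timestamp: float   # commit time in seconds
    refs: dict         # block address -> expected SHA-256 hex digest

def commit_is_intact(sb, disk):
    """True if every block the superblock references is present and hashes correctly."""
    for addr, digest in sb.refs.items():
        data = disk.get(addr)
        if data is None or hashlib.sha256(data).hexdigest() != digest:
            return False
    return True

def pick_superblock(superblocks, disk, window=30.0):
    """Walk newest to oldest. Return the newest superblock that heads a run of
    intact commits spanning at least `window` seconds, or None if no such run exists."""
    run_head = None
    for sb in sorted(superblocks, key=lambda s: s.timestamp, reverse=True):
        if not commit_is_intact(sb, disk):
            run_head = None   # a bad commit breaks the contiguous run
            continue
        if run_head is None:
            run_head = sb
        if run_head.timestamp - sb.timestamp >= window:
            return run_head
    return None
```

With commits every 10 seconds and only the newest one pointing at a missing block, this returns the second-newest superblock once the three commits before it have also been verified.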

If writes are being delayed by more than 30 seconds, your problems go beyond "out of order".

This does impose the requirement not to overwrite trees that are only a minute or two old. That should not be hard.
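
One way to picture that requirement is a deferred free list: blocks from superseded trees only become reusable after a minimum age. This is a sketch with made-up names (DeferredFreeList, retire, reclaimable) and an assumed 120-second hold; it also assumes retire is called in time order.

```python
from collections import deque

class DeferredFreeList:
    """Hold superseded tree blocks until they are at least `min_age` seconds old."""

    def __init__(self, min_age=120.0):
        self.min_age = min_age
        self.pending = deque()   # (time superseded, block address), oldest first

    def retire(self, addr, now):
        """Record that `addr` stopped being part of the current tree at time `now`."""
        self.pending.append((now, addr))

    def reclaimable(self, now):
        """Pop and return the addresses old enough to be overwritten."""
        freed = []
        while self.pending and now - self.pending[0][0] >= self.min_age:
            freed.append(self.pending.popleft()[1])
        return freed
```

The allocator would only hand out addresses returned by reclaimable, so a rollback of a minute or two always finds its trees untouched.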