by londons_explore 826 days ago
> last week my btrfs filesystem got irrecoverably corrupted.

This is 2 bugs really. 1, the file system got corrupted. 2, tooling didn't exist to automatically scan through the disk data structures and recover as much of your drive as possible from whatever fragments of metadata and data were left.

For 2, it should happen by default. Most users don't want a 'disk is corrupt, refusing to mount' error. Most users want any errors to auto-correct if possible and get on with their day. Keep a recovery logfile with all the info needed to reverse any repairs for that small percentage of users who want to use a hex editor to dive into data corruption by hand.

3 comments

Yeah the last time I had a btrfs volume die, there were a few troubleshooting/recovery steps on the wiki which I dutifully followed. Complete failure, no data recoverable. The last step was "I dunno, go ask someone on IRC." Great.

It's understandable that corruption can happen due to bugs or hardware failure or user insanity, but my experience was that the recovery tools are useless, and that's a big problem.

Writing to a corrupted filesystem by default is bad design. The corruption could be caused by a hardware problem that is exacerbated by further writes, leading to additional data loss.
Where is that log file supposed to be stored? It can't be on the same filesystem it was created for or it negates the purpose of its creation.

If I were designing it, the recovery process would:

* scan through the whole disk and, for every sector, decide whether it is "definitely free space (part of the free space table, not referenced by any metadata)", "definitely metadata/file data", or "unknown/unsure (i.e. perhaps referenced by some dangling metadata/an old version of some tree nodes)".

* I would then make a new file containing a complete image of the whole filesystem pre-repair, but leaving out the 'definitely free space' parts.

* such a file takes nearly zero space, considering btrfs's copy-on-write and sparse-file abilities.

* I would then repair the filesystem to make everything consistent. The pre-repair file would still be available for any tooling wanting to see what the filesystem looked like before it was repaired. You could even loopmount it or try other repair options on it.

* I would probably encourage distros to auto-delete this recovery file when disk space is low or after some time, since otherwise the recovery image will keep old user data pinned, using up disk space for years, and users will be unhappy.

The above fails in only one case: Free space on the drive is very low. In that case, I would probably just do the repairs in-RAM and mount the filesystem readonly, and have a link to a wiki page on possible manual repair routes.
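
To make the first step of that list concrete, here is a rough sketch of the three-way classification. It is only an illustration: the function names and the three input sets (`free_table`, `live_refs`, `dangling_refs`) are made up for this example and are not part of any btrfs tool. It also assumes a scan has already worked out which sectors each piece of metadata points at, which is the hard part in practice.

```python
from enum import Enum


class SectorState(Enum):
    FREE = "definitely free space"
    IN_USE = "definitely metadata/file data"
    UNSURE = "unknown/unsure"


def classify_sectors(total_sectors, free_table, live_refs, dangling_refs):
    """Classify every sector index in range(total_sectors).

    All three inputs are hypothetical sets of sector indices:
      free_table    - sectors listed in the free space table
      live_refs     - sectors referenced by consistent, current metadata
      dangling_refs - sectors referenced only by dangling or old metadata
    """
    states = {}
    for sector in range(total_sectors):
        referenced = sector in live_refs or sector in dangling_refs
        if sector in free_table and not referenced:
            states[sector] = SectorState.FREE
        elif sector in live_refs and sector not in free_table:
            states[sector] = SectorState.IN_USE
        else:
            # Contradictory or missing information: stay conservative.
            states[sector] = SectorState.UNSURE
    return states


def sectors_to_image(states):
    """Sectors to copy into the pre-repair image: all but the definitely free ones."""
    return [s for s, state in sorted(states.items()) if state is not SectorState.FREE]
```

Anything that is not provably free ends up in the pre-repair image, so a wrong guess costs disk space rather than data.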

>The above fails in only one case: Free space on the drive is very low.

No. Most of the blocks will be marked as unsure in the first step, because thanks to CoW most of them have been used before.

A heuristic could be written like 'protect the latest version of each node, plus 2 prior versions, but anything older you find, treat it as free space'.
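
A minimal sketch of that heuristic, assuming the scan yields `(node_id, generation, block)` tuples for every node version it finds on disk. The tuple layout and the function name are invented for illustration; real btrfs tree nodes carry a generation number, but nothing here reads the actual on-disk format.

```python
from collections import defaultdict


def split_by_generation(versions, keep_prior=2):
    """Split found node versions into protected and reclaimable blocks.

    versions: iterable of hypothetical (node_id, generation, block) tuples.
    Protects the newest version of each node plus `keep_prior` older ones;
    blocks holding anything older are treated as free space.
    """
    by_node = defaultdict(list)
    for node_id, generation, block in versions:
        by_node[node_id].append((generation, block))

    protected, reclaimable = set(), set()
    for found in by_node.values():
        found.sort(reverse=True)  # newest generation first
        for rank, (_, block) in enumerate(found):
            if rank <= keep_prior:
                protected.add(block)
            else:
                reclaimable.add(block)
    return protected, reclaimable
```

With `keep_prior=2`, a node found in five generations keeps its three newest copies, and the blocks holding the two oldest become candidates for "definitely free".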