Hacker News
by morning_gelato 1736 days ago
I would argue btrfs does not deliver "classic" RAID1. Last I checked, if you lose one disk in a two-disk btrfs RAID1 setup and then reboot, the filesystem will not mount until you add the 'degraded' mount option. This is very different from other RAID1 setups (e.g. hardware RAID controllers, mdadm, OpenZFS), and a big problem if you don't have lights-out management or fast physical access to the machine.

I would argue that the 'degraded' stuff is a valid but different critique - and in fact is covered in a completely separate part of the article at some length.
I think we are in agreement. I was responding to the comment that stated "BTRFS delivers „classic“ RAID1 and more/better.", which is what I am disagreeing with. Requiring that mount options be changed whenever there is a drive failure (despite having sufficient redundancy) is definitely an anti-feature in my book.
There’s no need for changing mount options. If you want to allow mounting of degraded arrays, just put the degraded option there from the start.
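To make "put the degraded option there from the start" concrete, here is a minimal sketch that checks whether an fstab-style entry already carries the option. The helper name `has_mount_option`, the UUID, and the mount point are all made up for illustration; only the standard six-field fstab layout (device, mount point, type, options, dump, pass) is assumed.

```python
def has_mount_option(fstab_line: str, option: str) -> bool:
    """Return True if the options field (4th column) of an fstab line lists `option`."""
    fields = fstab_line.split()
    # Skip comments and lines too short to have an options column.
    if len(fields) < 4 or fields[0].startswith("#"):
        return False
    return option in fields[3].split(",")


# Hypothetical entry: the UUID and mount point are placeholders.
line = "UUID=0000-EXAMPLE  /data  btrfs  defaults,degraded  0  0"
print(has_mount_option(line, "degraded"))
```

Whether adding the option up front is a good idea is exactly what the rest of the thread argues about; the sketch only shows where the option would live.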
That sounds to me like it was originally set up that way early in development because they wanted people to give immediate manual attention to a system before booting it in that state.

If btrfs is mature enough that it's "safe" to boot missing a disk now, I think either the defaults or the documentation probably want changing to make that clearer.

Like, I get "oh just add this option" as a response, but in this case the fact that distros don't default to adding it and the docs don't say "sure, do that" somewhere prominent means I'm allowed to be a bit worried about how safe it actually is.

Whether to include the degraded option by default is a policy choice that's far beyond the purview of filesystem developers, and not something that most distros can give a clear answer to, either. It boils down to a question of the end user's use cases and risk tolerance. But it seems pretty reasonable to state that a loss of redundancy should either be handled by the user, or by a piece of software sitting between the user and the filesystem itself and acting in accordance with the user's preferences. Silently continuing to operate but with less safety than the user originally requested is the kind of dangerous that should be an opt-in feature, not a default.

Moving the decision into the filesystem itself only makes sense if the filesystem is equipped to enact mitigating actions such as claiming a hot spare as the replacement device, notifying the user/sysadmin through whatever logging/reporting mechanism is actually monitored by a human, signalling applications like load balancers to stop relying on this particular machine if a healthy alternative is available, etc.
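As a rough sketch of what "a piece of software sitting between the user and the filesystem" might decide, here is a small policy function. Everything in it is hypothetical: the `Policy` fields, the `plan_actions` name, and the action strings are invented for illustration and do not correspond to any btrfs or distro tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    """User preferences for handling a loss of redundancy (all fields hypothetical)."""
    allow_degraded_mount: bool = False   # opt-in, not a default
    hot_spare: Optional[str] = None      # e.g. a spare device path, if one exists
    drain_from_load_balancer: bool = False


def plan_actions(missing_devices: int, policy: Policy) -> List[str]:
    """Return the ordered list of actions to take for an array in the given state."""
    if missing_devices == 0:
        return ["mount normally"]
    # Redundancy has been lost: a human should always hear about it.
    actions = ["notify admin"]
    if policy.drain_from_load_balancer:
        actions.append("signal load balancer to stop using this machine")
    if policy.hot_spare is not None:
        actions.append(f"replace missing device with {policy.hot_spare}")
    if policy.allow_degraded_mount:
        actions.append("mount degraded")
    else:
        actions.append("refuse to mount and wait for operator")
    return actions


print(plan_actions(1, Policy()))
```

With the default `Policy()`, a missing device leads to a notification and a refusal to mount, which matches the opt-in stance argued above; a user who prefers availability would set `allow_degraded_mount=True` explicitly.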

(There's also an implementation detail that can trip up users who are trying to live dangerously: you're not supposed to mount a degraded btrfs array as writable until you're prepared to fix the problem making it degraded—such as by providing the devices needed to restore redundancy, or converting it to not be a redundant array anymore.)