by _kdave 1281 days ago
> It won't boot on a degraded array by default, requiring manual action to mount it

If you want it to behave like that, add 'degraded' to fstab. A device can be missing for unknown reasons; the user is in a better position to know, and should either resolve the problem or explicitly allow such a boot. It's not automatic because there's no way to inform the user that the filesystem is in a degraded state.
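For anyone auditing which machines have opted in, here is a minimal sketch that lists the btrfs entries in an fstab whose options include 'degraded'. The function name and the sample UUIDs are made up for illustration; the only thing it relies on is the standard whitespace-separated fstab field order (device, mount point, type, options).

```python
def btrfs_entries_allowing_degraded(fstab_text):
    """Return mount points of btrfs fstab entries that list the 'degraded' option."""
    mount_points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or incomplete entry
        _device, mount_point, fstype, options = fields[:4]
        if fstype == "btrfs" and "degraded" in options.split(","):
            mount_points.append(mount_point)
    return mount_points


# Hypothetical fstab content; the UUIDs are placeholders, not real devices.
SAMPLE = """\
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
UUID=0000-hypothetical  /      btrfs  defaults,degraded  0  0
UUID=1111-hypothetical  /data  btrfs  defaults           0  0
"""

print(btrfs_entries_allowing_degraded(SAMPLE))  # ['/']
```

On a real system you would pass in the contents of /etc/fstab instead of SAMPLE.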


I don't quite understand the use case here. If I'm setting up RAID it's because I want the system to stay up. That's the only purpose for it.

If a device goes missing for "unknown reasons", then the machine should still work, and I'll figure out what happened when monitoring pokes me and says RAID is degraded.
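A rough sketch of what that "monitoring pokes me" check could look like for btrfs. It assumes that `btrfs filesystem show` prints a line containing "Some devices missing" when a member device is absent; that wording is an assumption about btrfs-progs output and should be verified against your version. The function names are hypothetical.

```python
import subprocess

# Assumption: btrfs-progs prints this phrase when a member device is absent.
MISSING_MARKER = "Some devices missing"


def looks_degraded(show_output):
    """Return True if the given `btrfs filesystem show` text reports missing devices."""
    return MISSING_MARKER in show_output


def check_btrfs():
    """Run `btrfs filesystem show` (usually needs root) and report degraded state."""
    result = subprocess.run(
        ["btrfs", "filesystem", "show"],
        capture_output=True,
        text=True,
        check=True,
    )
    return looks_degraded(result.stdout)
```

A cron job or monitoring agent could call check_btrfs() and send an alert when it returns True; how the alert is delivered is outside this sketch.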

The use case is: enough drives have failed that your RAID is degraded. Any more data you write is not replicated, and the failure may be due to a software/hardware issue that will kill more drives soon.

It's up to you to choose at that point - is availability more important for you (add degraded to fstab), or data consistency (deal with the array first).

> That's the only purpose for it.

That's not the only purpose for it. There's three reasons I can think of that you might set up a RAID array:

    * You want better uptime. (your use case)

    * You want to protect from data loss. (my assumption was that this is the most common use case, but I could be wrong. This also helps with uptime because there's nothing worse for uptime than having to restore lost data from a cold backup)

    * You want better performance, data integrity be damned. (RAID 0)

Booting a RAID array with a failed disk is a bad idea if you care a lot about not losing data, because now you're one disk failure closer to losing it.

> Booting a RAID array with a failed disk is a bad idea

Booting a RAID array with a failed disk is an absolutely fine idea.

How else do I get access to the tools to identify the bad drive and resilver the RAID onto a replacement, be it in the same bay or not?

Booting from a degraded array is only a fine idea in some circumstances, not all. That's why the kernel should not default to automatically doing so; but a distro or sysadmin that has better knowledge of the broader situation (e.g. presence of hot spares or a working monitoring/alert system) can reasonably change that default when the risks of booting from a degraded array have been mitigated.

I.e., you are treating RAID as a backup.

Backups cannot be perfectly real-time unless they are very nearly RAID. Any time you are generating/collecting important data, you will unavoidably have some amount of that important data in the state of not yet backed up.

It's reasonable to want to preserve all the data you currently have—some of which probably hasn't been backed up yet—and not accept new data to be written with the durability guarantees the array was originally configured for silently violated.

Since the kernel has no way of knowing which volumes may contain important data that didn't get the chance to be backed up, it should try its best to maintain the original durability standards the filesystem was configured with until some mechanism outside the kernel authorizes the relaxation of those standards.

No, you are treating RAID as protection against the longer outage of restoring from backups.

Another way of thinking about it: should uptime with bad data, or with services making false guarantees about data durability, actually count as uptime?

RAID 0 should be called AID, since it’s not really RAID.

The '0' says it exactly: the amount of data you are left with, once one of the drives fails.

Yeah, but monitoring is not something that comes with the filesystem. If you have to set up the system to be HA and configure monitoring, email notifications, whatever, and make sure the filesystem is created with redundant profiles, then I'd expect that adding 'degraded' to the fstab is also part of the configuration.

The system stays up just fine. You just can't reboot it without fixing/ignoring the problem. I think that's fair.

Most distros have a udev rule in place that inhibits a mount attempt of a multi-device Btrfs filesystem until all devices are visible to the kernel. The degraded mount option won't even matter in this case, because the mount isn't attempted.

If you remove this udev rule and then add degraded mount option to fstab, it's very risky because now even a small delay in drives appearing can result in a degraded mount. And it's even possible to get a split brain situation.
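To illustrate the timing problem, here is a minimal sketch of the "wait until every member device is visible, with a grace period" idea. This is not how the udev rule or the kernel implements it; the function name, parameters, and device paths are hypothetical, and the clock/sleep/exists hooks are there only to make the logic easy to test.

```python
import os
import time


def wait_for_devices(paths, timeout=30.0, poll=0.5,
                     exists=os.path.exists,
                     clock=time.monotonic,
                     sleep=time.sleep):
    """Poll until every path exists or the timeout expires.

    Returns the set of paths still missing; an empty set means all appeared.
    """
    deadline = clock() + timeout
    missing = {p for p in paths if not exists(p)}
    while missing and clock() < deadline:
        sleep(poll)
        # Re-check only the devices that had not shown up yet.
        missing = {p for p in missing if not exists(p)}
    return missing
```

A caller would only consider a degraded mount if the returned set is still non-empty after the grace period. With no grace period at all, a drive that is merely slow to spin up looks identical to a failed one, which is the risk described above.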

Btrfs needs automatic abbreviated scrub, akin to the mdadm write intent bitmap which significantly reduces the resync operation.

According to https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu..., btrfs devs say you should not add degraded to fstab, and doing so can easily result in data loss.

The quote is "At this stage of development, a disk failure should cause mount failure so you're alerted to the problem." and it's from over 9 years ago, on an ancient kernel. That was just 3+ years after the btrfs project got started.

Let's treat it as the archived historical content it is.