by p1necone 1281 days ago
> That's the only purpose for it.

That's not the only purpose for it. There are three reasons I can think of that you might set up a RAID array:

    * You want better uptime. (your use case)

    * You want to protect from data loss. (my assumption was that this is the most common use case, but I could be wrong. This also helps with uptime because there's nothing worse for uptime than having to restore lost data from a cold backup)

    * You want better performance, data integrity be damned. (RAID 0)
Booting a RAID array with a failed disk is a bad idea if you care a lot about not losing data, because you are now one disk failure closer to losing it.
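
To put rough numbers on "one disk failure closer", here is a minimal sketch that counts how many further failures an array can absorb. The tolerances are the textbook values for the standard levels; the names `FAULT_TOLERANCE` and `remaining_tolerance` are made up for illustration and are not from any RAID tool.

```python
# Illustrative only: textbook fault tolerance for the standard RAID levels.
# Each entry maps a disk count to the number of disk failures the array survives.
FAULT_TOLERANCE = {
    "raid0": lambda disks: 0,          # striping only, no redundancy
    "raid1": lambda disks: disks - 1,  # every disk is a full mirror
    "raid5": lambda disks: 1,          # single parity
    "raid6": lambda disks: 2,          # double parity
}


def remaining_tolerance(level: str, disks: int, failed: int) -> int:
    """Return how many more disks may fail before data is lost.

    A negative result means data has already been lost.
    """
    return FAULT_TOLERANCE[level](disks) - failed
```

With this, a healthy 4-disk RAID 5 gives `remaining_tolerance("raid5", 4, 0) == 1`, and the same array with one failed disk gives `0`: it still works, but the next failure loses data.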

> Booting a RAID array with a failed disk is a bad idea

Booting a RAID array with a failed disk is an absolutely fine idea.

How else do I get access to the tools to identify the bad drive and resilver the RAID onto a replacement, be it in the same bay or not?

Booting from a degraded array is only a fine idea in some circumstances, not all. That's why the kernel should not default to automatically doing so; but a distro or sysadmin that has better knowledge of the broader situation (eg. presence of hot spares or a working monitoring/alert system) can reasonably change that default when the risks of booting from a degraded array have been mitigated.
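
As a rough sketch of that policy (not how any kernel or distro actually implements it), the decision could look like the following. `ArrayState`, its fields, and `allow_boot` are hypothetical names made up for illustration; the default is to refuse a degraded boot unless something outside the kernel has signalled that the risk is mitigated.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of what is known about an array at boot time."""
    degraded: bool
    hot_spare_available: bool = False  # assumption: a spare can take over the rebuild
    monitoring_active: bool = False    # assumption: someone will be alerted
    admin_override: bool = False       # assumption: distro/sysadmin changed the default


def allow_boot(state: ArrayState) -> bool:
    """Conservative default: boot a degraded array only if a mitigation is present."""
    if not state.degraded:
        return True
    return (
        state.hot_spare_available
        or state.monitoring_active
        or state.admin_override
    )
```

For example, `allow_boot(ArrayState(degraded=True))` is `False`, while `allow_boot(ArrayState(degraded=True, monitoring_active=True))` is `True`.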
I.e., you are treating RAID as a backup.
Backups cannot be perfectly real-time unless they are very nearly RAID. Any time you are generating/collecting important data, you will unavoidably have some amount of that important data in the state of not yet backed up.

It's reasonable to want to preserve all the data you currently have—some of which probably hasn't been backed up yet—and not accept new data to be written with the durability guarantees the array was originally configured for silently violated.

Since the kernel has no way of knowing which volumes may contain important data that didn't get the chance to be backed up, it should try its best to maintain the original durability standards the filesystem was configured with until some mechanism outside the kernel authorizes the relaxation of those standards.

> It's reasonable to want to preserve all the data you currently have—some of which probably hasn't been backed up yet—and not accept new data to be written with the durability guarantees the array was originally configured for silently violated.

I.e. (by your logic) the system should stop writes as soon as the array becomes degraded.

But this is not what happens with btrfs: it will happily continue writing data to the array until the next reboot.

And then suddenly it's "oh my god array is degraded!!!111 you should not write to it1111".

To add to that: I have never seen a HW RAID card refuse to boot merely because an array is degraded. Changes in the configuration of arrays, or the loss of more drives than the redundancy can cover: yes, that would halt the boot and require operator intervention. An array in a degraded state? Just spit the warnings to the console and boot. Nobody has the time to walk to each server with a degraded array on every reboot.

No, you are treating RAID as protection against the longer outage of restoring from backups.
Another way of thinking about it: should uptime with bad data or services making false guarantees about data durability actually count as uptime?
RAID 0 should be called AID, since it’s not really RAID.
The '0' says it exactly: the amount of data you are left with, once one of the drives fails.