

by p1necone 1281 days ago

> That's the only purpose for it.

That's not the only purpose for it. There are three reasons I can think of that you might set up a RAID array:

    * You want better uptime. (your use case)

    * You want to protect from data loss. (my assumption was that this is the most common use case, but I could be wrong. This also helps with uptime because there's nothing worse for uptime than having to restore lost data from a cold backup)

    * You want better performance, data integrity be damned. (RAID 0)

Booting a RAID array with a failed disk is a bad idea if you care a lot about not losing data, because you're now one disk failure closer to losing it.
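
To put a number on that margin, here is a minimal sketch. The function names are hypothetical, and it only covers the standard levels (nested levels such as RAID 10 depend on which disks fail, so they are left out):

```python
# Hypothetical helpers: how many more disk failures an array can absorb.

def failures_tolerated(level: int, disks: int) -> int:
    """Number of disk failures the array survives without data loss."""
    if level == 0:
        return 0            # striping only, no redundancy
    if level == 1:
        return disks - 1    # every disk holds a full copy
    if level == 5:
        return 1            # single parity
    if level == 6:
        return 2            # double parity
    raise ValueError(f"unsupported RAID level: {level}")


def remaining_margin(level: int, disks: int, failed: int) -> int:
    """Further failures the array can absorb; negative means data is already lost."""
    return failures_tolerated(level, disks) - failed


print(remaining_margin(1, 2, 0))  # healthy two-disk mirror -> 1
print(remaining_margin(1, 2, 1))  # degraded two-disk mirror -> 0
```

A degraded two-disk mirror has a margin of zero, which is the same position a RAID 0 array is in on its best day.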

3 comments

justsomehnguy 1281 days ago

> Booting a RAID array with a failed disk is a bad idea

Booting a RAID array with a failed disk is an absolutely fine idea.

How else would I get access to the tools to identify the bad drive and resilver the RAID onto a replacement, whether it's in the same bay or not?


wtallis 1281 days ago

Booting from a degraded array is only a fine idea in some circumstances, not all. That's why the kernel should not default to doing so automatically; but a distro or sysadmin with better knowledge of the broader situation (e.g. the presence of hot spares or a working monitoring/alert system) can reasonably change that default once the risks of booting from a degraded array have been mitigated.
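
Roughly, the policy I have in mind looks like the sketch below. `BootPolicy`, `allow_degraded`, and `may_mount` are made-up names for illustration, not any real kernel or filesystem interface:

```python
from dataclasses import dataclass


@dataclass
class BootPolicy:
    # Hypothetical setting: off by default, switched on by a distro or
    # sysadmin once hot spares or monitoring make a degraded boot acceptable.
    allow_degraded: bool = False


def may_mount(degraded: bool, policy: BootPolicy) -> bool:
    """Always mount a healthy array; mount a degraded one only if authorized."""
    return (not degraded) or policy.allow_degraded


default_policy = BootPolicy()
admin_policy = BootPolicy(allow_degraded=True)

print(may_mount(degraded=True, policy=default_policy))  # refused by default
print(may_mount(degraded=True, policy=admin_policy))    # allowed after opt-in
```

The point of the default is that the decision to relax it comes from someone who knows about the mitigations, not from the code that merely detected the failure.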


justsomehnguy 1281 days ago

I.e., you are treating RAID as a backup.


wtallis 1281 days ago

Backups cannot be perfectly real-time unless they are very nearly RAID. Any time you are generating/collecting important data, you will unavoidably have some amount of that important data in the state of not yet backed up.

It's reasonable to want to preserve all the data you currently have (some of which probably hasn't been backed up yet) and to refuse new writes that would silently violate the durability guarantees the array was originally configured for.

Since the kernel has no way of knowing which volumes may contain important data that didn't get the chance to be backed up, it should try its best to maintain the original durability standards the filesystem was configured for until some mechanism outside the kernel authorizes the relaxation of those standards.


justsomehnguy 1281 days ago

> It's reasonable to want to preserve all the data you currently have (some of which probably hasn't been backed up yet) and to refuse new writes that would silently violate the durability guarantees the array was originally configured for.

I.e., by your logic, the system should stop writes as soon as the array becomes degraded.

But this is not what happens with btrfs: it will happily continue writing data to the array until reboot.

And then suddenly it's "oh my god array is degraded!!!111 you should not write to it1111".

To add to that: I've never seen a HW RAID card refuse to boot merely because an array is degraded. Changes in array configuration, or the loss of more drives than the redundancy can cover: yes, that would halt the boot and require operator intervention. An array in a degraded state? Just spit the warnings to the console and boot. Nobody has the time to walk to each server with a degraded array on every reboot.
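
As a rough sketch of that behavior (the names and the three-way split are my own illustration, not any vendor's actual firmware logic):

```python
from enum import Enum


class BootAction(Enum):
    BOOT = "boot"
    WARN_AND_BOOT = "warn on console, then boot"
    HALT = "halt and wait for operator"


def controller_boot_action(failed: int, tolerated: int, config_changed: bool) -> BootAction:
    """Decide what to do at boot from the array state.

    failed:         drives currently missing or failed
    tolerated:      failures the array's redundancy can cover
    config_changed: array configuration differs from what was stored
    """
    if config_changed or failed > tolerated:
        return BootAction.HALT           # redundancy exceeded or layout changed
    if failed > 0:
        return BootAction.WARN_AND_BOOT  # merely degraded
    return BootAction.BOOT               # healthy


# A single-parity array with one dead drive: degraded, so warn and boot.
print(controller_boot_action(failed=1, tolerated=1, config_changed=False).value)
```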


happymellon 1281 days ago

No, you are treating RAID as protection against the longer outage of restoring from backups.


wtallis 1281 days ago

Another way of thinking about it: should uptime with bad data or services making false guarantees about data durability actually count as uptime?


jonhohle 1281 days ago

RAID 0 should be called AID, since it’s not really RAID.


vetinari 1281 days ago

The '0' says it exactly: the amount of data you are left with once one of the drives fails.
