by throwaway2048 2793 days ago
As opposed to raid 5, where the array is toast if any two disks fail, raid 6 raises that to three.

However both raid 5 and 6 have 2 huge problems:

Data inflight at write time (power/hardware failures are more likely to corrupt the array, especially silently, which is the worst outcome).

Parity calculations require you to spin up the whole raid5/6 array during a rebuild, massively increasing the chance of a multi drive failure and a lost array. If one close-to-EOL drive dies, putting its sister drives through what is essentially an all day full tilt stress test is a terrible, terrible idea, and this idea keeps getting worse (takes longer) as drive sizes grow.
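For a rough feel of how the rebuild window scales with drive size, here is a back-of-envelope sketch. It assumes the rebuild is limited by sequentially reading or writing one whole drive at a sustained 150 MB/s, which is a made-up round number and not a measurement. Real rebuilds under live load are usually slower. `rebuild_hours` is a hypothetical helper.

```python
def rebuild_hours(capacity_tb: float, mb_per_s: float = 150.0) -> float:
    """Hours to read or write one drive end to end at a sustained rate."""
    capacity_bytes = capacity_tb * 1e12          # decimal TB, as drives are sold
    seconds = capacity_bytes / (mb_per_s * 1e6)  # assumed sustained MB/s
    return seconds / 3600

# Lower-bound estimates for a few drive sizes (TB -> hours).
estimates = {tb: round(rebuild_hours(tb), 1) for tb in (1, 4, 10, 20)}
```

Under those assumptions a 10 TB drive comes out at roughly 18.5 hours, and the figure grows linearly with capacity.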

raid 0+1 mostly sidesteps these issues at a modest increase in drive count; it's a no-brainer for most setups.

3 comments

>Data inflight at write time (power/hardware failures are more likely to corrupt the array, especially silently, which is the worst outcome).

How is that? RAID doesn't affect data persistence behavior in any meaningful way. FUA/SyncCache/etc. are supported by RAID controllers the same as by the underlying disks in writeback environments, parity updates included. Put another way, if you FUA or flush the writeback cache, those operations won't complete in a properly implemented RAID environment until the data is persisted somewhere, even if that means passing FUA down to the underlying storage. Granted, there are a number of ways to mess this up, for example RMW cycles in a controller that doesn't have some kind of persistent memory and a flush on power restore. Anyway, none of this is any worse than what happens in any other WB cached storage technology.

Finally, all this fearmongering about loss on rebuild should be weighed against the fact that decent RAID systems run background scrub operations on a regular basis. Those operations by themselves are going to "stress test" the array regularly while it's consistent and not degraded. I've actually got a fair amount of experience in this area, and I'm here to tell you that if you think this is a risk, consider what happens to non-raided, unscrubbed drives that have a lot of data silently bitrotting on the platters. That latter effect is nearly always the problem in RAID environments when someone starts a rebuild on drives/sectors that have gone unread for extended periods of time. But in the case of RAID, a properly implemented system won't fail a drive for a single read failure during a rebuild. Instead it reconstructs the data from the other drives, leaves the drive online long enough to complete the rebuild, and then takes it offline.

Basically raid 1 setups don't actually fix any of these problems, except through massive additional redundant-disk overhead. That overhead can also be applied to other RAID algorithms to much better effect. E.g. a mirrored RAID 6 provides far more protection than a mirrored raid 0. Similar levels can be had with 6+6 in environments where that is possible, with trivial capacity overhead.

Raid 5/6 require parity calculations before data can be written to disk. This is a significant amount of data, especially at high writing speeds. That is what causes the inflight data problem.
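A minimal sketch of the parity update in question, assuming plain single-parity XOR (raid 5 style) and made-up 4-byte blocks. `xor_blocks` is a hypothetical helper, not any controller's API. It shows why a small write touches two places (data and parity), and what a stripe looks like if only one of the two writes lands.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Small write to d1 via read-modify-write:
# new parity = old parity ^ old data ^ new data.
new_d1 = b"ZZZZ"
new_parity = xor_blocks(parity, d1, new_d1)

# Both writes landed: the stripe is consistent.
stripe_ok = xor_blocks(d0, new_d1, d2) == new_parity

# Data write landed, parity write did not: new data with stale parity.
stripe_torn = xor_blocks(d0, new_d1, d2) == parity

# Rebuilding d0 from the torn stripe yields wrong bytes without any error.
rebuilt_d0 = xor_blocks(new_d1, d2, parity)
```

Here `stripe_ok` is true, `stripe_torn` is false, and `rebuilt_d0` differs from the original `d0`. Whether a given controller can actually end up in the torn state depends on how it orders and persists the two writes.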

Battery and flash backup on controllers doesn't fix the problem of hardware failure (which is significant, especially on big hot controllers).

Again, decent controllers have ECC protection and the like, and frequently are available in HA configurations if your worry is controller failure (along with redundant/dual data paths to the media via SAS/NVMe/etc.). Plus, there is a long list of technologies that can be enabled at the HBA layer and pushed all the way to the media (T10 DIF/DIX comes to mind).

But much of this micro-level redundancy is overkill, as frequently one uses some kind of application-level HA/redundancy as well. So, loss of a RAID5/6 disk in a single machine is the functional equivalent of loss of any combination of RAID 0/1 in the same machine. You still need the higher-level redundancy as well as a backup plan.

We could start breaking the discussion up into fabric-attached vs direct-attached RAID vs software, but I think it's sufficient to say that RAID5/6 doesn't _increase_ the failure surface in any meaningful way when you're not using fly-by-night RAID.

Edit: Maybe what you're trying to say is that cache flush/FUA operations for a given piece of data don't cover the parity calculation and buffers? That is false; a controller should not be responding to FUA/etc. until the entire block (including the parity) has been persisted. So if the controller dies during the operation, the host OS is fully aware that the operation didn't complete. The given block is of course left in some unknown state in this case, but that is true of any write operation that fails like this, regardless of WT/WB/RAID/etc.

The biggest problem with raid5 is that it is completely unprotected against silent corruption, because there is no way for raid to know which block is the corrupted one. When data and parity disagree, it has to decide whether the data or the parity is correct (though most raid implementations just ignore silent corruption completely, so the parity is always assumed to be wrong in such cases).

So even if you rebuild an array, a bad drive might've blown away all of your data already. If you were to compare this with ZFS' "raid" Z1 (same parity, different design) you get detection and protection against silent data corruption.
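A toy sketch of the difference, not how ZFS actually lays anything out: single XOR parity can tell you a stripe is inconsistent, but only a per-block checksum stored apart from the data can tell you which block to distrust. SHA-256 is a stand-in for whatever checksum the filesystem really uses, and `xor_blocks` / `checksum` are hypothetical helpers.

```python
import hashlib
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*data)
sums = [checksum(b) for b in data]   # kept separately from the data blocks

# One disk silently returns wrong bytes, with no I/O error reported.
on_disk = [data[0], b"BxBB", data[2]]

# Parity alone: the mismatch is visible, but not which block caused it.
parity_mismatch = xor_blocks(*on_disk) != parity

# Checksums: locate the bad block, then rebuild it from the rest plus parity.
bad = [i for i, b in enumerate(on_disk) if checksum(b) != sums[i]]
others = [b for i, b in enumerate(on_disk) if i != bad[0]]
repaired = xor_blocks(*others, parity)
```

With the checksums available, `bad` is `[1]` and `repaired` equals the original second block. With parity alone, all you know is that `parity_mismatch` is true.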

>through what is essentially an all day full tilt stress test is a terrible, terrible idea

The rebuild isn't putting the disks under stress. The sister drive has already failed silently but you only notice this once you start the rebuild. The solution is to check the disks once a week by fully reading every sector.
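A bare-bones sketch of "read every sector", assuming a Unix-like system (`os.pread`) and a hypothetical `scrub` helper. It only finds regions the drive admits it cannot read. It does not compare data against parity or checksums, so it will not catch silent corruption. The demo runs against a throwaway temp file; pointing it at a real block device would need root.

```python
import os
import tempfile

def scrub(path, chunk=1 << 20):
    """Read every byte of path once; return (bytes_read, bad_offsets)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        offset, total, bad = 0, 0, []
        while offset < size:
            want = min(chunk, size - offset)
            try:
                got = len(os.pread(fd, want, offset))
            except OSError:
                bad.append(offset)  # unreadable region: record it, keep going
                got = 0
            total += got
            offset += got or want   # always advance, even after a failed read
        return total, bad
    finally:
        os.close(fd)

# Demo on a scratch file of 3 MiB + 5 bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (3 * (1 << 20) + 5))
    demo_path = f.name
read, bad_offsets = scrub(demo_path)
os.remove(demo_path)
```

Run from a weekly cron job or timer, something like this would surface unreadable sectors before a rebuild depends on them.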