by StillBored 2832 days ago

Data inflight at write time (power/hardware failures are more likely to corrupt the array, especially silently, which is the worst outcome).

How is that? RAID doesn't affect data persistence behavior in any meaningful way. FUA/SyncCache/etc. are supported by RAID controllers the same as by the underlying disks in writeback environments, parity updates included. Put another way, if you issue a FUA write or flush the writeback cache, those operations won't complete in a properly implemented RAID environment until the data is persisted somewhere, even if that means passing FUA down to the underlying storage. Granted, there are a number of ways to mess this up, for example RMW cycles in a controller that lacks some kind of persistent memory and a flush on power restore. Anyway, none of this is any worse than what happens in any other WB cached storage technology.
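A minimal sketch of the host side of that contract, assuming a POSIX-like system. The helper name `durable_write` is made up for illustration. `os.fsync` only asks the kernel to flush; whether that turns into a cache flush or FUA write at the device depends on the filesystem, driver, and controller underneath.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> int:
    """Write payload and return only after fsync() has returned.

    Hypothetical helper: the caller treats the data as persisted only
    once this function returns without raising.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, payload[written:])
        # Blocks until the kernel reports the file data as flushed.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


# Small demonstration against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "block.bin")
    assert durable_write(target, b"x" * 4096) == 4096
```

If `fsync` raises or never returns, the application has not been told the write is durable, which is the same signal a host gets when a FUA write fails to complete.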

Finally, all this fearmongering about loss on rebuild should be weighed against the fact that decent RAID systems run background scrub operations on a regular basis. Those operations by themselves "stress test" the array regularly while it's consistent and not degraded. I've actually got a fair amount of experience in this area, and I'm here to tell you that if you think this is a risk, consider what happens to non-RAIDed, unscrubbed drives that have a lot of data silently bitrotting on the platters. That latter effect is nearly always the problem in RAID environments when someone starts a rebuild on drives/sectors that have gone unread for extended periods of time. But in the case of RAID, a properly implemented system won't fail a drive for a single read failure during a rebuild; instead it reconstructs the sector from the other drives, leaves the drive online long enough to complete the rebuild, and then takes it offline.
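To make the scrub and reconstruct steps concrete, here is a toy sketch using single XOR parity (RAID 5 style) over in-memory byte blocks. The function names are invented for this example, and real arrays work on on-disk stripes, with RAID 6 adding a second, non-XOR syndrome.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(data_blocks, parity):
    """Return True when the stored parity matches the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe with three data blocks and its parity block.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)

# A scrub pass on a healthy stripe finds nothing wrong.
assert scrub_stripe(stripe, parity)

# A read failure on block 1 is repaired from the other blocks and parity.
assert rebuild_block([stripe[0], stripe[2]], parity) == stripe[1]
```

A scrub that runs this check regularly finds bad sectors while the other blocks are still readable, which is when reconstruction is still possible.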

Basically, RAID 1 setups don't actually fix any of these problems, except through the massive overhead of additional redundant disks. That overhead can also be applied to other RAID algorithms to much better effect. E.g., a mirrored RAID 6 provides far more protection than a mirrored RAID 0. Similar levels can be had with 6+6 in environments where that is possible, with trivial capacity overhead.


throwaway2048 2832 days ago

RAID 5/6 requires parity calculations before data can be written to disk. That is a significant amount of data, especially at high write speeds, and it is what causes the inflight data problem.

Battery and flash backup on controllers doesn't fix the problem of hardware failure (which is significant, especially on big hot controllers).
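For reference, the parity work on a small write can be sketched as a read-modify-write update. This is a toy model of single XOR parity with made-up function names, not any particular controller's implementation.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity for a single-block update: P' = P xor D_old xor D_new."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))


# Stripe of three data blocks plus parity.
stripe = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\xaa"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite block 1. Only the old block and the old parity need to be read.
new_block = b"\x12\x34"
new_parity = rmw_parity(stripe[1], new_block, parity)

# The shortcut matches a full recomputation over the updated stripe.
full = xor_bytes(xor_bytes(stripe[0], new_block), stripe[2])
assert new_parity == full
```

The window the thread is arguing about is the time between writing the new data block and writing the new parity: until both are on stable storage, the stripe is inconsistent.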

StillBored 2832 days ago

Again, decent controllers have ECC protection and the like, and frequently are available in HA configurations if your worry is controller failure (along with redundant/dual data paths to the media via SAS/NVMe/etc). Plus, there is a long list of technologies that can be enabled at the HBA layer and pushed all the way to the media (T10 DIF/DIX comes to mind).

But much of this micro-level redundancy is overkill, as one frequently uses some kind of application-level HA/redundancy as well. So loss of a RAID5/6 disk in a single machine is the functional equivalent of loss of any combination of RAID 0/1 in the same machine. You still need the higher-level redundancy as well as a backup plan.

We could start breaking the discussion up into fabric-attached vs direct-attached RAID vs software, but I think it's sufficient to say that RAID5/6 doesn't _increase_ the failure surface in any meaningful way when you're not using fly-by-night RAID.

Edit: Maybe what you're trying to say is that cache flush/FUA operations for a given piece of data don't cover the parity calculation and buffers? That is false; a controller should not be responding to FUA/etc until the entire block (including the parity) has been persisted. So if the controller dies during the operation, the host OS is fully aware that the operation didn't complete. The given block is of course left in some unknown state in this case, but that is true of any write operation that fails like this, regardless of WT/WB/RAID/etc.
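A toy model of that acknowledgement ordering. The `ToyRaid5` class and its failure switch are hypothetical, purely to illustrate that the host sees either a completed write or an error, never a success for a half-persisted stripe.

```python
class ControllerDied(Exception):
    """Raised in place of a completion when the toy controller fails."""


class ToyRaid5:
    """Hypothetical controller: acks a FUA write only after data AND parity."""

    def __init__(self, fail_before_parity: bool = False):
        self.media = {}  # stands in for stable storage
        self.fail_before_parity = fail_before_parity

    def write_fua(self, lba: int, data: bytes, parity: bytes) -> bool:
        self.media[("data", lba)] = data
        if self.fail_before_parity:
            # Controller dies mid-operation: no acknowledgement is sent.
            raise ControllerDied(f"lost controller while writing LBA {lba}")
        self.media[("parity", lba)] = parity
        return True  # ack only after both pieces are persisted


healthy = ToyRaid5()
assert healthy.write_fua(7, b"new", b"par") is True

broken = ToyRaid5(fail_before_parity=True)
try:
    broken.write_fua(7, b"new", b"par")
    acked = True
except ControllerDied:
    acked = False

# The host never saw a success, so it knows LBA 7 is in an unknown state.
assert acked is False
```

In the failed case the data block is on the media and the parity is not, which is the "unknown state" described above; what matters is that the host was never told the write succeeded.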