by kdkeyser
2787 days ago
CRUSH is one example (and not the first) of a "distributed rebuild" approach. You have an array of N drives (with N large, e.g. 100), and if one drive fails, you read in parallel from all N-1 remaining drives while distributing the reconstructed data across the free capacity of those same N-1 drives. In effect, you get the combined bandwidth of N-1 HDDs working in parallel, and the bandwidth of 100 HDDs doing sequential IO in parallel is massive (~10 GB/s). Companies claiming to use this approach include Qumulo (rebuild in a couple of hours), Infinidat (a few tens of minutes), ClusterStor GridRAID (now part of Seagate, I think), and IBM with "Declustered RAID" in GPFS.
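A rough sketch of why this helps, under loudly stated assumptions. Stripes are placed on drives uniformly at random (a stand-in, not CRUSH or any vendor's layout). Each stripe uses single parity, so a lost chunk is rebuilt by reading every surviving chunk of its stripe. Chunk size is assumed to be 1 MB and per-drive throughput 100 MB/s, which puts 100 drives at roughly 10 GB/s combined, matching the figure above. The function names (`place_stripes`, `rebuild_load`, `busiest_drive_chunks`) are hypothetical. The busiest drive bounds the declustered rebuild time, whereas a classic dedicated hot spare has to write every reconstructed chunk by itself.

```python
import random
from collections import Counter


def place_stripes(num_drives, stripe_width, num_stripes, seed=0):
    """Put each stripe's chunks on `stripe_width` distinct drives chosen at random.

    Hypothetical uniform-random placement, standing in for CRUSH or any
    real declustered layout."""
    rng = random.Random(seed)
    return [rng.sample(range(num_drives), stripe_width) for _ in range(num_stripes)]


def rebuild_load(stripes, num_drives, failed, seed=1):
    """Count per-drive chunk reads and writes needed to rebuild drive `failed`.

    Assumes single parity: a lost chunk is rebuilt by reading every surviving
    chunk of its stripe, and the result is written to spare space on a random
    surviving drive that holds no chunk of that stripe."""
    rng = random.Random(seed)
    reads, writes = Counter(), Counter()
    for members in stripes:
        if failed not in members:
            continue  # this stripe lost nothing
        for d in members:
            if d != failed:
                reads[d] += 1
        # `failed` is in `members`, so it is never chosen as a target.
        spare_targets = [d for d in range(num_drives) if d not in members]
        writes[rng.choice(spare_targets)] += 1
    return reads, writes


def busiest_drive_chunks(reads, writes):
    """Rebuild time is bounded by the drive doing the most chunk I/O."""
    drives = set(reads) | set(writes)
    return max(reads[d] + writes[d] for d in drives)


if __name__ == "__main__":
    N, WIDTH, STRIPES = 100, 10, 20_000
    CHUNK_MB, DRIVE_MBPS = 1, 100  # assumed chunk size and per-drive throughput
    stripes = place_stripes(N, WIDTH, STRIPES)
    reads, writes = rebuild_load(stripes, N, failed=0)
    lost = sum(writes.values())  # one reconstructed chunk per affected stripe
    declustered_s = busiest_drive_chunks(reads, writes) * CHUNK_MB / DRIVE_MBPS
    # Lower bound for classic RAID: one hot spare must write every lost chunk.
    dedicated_spare_s = lost * CHUNK_MB / DRIVE_MBPS
    print(f"lost chunks: {lost}")
    print(f"drives reading: {len(reads)}, drives writing: {len(writes)}")
    print(f"declustered ~{declustered_s:.1f}s vs dedicated spare >= {dedicated_spare_s:.1f}s")
```

With these toy numbers, all 99 survivors share the reads and the writes, so the busiest drive handles only a small fraction of the lost chunks. This is the "N-1 drives in parallel" effect described above. Real systems add placement constraints (failure domains, rebuild throttling) that this sketch ignores.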
Thanks for pointing out that declustered/distributed-rebuild RAID has many historical precedents (3PAR too, BTW) that predate CRUSH/Ceph.