by jmpman 2787 days ago
CRUSH algorithms are used to overcome rebuild limits in modern arrays. https://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

CRUSH is one example (and not the first) of a "distributed rebuild" approach: you have an array of N drives (with N large, e.g. 100), and if one drive fails, you read in parallel from all N-1 remaining drives while writing the reconstructed data into the spare capacity spread across those same N-1 drives.
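To make the "read from everyone" point concrete, here is a minimal sketch. It is not CRUSH itself, just a seeded pseudo-random placement standing in for it; the function names, the stripe width of 10, and the stripe count are all made up for illustration. It counts how many chunk reads each surviving drive would serve when drive 0 fails.

```python
import random
from collections import Counter

def place_stripes(n_drives, n_stripes, width, seed=0):
    """Assign each stripe to `width` distinct drives, chosen pseudo-randomly.

    Stand-in for a real placement function such as CRUSH (assumption).
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_drives), width) for _ in range(n_stripes)]

def rebuild_read_load(stripes, failed):
    """Count the chunk reads each surviving drive serves to rebuild `failed`."""
    load = Counter()
    for members in stripes:
        if failed in members:
            for drive in members:
                if drive != failed:
                    load[drive] += 1
    return load

if __name__ == "__main__":
    stripes = place_stripes(n_drives=100, n_stripes=20000, width=10)
    load = rebuild_read_load(stripes, failed=0)
    print("surviving drives involved:", len(load))
    print("min/max reads per drive:", min(load.values()), max(load.values()))
```

With a classic fixed RAID group of 10 drives, only the 9 group peers would show up in that count; with declustered placement the work lands on (nearly) every surviving drive in the array.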

In effect, you get the total bandwidth of N-1 HDDs working in parallel. And the aggregate bandwidth of 100 HDDs doing sequential I/O in parallel is massive (~10 GB/s, which works out to roughly 100 MB/s per drive).
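The back-of-the-envelope arithmetic, as a rough sketch: the 100 MB/s per-drive figure and the 10 TB drive size are assumptions picked for illustration, and the result is a best case that ignores parity read amplification, the write half of the rebuild, and foreground I/O.

```python
def aggregate_bandwidth_gb_s(n_drives, per_drive_mb_s=100.0):
    """Best-case combined sequential bandwidth of the N-1 surviving drives."""
    return (n_drives - 1) * per_drive_mb_s / 1000.0

def rebuild_hours(capacity_tb, bandwidth_gb_s):
    """Time to move `capacity_tb` of data at `bandwidth_gb_s`, in hours."""
    return capacity_tb * 1000.0 / bandwidth_gb_s / 3600.0

if __name__ == "__main__":
    declustered = aggregate_bandwidth_gb_s(100)  # 99 drives share the work
    single_spare = 100.0 / 1000.0                # one hot spare absorbs all writes
    print(f"declustered:  {declustered:.1f} GB/s, "
          f"{rebuild_hours(10, declustered) * 60:.0f} min for 10 TB")
    print(f"single spare: {single_spare:.1f} GB/s, "
          f"{rebuild_hours(10, single_spare):.0f} h for 10 TB")
```

Real systems usually throttle rebuild traffic to protect foreground I/O, so actual rebuild times are well above this best case.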

Examples of companies claiming to use this approach are Qumulo (rebuild in a couple of hours), Infinidat (a couple of tens of minutes), ClusterStor GridRAID (now part of Seagate, I think), and "Declustered RAID" in GPFS (IBM).

GridRAID is owned by Cray now, who were the primary OEM from Seagate.

Thanks for pointing out that declustered/distributed rebuild RAID has many historical precedents (also 3PAR BTW) pre-CRUSH/Ceph.