by jamesblonde
3695 days ago
RAID5 is not dead yet (https://www.cafaro.net/2014/05/26/why-raid-5-is-not-dead-yet...). The problem of failures during rebuilds is overblown, IMO. The manufacturer-quoted URE (unrecoverable read error) rate is overstated: instead of one error per 10^14 bits read, real drives are more likely one per 10^15 bits or better.
Full disclosure: we're actually doing erasure coding in HDFS over RAID5 on our servers (double insurance: if a RAID array goes down, we can recover from other servers in HDFS). But for 6x4TB arrays we don't expect a 70%+ chance of a URE during a rebuild, but rather a couple of percent. With ZFS or btrfs it won't actually matter for us, as a URE only loses us a single block, which we can recover from the rest of the cluster.
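A minimal back-of-envelope sketch of where figures like "70%+" come from. It assumes (none of this is stated above) that a rebuild reads every surviving disk in full, that each bit fails independently, that the spec means one URE per N bits read, and that TB means decimal terabytes. The function name is illustrative only.

```python
import math


def p_ure_during_rebuild(surviving_disks: int, disk_tb: float, bits_per_ure: float) -> float:
    """Probability of at least one URE while reading every surviving disk once.

    Simple model: each bit read fails independently with probability 1/bits_per_ure.
    """
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    # P(no URE) = (1 - 1/N) ** bits_read, computed stably via log1p
    p_clean = math.exp(bits_read * math.log1p(-1.0 / bits_per_ure))
    return 1.0 - p_clean


if __name__ == "__main__":
    for rate in (1e14, 1e15, 1e16):
        # 6x4TB RAID5: a rebuild reads the 5 surviving disks
        print(f"1 URE per {rate:.0e} bits: {p_ure_during_rebuild(5, 4, rate):.1%}")
```

Under this model the answer is very sensitive to the assumed rate: about 80% at one URE per 10^14 bits, about 15% at 10^15, and roughly 1.6% at 10^16.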
I've got roughly 30 arrays in production, with between 4 and 12 disks each. All are RAID5 + hot spare. If you believe the maths people keep quoting, the odds of seeing a total failure in a given year are close to 100%. I started using this configuration, across varying hardware, over 15 years ago, and the number of arrays has been growing since.
I'm not pretending that one example proves the rule, that it's totally safe, or that I would run a highly critical environment this way (before anyone comments: these environments do not meet that definition). But people have tried to show mathematically that there's a six-nines likelihood of failure, and I just don't believe for a second that I'm that lucky.
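The "that lucky" argument can be made concrete with a small sketch. It assumes zero total failures so far (implied above, not stated), treats each array-year as an independent trial, and uses a hypothetical exposure of 30 arrays x 10 years, since the fleet grew over time and the real array-year count isn't given. The function names are illustrative.

```python
def p_no_failures(p_per_array_year: float, array_years: float) -> float:
    """Chance of zero total failures, treating each array-year as an independent trial."""
    return (1.0 - p_per_array_year) ** array_years


def max_plausible_rate(array_years: float, confidence: float = 0.95) -> float:
    """Largest per-array-year failure probability still consistent with zero observed
    failures at the given confidence: solve (1 - p) ** n == 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / array_years)


if __name__ == "__main__":
    array_years = 30 * 10  # hypothetical exposure, see lead-in
    for p in (0.5, 0.1, 0.01):
        print(f"p={p}: P(no failures) = {p_no_failures(p, array_years):.3g}")
    print(f"95% upper bound on p: {max_plausible_rate(array_years):.4f}")
```

With 300 array-years and no failures, even a 1% annual failure chance per array would leave only about a 5% chance of such a clean record, and the 95% upper bound comes out near 3/300 (the usual "rule of three").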