by kevin1024 4401 days ago
> Do not use raidz1 for disks 1TB or greater in size.

Oops, I'm doing that on my home NAS. Does anyone know why this is bad?

3 comments

I wish the author had provided an explanation. The main issue I'm familiar with is that throughput and IOPS generally don't increase linearly with storage capacity, so the time to recover from a drive failure increases significantly with larger drives. The author may be saying that you should use raidz2 or raidz3 with 1TB drives, because the time to resilver 1TB is long enough that the odds of a second drive failure under raidz1 are too high. Alternatively, the point may be that you should use 750GB or smaller drives with raidz1 to keep resilver times down and so reduce the odds of a second failure during resilvering.
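To put rough numbers on the resilver-time argument, here is a minimal sketch. It assumes the best case, where the replacement drive is written at one sustained sequential rate with no other pool activity. The capacity and throughput pairs are hypothetical, picked only to show capacity growing faster than throughput, and `resilver_hours` is a made-up helper name.

```python
def resilver_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Best-case hours to rewrite one full drive at a sustained sequential rate."""
    capacity_bytes = capacity_tb * 1e12  # decimal terabytes, as drives are sold
    seconds = capacity_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


# Hypothetical (capacity in TB, sustained MB/s) pairs: capacity quadruples
# while throughput only grows by half. Not taken from any datasheet.
drives = [(0.75, 100.0), (1.0, 110.0), (3.0, 150.0)]

for capacity_tb, throughput_mb_s in drives:
    hours = resilver_hours(capacity_tb, throughput_mb_s)
    print(f"{capacity_tb:>5} TB at {throughput_mb_s:.0f} MB/s: {hours:.1f} h")
```

A real resilver on a busy or fragmented pool will usually take longer than this lower bound, since the reads are not purely sequential.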
It is not due to the resilvering time. It is due to the rated probability of an unrecoverable read error (one bit or more) on modern drives. This probability is high enough that you have a 32% chance of hitting one when reading 1TB. However, this is actually less of a problem on ZFS than on hardware RAID, because ZFS only reads actual data rather than blindly reading every sector.
HW RAID does not read every sector blindly; there is a level of error detection there. And a sector that errors on one read will not necessarily error on every read.
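The chance of at least one unrecoverable read error can be estimated from a per-bit error rate. A minimal sketch follows, assuming bit errors are independent and using a rate of 1 error per 10^14 bits. That rate is an assumption of the kind printed on consumer drive spec sheets, not a measured value, and `p_read_error` is a hypothetical helper name.

```python
import math


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error in bytes_read bytes.

    Computes 1 - (1 - p)^n using log1p/expm1 so the tiny per-bit
    probability is not lost to floating-point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


for terabytes in (1, 2, 5, 12):
    print(f"{terabytes:>2} TB read: {p_read_error(terabytes * 1e12):.1%}")
```

Under these assumptions a 1 TB read comes out near 7.7%, and a figure near 32% corresponds to roughly 4.8 TB read. The result is very sensitive to the assumed rate and to how much data is actually read.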

Now, the error detection schemes at the disk level may be insufficient. I don't know enough about how it's done on modern drives (but I suspect that every manufacturer has its own scheme).

I am as well (5x 3TB in raidz1). I'm pretty sure it's because the likelihood of hitting an unreadable bit/byte/sector on one of the non-failed disks gets higher as capacity increases, so there is a good chance that you'll lose some data. This article discusses the theory: http://www.zdnet.com/blog/storage/why-raid-5-stops-working-i...
Is there a way to check statistics of failed reads/writes?
"zpool status" will show if there have been errors reading data from individual devices. If a drive experiences enough failures, at least on illumos and Solaris-based systems, it will be marked degraded or faulted and removed from service. You can view individual failures on these systems with "fmdump -e". Here's a made-up worked example: https://blogs.oracle.com/bobn/entry/zfs_and_fma_two_great
Here too. Setting aside the unsubstantiated statement in the article, the issue is that the probability of a read error while reading the array to replace a failed disk is too high.

Hard drive capacities have been rising much faster than the unrecoverable read error rate has been falling.
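The effect of growing capacity at a fixed error rate can be sketched for a single-parity rebuild. This assumes independent bit errors, a rate of 1 error per 10^14 bits (an assumed spec-sheet style figure), and that every surviving disk must be read. `used_fraction` is a hypothetical knob for how full the pool is, reflecting the point made earlier in the thread that ZFS only reads allocated data.

```python
import math


def p_ure_during_rebuild(disks: int, capacity_tb: float,
                         used_fraction: float = 1.0,
                         ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while rebuilding one disk."""
    surviving = disks - 1  # single parity: all remaining disks are read
    bits_read = surviving * capacity_tb * 1e12 * 8 * used_fraction
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Same 5-disk layout, same assumed error rate, growing drive size.
for capacity_tb in (0.5, 1.0, 3.0):
    p = p_ure_during_rebuild(disks=5, capacity_tb=capacity_tb)
    print(f"5 x {capacity_tb} TB, full pool: {p:.0%}")
```

With the error rate held constant, the probability rises with every increase in drive size, and a half-full pool (`used_fraction=0.5`) lowers it because less data has to be read.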