HN Mirror: Hacker News

by ChuckMcM 4280 days ago

The non-recoverable bit error rate spec.

NetApp tracks it with their NearStore product line, which used SATA drives in a NAS box (they have been using them for a while, actually; when I left they had data on about 65 million drive hours). Seagate quotes it as 1 in 10^15 bits, but it's actually closer to 5 in 10^15 bits. A 3TB drive has roughly 3x10^13 bits of data (closer to 3x10^14 when you account for track markers and error recovery bits).
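A quick sketch of what those two rates mean for one full read of a 3TB drive, assuming decimal terabytes and independent bit errors. The bits-per-byte figures (8 user-data bits, and an assumed ~10 encoded bits per byte) are illustrative, and `drive_bits` is a hypothetical helper:

```python
def drive_bits(capacity_bytes: float, bits_per_byte: float = 8.0) -> float:
    """Bits read when every byte of the drive is read exactly once."""
    return capacity_bytes * bits_per_byte


THREE_TB = 3e12  # decimal terabytes, as drive vendors count them

user_bits = drive_bits(THREE_TB)        # 8 bits per byte of user data
raw_bits = drive_bits(THREE_TB, 10.0)   # assumed ~10 encoded bits per byte

# Expected number of unrecoverable errors = bits read * per-bit error rate.
for label, rate in [("quoted 1 in 10^15", 1e-15), ("observed 5 in 10^15", 5e-15)]:
    print(f"{label}: {user_bits * rate:.3f} (8 bits/byte), "
          f"{raw_bits * rate:.3f} (10 bits/byte) expected errors per full read")
```

The fivefold gap between the quoted and observed rates carries straight through to the expected error count, since the relationship is linear.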

If you're bored sometime, try reading every sector from one of these drives. To maximize your chance of success, operate the drive slightly warm (it keeps the lubricant from sticking) and isolate it from vibration. It's worse if you read it randomly, so read it sequentially. (You will get some arm servo movement anyway because the drive will have remapped some blocks to spares, but minimizing movement also keeps vibration down.)
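A minimal sketch of a sequential full-surface read, assuming a Unix system (it uses `os.pread`) and a hypothetical device path such as `/dev/sdX` that you have permission to read. `scan_sequential` is a hypothetical name; it records the offsets of chunks that fail with an I/O error and keeps going rather than aborting:

```python
import os


def scan_sequential(path: str, chunk: int = 1 << 20) -> tuple[int, list[int]]:
    """Read `path` front to back in `chunk`-byte pieces (Unix only: uses os.pread).

    Returns (bytes_read, failed_offsets). A read that raises OSError (on a
    block device, typically EIO from an unreadable sector) is recorded and the
    whole chunk is skipped, so one bad sector does not stop the scan.
    """
    fd = os.open(path, os.O_RDONLY)
    bytes_read = 0
    failed: list[int] = []
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, min(chunk, size - offset), offset)
            except OSError:
                failed.append(offset)
                offset += chunk  # skip past the chunk containing the bad sector
                continue
            if not data:  # unexpected EOF
                break
            bytes_read += len(data)
            offset += len(data)  # strictly sequential: next read starts here
    finally:
        os.close(fd)
    return bytes_read, failed
```

Reading strictly in order keeps head movement to what the drive's own remapping forces. The temperature and vibration advice above is physical setup and cannot be expressed in code.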

Long before it became an issue on single drives, as it is today, it was an issue when reconstructing a RAID4 (or RAID5) group of 3.5TB, which at the time was a 7-disk RAID group of 0.5TB drives. 14-disk groups (a full shelf) were pretty much guaranteed to see a second error in the shelf during reconstruction. That is also why RAID6, or dual-parity RAID, became a must-have enterprise feature back in 2005 or thereabouts.
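A rough model of the rebuild risk, assuming a single-parity rebuild must read every surviving disk once and that bit errors are independent at a constant rate. `p_rebuild_hits_ure` is a hypothetical helper. The model ignores other failure modes (a second whole-disk failure, latent errors that accumulate between reads), so it may understate what was seen in the field, and the result depends heavily on the rate you plug in:

```python
import math


def p_rebuild_hits_ure(surviving_disks: int, disk_bytes: float, ber: float,
                       bits_per_byte: float = 8.0) -> float:
    """Probability of at least one unrecoverable read error while a
    single-parity rebuild reads every surviving disk once.

    Model: independent bit errors at a constant per-bit rate `ber`.
    """
    bits = surviving_disks * disk_bytes * bits_per_byte
    # 1 - (1 - ber)**bits, evaluated with log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ber))


HALF_TB = 0.5e12
for disks in (7, 14):
    for ber in (1e-15, 5e-15):
        # a rebuild reads the (disks - 1) surviving members of the group
        p = p_rebuild_hits_ure(disks - 1, HALF_TB, ber)
        print(f"{disks}-disk group, BER {ber:g}: {p:.1%}")
```

Doubling the group size roughly doubles the bits read during a rebuild, which is why larger single-parity groups were the first to get into trouble.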

On an interesting side note, because unrecoverable read errors are evenly distributed across a drive, 3X replication can still recover even with intermittent read failures: it is unlikely that the same block fails on all three copies. There isn't really a RAID number for that, but it works reasonably well and avoids a pesky parity calculation if you embed check data in your blocks, as GFS does.
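A minimal sketch of replicated reads with embedded check data. Using a CRC32 per block is an assumption made in the spirit of the GFS approach mentioned above, not a description of GFS itself, and `store_block`, `read_replicated` and `p_block_lost` are hypothetical names. The last function shows why independent errors rarely line up: the chance of losing a block is the per-copy failure chance raised to the number of copies.

```python
import math
import zlib
from typing import Optional, Sequence


def store_block(data: bytes) -> bytes:
    """Append 4 bytes of check data (a CRC32) to the block before writing it."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def read_replicated(replicas: Sequence[Optional[bytes]]) -> bytes:
    """Return the payload of the first replica whose check data verifies.

    Each entry is the stored bytes of one copy, or None if reading that copy
    failed outright (for example, an unrecoverable read error).
    """
    for stored in replicas:
        if stored is None or len(stored) < 4:
            continue
        data, crc = stored[:-4], int.from_bytes(stored[-4:], "big")
        if zlib.crc32(data) == crc:
            return data
    raise OSError("no replica of this block could be read and verified")


def p_block_lost(ber: float, block_bytes: int, copies: int = 3) -> float:
    """Chance that the same block is unreadable on every copy, assuming
    independent per-bit errors on each copy."""
    q = -math.expm1(block_bytes * 8 * math.log1p(-ber))  # one copy fails
    return q ** copies


good = store_block(b"payload")
corrupt = bytes([good[0] ^ 0xFF]) + good[1:]     # first byte damaged
print(read_replicated([None, corrupt, good]))     # b'payload'
```

No parity is computed: the checksum only decides which copy to trust, and the surviving copy supplies the data.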

[1] https://www.usenix.org/legacy/publications/library/proceedin... -- Peter Corbett's paper (he is the guy who invented NetApp's dual-parity system). The following is from that paper:

"Disks protect against media errors by relocating bad blocks, and by undergoing elaborate retry sequences to try to extract data from a sector that is difficult to read [10]. Despite these precautions, the typical media error rate in disks is specified by the manufacturers as one bit error per 10^14 to 10^15 bits read, which corresponds approximately to one uncorrectable error per 10TBytes to 100TBytes transferred. The actual rate depends on the disk construction. There is both a static and a dynamic aspect to this rate. It represents the rate at which unreadable sectors might be encountered during normal read activity. Sectors degrade over time, from a writable and readable state to an unreadable state."

And experience from the field put it at about one error per 15TB transferred, so reading a full 3TB drive is 3TB into 15TB, or about one in five.
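The arithmetic behind these figures, as a sketch assuming decimal terabytes, 8 bits per byte, and errors arriving as a Poisson process (both helper names are hypothetical). Dividing 10^14 and 10^15 bits by 8 gives 12.5TB and 125TB, which the paper rounds to 10TB and 100TB. "One in five" is an expected count of 0.2 errors per full read; the probability of seeing at least one is slightly lower, 1 - e^-0.2, or about 18%:

```python
import math


def tb_per_error(bits_per_error: float, bits_per_byte: float = 8.0) -> float:
    """Decimal terabytes transferred per expected unrecoverable error."""
    return bits_per_error / bits_per_byte / 1e12


def p_at_least_one(read_tb: float, tb_between_errors: float) -> float:
    """P(>= 1 error) when reading read_tb, treating errors as a Poisson
    process with mean read_tb / tb_between_errors."""
    return -math.expm1(-read_tb / tb_between_errors)


print(tb_per_error(1e14), tb_per_error(1e15))  # the paper's "10TB to 100TB", unrounded
print(f"{p_at_least_one(3, 15):.3f}")          # one full 3TB read, 1 error per 15TB
```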


ryao 4280 days ago

3TB is 3 * 10^12 bytes, assuming the decimal bytes used in the storage industry. The uncorrectable bit error rate is for the raw block storage; it does not include the low-level formatting, which is no more than 20% of the storage on 512-byte-sector drives and less than 10% on Advanced Format drives. The probability of an uncorrectable bit error when copying 3TB (using decimal bytes) is approximately 1.5%, under the assumption of a 5 in 10^15 uncorrectable bit error rate:

$$1 - \left(1 - 5 \times 10^{-15}\right)^{3 \times 10^{12}} \approx 0.01488\ldots$$
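Evaluating the formula numerically, as a sketch with a hypothetical helper `p_ure`. Because the rate is per bit, the number of trials should be a bit count, while 3 * 10^12 is the byte count of 3TB; the sketch therefore also shows 8 bits per byte and an assumed ~10 encoded bits per byte, giving roughly 1.5%, 11% and 14% respectively:

```python
import math


def p_ure(trials: float, ber: float) -> float:
    """1 - (1 - ber)**trials, computed with log1p/expm1.

    Forming 1 - ber directly loses precision when ber is ~1e-15, because
    doubles just below 1.0 are spaced about 1.1e-16 apart.
    """
    return -math.expm1(trials * math.log1p(-ber))


BER = 5e-15
THREE_TB_BYTES = 3e12
print(f"{p_ure(THREE_TB_BYTES, BER):.5f}")       # trials = 3*10^12, as in the formula
print(f"{p_ure(THREE_TB_BYTES * 8, BER):.5f}")   # trials = bits of user data in 3TB
print(f"{p_ure(THREE_TB_BYTES * 10, BER):.5f}")  # assumed ~10 encoded bits per byte
```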

If your 20% figure is accurate, the actual uncorrectable bit error rate would need to be something like 7 in 10^14. I am not disputing your empirical information, but your numbers do not agree with it. The gap between what your numbers say and what you observe is only about one order of magnitude; doing statistical calculations with better records could identify its cause.
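Inverting the same formula to find the rate that a 20% chance per full read would imply, as a sketch (`ber_for_probability` is a hypothetical name). With 3 * 10^12 trials it gives about 7.4 in 10^14, matching the estimate above; counting the 2.4 * 10^13 user-data bits in 3TB instead gives about 9.3 in 10^15:

```python
import math


def ber_for_probability(p: float, trials: float) -> float:
    """Per-bit rate giving probability p of >= 1 error over `trials` bits.

    Inverts p = 1 - (1 - ber)**trials  =>  ber = 1 - (1 - p)**(1 / trials).
    """
    return -math.expm1(math.log1p(-p) / trials)


print(f"{ber_for_probability(0.2, 3e12):.2e}")    # 3*10^12 trials
print(f"{ber_for_probability(0.2, 2.4e13):.2e}")  # bits of user data in 3TB
```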


ChuckMcM 4280 days ago

And to be clear, it is a bit error rate, not a byte error rate. Nominal coding of data on magnetic media is 10 bits per 8-bit byte, although a specific drive may use a different encoding on the platter. The Barracuda included 5120 NRZ-encoded bits per sector and a 48-bit NRZ-encoded checkword, giving it a nominal 10.094 bits per byte. You're off by one decimal order of magnitude in the number of bits.
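Where the 10.094 comes from, as a sketch that assumes a 512-byte sector (5120 bits at 10 bits per byte is exactly 512 bytes); the bit counts are the ones quoted in the comment:

```python
SECTOR_BYTES = 512     # assumed: 5120 bits at 10 bits per byte is one 512-byte sector
DATA_BITS = 5120       # NRZ-encoded data bits per sector (from the comment)
CHECKWORD_BITS = 48    # NRZ-encoded checkword bits per sector (from the comment)

bits_per_byte = (DATA_BITS + CHECKWORD_BITS) / SECTOR_BYTES
print(bits_per_byte)                  # 10.09375, i.e. the nominal 10.094
print(f"{3e12 * bits_per_byte:.3e}")  # encoded bits in 3TB of user data
```

At this encoding, 3TB of user data is about 3.03 * 10^13 bits on the platter, an order of magnitude more than 3 * 10^12.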


ryao 4280 days ago

Just to be clear, I meant 3 * 10^12, not 310^12. The arithmetic that I posted uses the correct number.


lutusp 4280 days ago

To keep asterisks from being eaten by the formatting, either escape them with backslashes in paragraphs, surround them with spaces, or indent short lines that contain "special" characters by four spaces.


feld 4277 days ago

So every time you do a ZFS scrub on a large pool (many TB), you should see errors that are detected and corrected.
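The expected count per scrub under the rates discussed in the thread, as a sketch assuming independent bit errors and 8 bits per byte. A scrub reads allocated blocks (including their redundant copies), not raw pool capacity, so `tb_read` should be that amount; the 20 TB figure and the helper name are hypothetical:

```python
def expected_scrub_errors(tb_read: float, ber: float, bits_per_byte: float = 8.0) -> float:
    """Expected unrecoverable bit errors when a scrub reads tb_read decimal TB once."""
    return tb_read * 1e12 * bits_per_byte * ber


for ber in (1e-15, 5e-15):
    # 20 TB is a hypothetical amount of data read by one scrub
    print(f"BER {ber:g}: {expected_scrub_errors(20, ber):.2f} expected per scrub")
```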

But you don't...
