Hacker News
by martinced 4913 days ago
I'm not a Unix sysadmin at all and don't know much about hard drives: I'm just a software dev.

But from the beginning of TFA, after reading this:

"Bad blocks. Two of them. However, as the blocks aren't anywhere near the active volumes they go largely undetected."

The FIRST thing that came to my mind was: "What!? Isn't that a long-solved problem!? Aren't disks / controllers / RAID setups much better now at detecting such problems right away?"

I've got a huge issue with the "largely undetected" part. I may, at some point, need storage for a gig I'm working on, and I certainly don't want problems like that to go "largely undetected".

So quickly skipping most of the article and going to the comments:

"It's worth pointing out that many hardware RAID controllers support a periodic "scrubbing" operation ("Patrol Read" on Dell PERC controllers, "background scrub" on HP Smart Array controllers), and some software RAID implementations can do something similar (the "check" functionality in Linux md-style software RAID, for example). Running these kinds of tasks periodically will help maintain the "health" of your RAID arrays by forcing disks to perform block-level relocations for media defects and to accurately report uncorrectable errors up to the RAID controller or software RAID in a timely fashion."

To which the author of TFA himself replies:

"Yes, that is something I should have made clearer. This is the very reason that RAID systems have background processes that scan all the blocks."

Which leaves me a bit confused about TFA, despite all the shiny graphs.

Basically, I don't really understand the premise of "bad blocks going largely undetected" in 2013...
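For the Linux md case mentioned in the quoted comment, the kernel exposes the "check" operation through sysfs: writing `check` to `/sys/block/<array>/md/sync_action` starts a read-only pass over every stripe, and `/sys/block/<array>/md/mismatch_cnt` reports how many sectors were found inconsistent. A minimal Python sketch, assuming an array named `md0` and root privileges; the helper names are hypothetical, and the `sysfs_root` parameter only exists so the functions can be tried against a fake directory tree.

```python
from pathlib import Path


def md_attr_path(array: str, attr: str, sysfs_root: str = "/sys") -> Path:
    """Path of an md attribute file, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / "block" / array / "md" / attr


def start_md_check(array: str = "md0", sysfs_root: str = "/sys") -> None:
    """Start a read-only scrub of the whole array (like `echo check > .../sync_action`).

    The kernel reads every stripe and counts inconsistencies instead of
    rewriting them. Writing to the real sysfs file requires root.
    """
    md_attr_path(array, "sync_action", sysfs_root).write_text("check\n")


def mismatch_count(array: str = "md0", sysfs_root: str = "/sys") -> int:
    """Number of sectors the last check found inconsistent (md/mismatch_cnt)."""
    return int(md_attr_path(array, "mismatch_cnt", sysfs_root).read_text().strip())
```

While the scrub runs, reading `sync_action` back shows `check`; it returns to `idle` when done, and a non-zero `mismatch_cnt` afterwards is the signal to investigate.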


I dealt with this exact problem for a number of years. Background scrubbing takes away I/O resources and can be a disaster for your workload if you rely on sequential reads/writes. For that reason, most controllers are configured by default to only scrub when the disk is totally idle, which is never. Even if the controller had a better definition of idle, scrubbing an entire disk to find those rotten bits would take a long, long time; a disk would almost certainly fail before that.
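To put "a long, long time" into numbers: a scrub has to read every sector, so its duration is capacity divided by the read rate it is allowed, stretched further if it only runs part of the time. A back-of-the-envelope Python sketch; the 3 TB size, the read rates and the 25% duty cycle are illustrative assumptions, not measurements.

```python
def scrub_hours(capacity_tb: float, throughput_mb_s: float, duty_cycle: float = 1.0) -> float:
    """Wall-clock hours to read a whole disk once.

    capacity_tb: disk size in decimal terabytes (1 TB = 1,000,000 MB)
    throughput_mb_s: sustained read rate the scrub is allowed, in MB/s
    duty_cycle: fraction of wall-clock time the scrub actually runs
                (1.0 = continuously, 0.25 = a quarter of the time)
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    busy_seconds = capacity_tb * 1_000_000 / throughput_mb_s
    return busy_seconds / duty_cycle / 3600


for label, rate, duty in [
    ("flat out", 150, 1.0),
    ("throttled", 10, 1.0),
    ("throttled, idle-only", 10, 0.25),
]:
    hours = scrub_hours(3, rate, duty)
    print(f"3 TB {label}: {hours:.1f} h ({hours / 24:.1f} days)")
```

With these made-up rates, a full-speed pass takes a few hours, but a throttled scrub that only runs a quarter of the time needs roughly two weeks for a single disk.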

I use the built-in SMART full-disk check. It's quite good at only reading when the disk is idle, and it checks the entire disk.

A quick self-test every day for all disks, and a long (i.e. full read) self-test once a week.

The RAID is then checked on top of that once a month (although that slows things down a bit).
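The schedule above boils down to a date rule. A small Python sketch of that rule; which weekday gets the long test and which day of the month gets the RAID check are assumptions, since the comment doesn't say. The tests themselves would be started with smartmontools, e.g. `smartctl -t short /dev/sda` or `smartctl -t long /dev/sda`.

```python
import datetime as dt


def checks_due(day: dt.date) -> list[str]:
    """Checks to run on `day` under the schedule described above.

    Assumed (hypothetical) placement: short SMART self-test every day,
    long self-test on Sundays, RAID check on the 1st of the month.
    """
    due = ["smart-short"]
    if day.weekday() == 6:  # weekday(): Monday=0 ... Sunday=6
        due.append("smart-long")
    if day.day == 1:
        due.append("raid-check")
    return due
```

smartd can also schedule self-tests on its own through the `-s` directive in `smartd.conf`, which avoids a separate cron job for the SMART part.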

With sufficient redundancy available, could you temporarily take a drive out of the RAID for scrubbing, and then add it back in when you're done, to avoid conflicting with ongoing work and destroying linear access patterns?

The rebuild would be worse than the scrubbing.

A better plan is to light up your disaster recovery plan weekly, and while the DR system is handling the load, scrub to your heart's content on the down system.

Depending on the cost of your hardware vs. the cost of your labor vs. the cost of downtime, running dual servers, one flagged as production and one as development, and swapping the flags every weekend might work out. You'll hear lots of bragging about that not being possible because the hardware is too expensive, but not so much bragging about labor cost and downtime cost. I worked at a financial services corp about two decades ago where downtime supposedly cost in excess of $1M/hr. They had triple mainframes set up, basically three machine rooms inside the machine room.

The disks themselves (new ones at least) have a background scan process (called BMS or BGMS) which helps considerably. The one thing the disk can't do by itself is correct unrecoverable errors, since by definition the disk can't recover from them :-)

The combination of BMS and RAID-level disk scrubbing should handle almost all of the issues pointed out in the original post.

RAID scrubs can and do take a long time to complete, though: depending on the performance impact you are willing to suffer on a continuous basis, a proper scrub can take a week or two.

Proper scrubbing includes not just reading the RAID chunk on one disk but also reading the associated chunks from the other disks and verifying that the parity is still intact. In RAID5 you will not be able to recover if the parity doesn't match, because you won't know which chunk has gone bad.
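The parity verification described above, and why single parity can detect but not locate an error, fits in a few lines of Python. This is a toy model assuming plain XOR parity over equal-sized chunks and ignoring RAID5's rotation of parity across disks; `single_fault_repairs` is a hypothetical helper written for this illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need a non-empty list of equal-sized blocks")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_consistent(data: list[bytes], parity: bytes) -> bool:
    """Scrub check: does the XOR of the data chunks still equal the parity chunk?"""
    return xor_blocks(data) == parity


def single_fault_repairs(data: list[bytes], parity: bytes) -> list[list[bytes]]:
    """Rebuild each member in turn, assuming it is the one bad chunk.

    Every result is a self-consistent stripe, so parity alone cannot
    tell which member actually went bad.
    """
    members = data + [parity]
    repairs = []
    for i in range(len(members)):
        rebuilt = xor_blocks(members[:i] + members[i + 1:])
        repairs.append(members[:i] + [rebuilt] + members[i + 1:])
    return repairs


data = [b"\x01\x02", b"\x10\x20", b"\x0f\x00"]
parity = xor_blocks(data)
data[1] = b"\x11\x20"  # silent corruption of one data chunk
print(stripe_consistent(data, parity))  # the scrub notices the mismatch
print(all(stripe_consistent(r[:-1], r[-1]) for r in single_fault_repairs(data, parity)))
```

The second line prints `True` for every candidate: rewriting any one of the four chunks makes the stripe consistent again, which is exactly the ambiguity the comment describes. Double parity (RAID6) or per-block checksums are what break that tie.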

I've been coding such systems for a while now, and as a shameless plug I'd point to http://disksurvey.com/blog/. If there are topics of interest, I'd be happy to take requests and write about them as well.

I have a home server with three disks and ZFS, for my photos and things, so I'm not an expert. However, Ubuntu's md-raid includes scrubbing once a week by default, and I added scrubbing to my ZFS setup via crontab, again once a week (I'm not sure if ZFS does it automatically, but I don't think it does. I would appreciate a correction, if someone knows for sure).
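Scrubbing from cron needs nothing more than `zpool scrub <pool>`, which starts the scrub and returns while it runs in the background. A Python sketch that scrubs every imported pool, assuming the `zpool` CLI is on the PATH and the job runs as root; the function names are hypothetical.

```python
import shutil
import subprocess


def scrub_commands(pools: list[str]) -> list[list[str]]:
    """One `zpool scrub <pool>` command line per pool."""
    return [["zpool", "scrub", pool] for pool in pools]


def list_pools() -> list[str]:
    """Names of imported pools, from `zpool list -H -o name` (scripted mode, no header)."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name"],
        check=True, capture_output=True, text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def scrub_all_pools() -> dict[str, int]:
    """Start a scrub on every pool and return each `zpool scrub` exit status.

    A non-zero status can simply mean a scrub is already in progress.
    """
    if shutil.which("zpool") is None:
        raise RuntimeError("zpool not found in PATH")
    statuses = {}
    for argv in scrub_commands(list_pools()):
        statuses[argv[-1]] = subprocess.run(argv).returncode
    return statuses
```

A weekly crontab schedule such as `0 3 * * 0` (Sundays at 03:00) calling `scrub_all_pools()` matches the once-a-week setup described; the time and the wrapper script are up to you.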

The article assumes no scrubbing, and running without scrubbing is stupid, for exactly the reasons the article details. So it's basically "Why is pointing a gun at your foot and pulling the trigger bad?" "Because you're going to shoot yourself in the foot."

The article describes why scrubs don't happen often enough: it's slow and disruptive. I have a 3-way RAID-1 /home partition (long story) and it's checked on the first Sunday of the month. I always remember this because I can tell from the performance of my workstation that something is up with the disk. This is with operations like a single thread running "ls". If you're running a production service, you're also going to notice, and you're also going to have more than 3TB of drive to scan. That makes running regular scans rather difficult.
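"First Sunday of the month" is slightly awkward to express in cron, because when both the day-of-month and day-of-week fields are restricted, cron runs the job if either one matches. The usual workaround, which Debian's mdadm package uses for its monthly `checkarray` run as far as I know, is to fire every Sunday and bail out unless the day of the month is 7 or less. The same test in Python (helper names are hypothetical):

```python
import datetime as dt


def is_first_sunday(day: dt.date) -> bool:
    """A Sunday falling on days 1-7 is necessarily the month's first Sunday."""
    return day.weekday() == 6 and day.day <= 7  # weekday(): Monday=0 ... Sunday=6


def first_sunday(year: int, month: int) -> dt.date:
    """Date of the first Sunday in the given month."""
    first = dt.date(year, month, 1)
    return first + dt.timedelta(days=(6 - first.weekday()) % 7)
```

To make the monthly md check less noticeable on a workstation, its bandwidth can be capped through `/proc/sys/dev/raid/speed_limit_max` (in KB/s), at the cost of an even longer scrub.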

You may add something like this to /etc/periodic.conf:

  daily_status_zfs_enable="YES"
  daily_scrub_zfs_enable="YES"
  daily_scrub_zfs_default_threshold="6"  # in days

and it will scrub the pools every 6 days (and send you a report in the daily run output).
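The threshold means the daily job only scrubs a pool when its last scrub is old enough, so running the check every day costs nothing on the other days. A Python sketch of that decision, assuming the last scrub time has already been obtained (for example from the scan line of `zpool status`). It illustrates the idea and is not a port of FreeBSD's periodic script; treating a never-scrubbed pool as due, and scrubbing once exactly `threshold_days` have passed, are assumptions.

```python
import datetime as dt
from typing import Optional


def scrub_due(last_scrub: Optional[dt.datetime], now: dt.datetime,
              threshold_days: int = 6) -> bool:
    """True if the pool should be scrubbed on this daily run."""
    if last_scrub is None:  # never scrubbed: do it now
        return True
    return now - last_scrub >= dt.timedelta(days=threshold_days)
```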

Very nice, thank you, I will try that. I am rather dismayed, however, by learning in this thread that my disks have 4K sector sizes and ZFS autodetected 512 bytes, which means I'll have to destroy the pool and recreate it...
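ZFS records the sector size it assumes as `ashift`, the base-2 logarithm of that size, and fixes it per vdev when the vdev is created; that is why a wrong guess means recreating the pool. A small Python helper showing the arithmetic (512 bytes is ashift 9, 4096 bytes is ashift 12); the function names are hypothetical. ZFS implementations that accept an explicit `-o ashift=12` at `zpool create` time let you override the autodetection.

```python
def ashift_for(sector_bytes: int) -> int:
    """ZFS ashift for a sector size: log2(sector_bytes), e.g. 4096 -> 12."""
    if sector_bytes < 512 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two, at least 512")
    return sector_bytes.bit_length() - 1


def ashift_too_small(pool_ashift: int, physical_bytes: int) -> bool:
    """True when the pool's minimum block is smaller than the physical sector,
    so small writes force the drive into read-modify-write cycles."""
    return (1 << pool_ashift) < physical_bytes
```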

It happens all the time...

If you run "camcontrol identify ada0" (or whatever your device is) you can find out before it is too late:

  sector size logical 512, physical 512, offset 0

This is from a lucky drive of course :)
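To check this from a script rather than by eye, the "sector size" line of `camcontrol identify` can be parsed. A Python sketch, assuming the line looks like the one quoted above (the regex tolerates extra spaces between fields); the 4096 sample string below is a hypothetical 4K drive, not captured output. Keep in mind that some early 4K drives reportedly advertised 512-byte physical sectors, so a 512 reading is not absolute proof.

```python
import re

_SECTOR_LINE = re.compile(
    r"sector size\s+logical\s+(\d+),\s*physical\s+(\d+),\s*offset\s+(\d+)"
)


def parse_sector_sizes(identify_output: str) -> dict[str, int]:
    """Extract logical/physical sector sizes and offset from `camcontrol identify` text."""
    match = _SECTOR_LINE.search(identify_output)
    if match is None:
        raise ValueError("no 'sector size' line found")
    logical, physical, offset = (int(g) for g in match.groups())
    return {"logical": logical, "physical": physical, "offset": offset}


def has_4k_sectors(sizes: dict[str, int]) -> bool:
    """True for drives reporting 4K (or larger) physical sectors."""
    return sizes["physical"] >= 4096


lucky = parse_sector_sizes("sector size logical 512, physical 512, offset 0")
unlucky = parse_sector_sizes("sector size           logical 512, physical 4096, offset 0")
print(lucky, has_4k_sectors(lucky))
print(unlucky, has_4k_sectors(unlucky))
```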

Hmm, there's no such command in Ubuntu; maybe it's from BSD?

camcontrol is from FreeBSD.

I don't have a Linux box available right now, but maybe "hdparm -I" does something similar: "request identification info directly from the drive".
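On Linux, the kernel already exposes what the drive reports in `/sys/block/<dev>/queue/logical_block_size` and `/sys/block/<dev>/queue/physical_block_size`, the same values shown by `lsblk -o NAME,LOG-SEC,PHY-SEC`; `hdparm -I` should list them as "Logical Sector size" and "Physical Sector size". A Python sketch reading the sysfs files, assuming a whole-disk name such as `sda`; the function name is hypothetical and the `sysfs_root` parameter is only there so it can be tried against a fake directory.

```python
from pathlib import Path


def block_sizes(device: str, sysfs_root: str = "/sys") -> dict[str, int]:
    """Logical and physical sector sizes the kernel reports for a disk like 'sda'."""
    queue = Path(sysfs_root) / "block" / device / "queue"
    return {
        kind: int((queue / f"{kind}_block_size").read_text().strip())
        for kind in ("logical", "physical")
    }
```

On a real machine, `block_sizes("sda")` returning a physical size of 4096 is the case where the pool should be created with ashift 12.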

ZFS doesn't do it automatically; you have to crontab it. I have it crontabbed for the 1st or the 15th.

On 3ware (LSI) controllers you can schedule a 'verify' task to run on a schedule. I do believe it is on by default, but I could be wrong. It is good to tell admins about this, since an uninformed one may turn the setting off without realizing what can happen if it is disabled.