Hacker News new | ask | show | jobs
by Ologn 4911 days ago
Yes. The decision to use RAID5 instead of RAIDs 1, 10 or 0+1 shows a decision to cut costs. With mirroring, the mirror drives would take over when a disk failed.

Also, only one hot spare is in the set. Another cost saving.

Yet another decision - RAID5 across 7 disks plus a hot spare. Instead of say across 6 disks or 5 disks. You have two more chances for a disk to go bust and will have to be rebuilt from parity.

What if the disks are OK but the server host adapter card gets fried? Or the cable between the server and array? Some disk arrays allow for redundant access to the array, and some OS's can handle that failover.

Before I read the article, I thought it might discuss heat. Excessive heat is usually the cause when disk arrays start melting down one after another. Usually the meltdown happens in an on-site server closet/room which was never properly set up for having servers running 24/7. Usually the straw which breaks the back is a combination of more equipment added, and hot summer days. Then portable ACs are purchased to temporarily mitigate things, but if their condensation water deposits are not regularly dumped, they stop helping. This situation occurs more than you would imagine, luckily I have not been the one who had to deal with this for every time I have seen it (although sometimes I have). Usually the servers are non-production ones which don't make the cut to go into the data center.

The heat problem happens in data centers as well, believe it or not. A cheap thermometer is worth buying if you sense too much heat around your servers. Usually the heat problem is less bad, but the general data center temperature is a few degrees higher than what it should be, and this leads to more equipment failure.

3 comments

Hard drives are pretty resilient to high temperatures. Google did a reliability analysis of thousands of hard drives and found:

"Overall our experiments can confirm previously reported temperature effects only for the high end of our temperature range and especially for older drives. In the lower and middle temperature ranges, higher temperatures are not associated with higher failure rates. This is a fairly surprising result, which could indicate that datacenter or server designers have more freedom than previously thought when setting operating temperatures for equipment that contains disk drives. We can conclude that at moderate temperature ranges it is likely that there are other effects which affect failure rates much more strongly than temperatures do."

http://research.google.com/archive/disk_failures.pdf

I'd love to see the same research for SSDs.

Also, parent post does talk about higher end temps, not middle range temps.

Personally I've not had much luck with hot-spares, I'd prefer to have the spare in the array (in the case of RAID-6) so I can find out if there's a problem with that drive before it's the only thing standing in the way of total failure.

I'm a fan of the temperature@lert USB sensors; $130 gets you peace of mind: http://www.temperaturealert.com/

>With mirroring, the mirror drives would take over when a disk failed.

The mirror drives that also have bad sectors? Then you go even longer without noticing you have problems.