Hacker News
by sschueller 4911 days ago
I have had a multi-disk failure occur with a RAID-1 setup. The server was pre-built by a large vendor and worked fine until both disks failed at almost the same time (within minutes of each other).

Took the disks out to find that they had sequential serial numbers.

Called vendor for replacement only to have them tell me that they had issues with that batch, yet did not make any attempt to inform me.

Spent the day restoring from tape backup.

TLDR: If you buy a pre-built server check that the disks aren't all from the same batch.
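For anyone who wants to automate that check, here is a rough sketch in Python. It assumes you have already collected the serial numbers yourself (for example by reading them off the drive labels) and that each serial ends in a run of digits, which is not true for every manufacturer. The function names and the `max_gap` threshold are made up for illustration, and a close serial number is only a hint that two drives may share a batch, not proof.

```python
import re
from itertools import combinations


def split_serial(serial):
    """Split a serial into (prefix, trailing number).

    The number is None when the serial does not end in digits.
    """
    cleaned = serial.strip()
    match = re.fullmatch(r"(.*?)(\d+)", cleaned)
    if match is None:
        return cleaned, None
    return match.group(1), int(match.group(2))


def suspicious_pairs(serials, max_gap=16):
    """Return pairs of serials that look like they came off the line together.

    Two serials are flagged when they share the same prefix and their
    numeric suffixes differ by at most max_gap (an arbitrary threshold).
    """
    flagged = []
    for a, b in combinations(serials, 2):
        prefix_a, num_a = split_serial(a)
        prefix_b, num_b = split_serial(b)
        if num_a is None or num_b is None:
            continue  # cannot compare serials without a numeric suffix
        if prefix_a == prefix_b and abs(num_a - num_b) <= max_gap:
            flagged.append((a, b))
    return flagged


if __name__ == "__main__":
    # Hypothetical serial numbers, not taken from real drives.
    drives = ["WD-ABC1001", "WD-ABC1002", "ZX-9000077"]
    for a, b in suspicious_pairs(drives):
        print(f"{a} and {b} may be from the same batch")
```

If a pair gets flagged, swapping one of the drives for one bought separately (or from a different vendor) is a cheap way to reduce the chance of both failing together.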

2 comments

This is a problem even if you don't buy pre-built. You're going to be buying similarly specced drives at similar times, and you're probably buying from vendors in roughly the same geographical area, so chances are you're getting drives from the same batch anyway.

It used to be worse: all the drives in a RAID setup had to have the exact same specifications or the thing wouldn't work, which pretty much guaranteed near-simultaneous failure of multiple drives. Even today, with somewhat more flexible software RAID setups, it's still a problem.

At a place I used to work we used to joke that a drive failure warning from a RAID controller was nothing more than a signal to get out the backup tapes and start building a new server.

I also had a RAID-5 array fail when 3 of its 4 drives failed near-simultaneously. All of the drives were from the same batch. Some months later, a friend came to me with a computer problem: her drive had failed. I took a look at it and, to my amazement, it was from the same batch! On a wild hunch, I swapped the controller board from my one remaining good drive onto my friend's drive. The drive worked fine and I was able to recover all her data! It is interesting that the failures were likely caused by the drives being from the same batch, yet that same fact is probably what allowed me to swap the controller boards so seamlessly.