by magnetic 814 days ago
My SSDs show SMART attributes, which can be used as a rough indicator of health, but really the only strategy I've found to work well for my peace of mind is to use redundancy.

Concretely, I use ZFS with a zpool with 2 SSDs in a mirror configuration. When one dies, even if it's sudden, I can just swap it out for another one and that's it.

My vulnerability window starts when the first SSD fails and closes when the mirror is rebuilt. If something bad happens to the other SSD during that time, I'm toast and I have to start restoring from backup.
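For a rough sense of how small that window is, here is a back-of-the-envelope sketch. It assumes a constant failure rate and, crucially, that the two drives fail independently. The 2% annual failure rate and the 48-hour window are made-up illustrative numbers, not measurements, and `second_failure_probability` is a hypothetical helper name.

```python
import math


def second_failure_probability(annual_failure_rate: float, window_hours: float) -> float:
    """Chance that the surviving drive fails within the window.

    Assumes a constant hazard rate (exponential model) and that the
    surviving drive's failure is independent of the first one.
    """
    hours_per_year = 24 * 365
    # Convert the annual failure rate into a per-hour hazard rate:
    # AFR = 1 - exp(-rate * hours_per_year)
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / hours_per_year
    return 1.0 - math.exp(-rate_per_hour * window_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 2% AFR, 48 hours from failure to finished resilver.
    p = second_failure_probability(0.02, 48.0)
    print(f"Chance of losing the second drive in the window: {p:.4%}")
```

With those assumed inputs the result is on the order of 0.01%. If failures are correlated (same batch, same power-on time, same workload), the real number can be much higher, so treat this as a lower bound rather than a forecast.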

1 comment

Did you stagger the power-on times? Otherwise you could get tightly correlated failures.
They are about 25 hours apart, which isn't a very large gap, I'll admit.

Thankfully, the serial numbers aren't too close to each other, so I'm hoping they aren't part of the same batch.

In my experience with enterprise SSDs (which, yeah, aren't the same, but that's what I have to offer), SSDs with sequential serial numbers and identical on-times, in the same RAID array, can have wildly different actual endurance. Some storage servers I used to admin had SSDs that outlasted two successive replacements of a neighboring drive from the same original box, and this happened at least twice.

After that, I stopped worrying about on-times for SSDs. HDD failures are still quite correlated (on the order of months), but if you're building the server you have to put the disks in it at some point.