Hacker News
by toast0 205 days ago
> I guess the difference being that people expect the HDD to fail suddenly whereas with a solid state device most people seem to be convinced that the failure will be graceful.

This is exactly the opposite of my lived experience. Spinners fail more often than SSDs, but I don't remember any sudden failures with spinners. As far as I can recall, they all had pre-failure indicators: terrible noises (which doesn't help for remote disks), SMART indicators, failed reads/writes on a couple of sectors here and there, etc. If you don't have backups but notice in a reasonable amount of time, you can salvage most of your data. Certainly, sometimes the drives just won't spin up because of a bearing/motor issue, but sometimes you can rotate the drive manually to get it started and capture some data.

The vast majority of my SSD failures have been disappear from the bus; lots of people say they should fail read only, but I've not seen it. If you don't have backups, your data is all gone.

Perhaps I missed the pre-failure indicators from SMART, but it's easier when drives fail but remain available for inspection: look at a healthy drive, look at a failed drive, see what's different, look at all your drives, predict which one fails next. For drives that disappear, you've got to read and collect the stats regularly and then go back and see if there was anything... I couldn't find anything particularly predictive. I feel "disappear from the bus" is more in the firmware error category than a physical storage problem, so there may not be real indications, unless it's a power-on-time-based failure...
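
The "collect the stats regularly, then go back and look" approach could be sketched roughly like this in Python. Everything here is an assumption for illustration: the JSON-lines log format, the function names, and the idea that the counters have already been read into a plain dict by some other tool.

```python
import json
import time


def record_snapshot(log_path, drive_id, attrs, now=None):
    """Append one timestamped snapshot of a drive's counters to a JSON-lines log.

    `attrs` is a hypothetical dict of counter name -> value, gathered elsewhere.
    """
    entry = {
        "ts": time.time() if now is None else now,
        "drive": drive_id,
        "attrs": attrs,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def history(log_path, drive_id):
    """Return all logged snapshots for one drive, oldest first."""
    entries = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry["drive"] == drive_id:
                entries.append(entry)
    return sorted(entries, key=lambda e: e["ts"])


def changes_before_loss(log_path, drive_id):
    """Counters whose value differs between the first and last snapshot.

    Meant to be run after a drive has dropped off the bus, using only
    what was logged while it was still reachable.
    """
    snaps = history(log_path, drive_id)
    if len(snaps) < 2:
        return {}
    first, last = snaps[0]["attrs"], snaps[-1]["attrs"]
    return {k: (first.get(k), v) for k, v in last.items() if first.get(k) != v}
```

Run `record_snapshot` from a scheduled job for every drive; once one vanishes, `changes_before_loss` shows which counters moved over its logged lifetime. As noted above, that may well turn up nothing predictive.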

3 comments

For what it is worth the SMART diagnostics and health indicators have rarely been useful for me, either on SSDs or HDDs. I don't think I've ever had a SMART health warning before a drive dies. Although I did have one drive that gave a "This drive is on DEATH'S DOOR! Replace it IMMEDIATELY!" error for 3 years before I finally got around to replacing it, mostly to avoid having my OS freak out every time it booted up.

Oh, the overall SMART status is mostly useless. But some of the individual fields are helpful.

The ones for reallocated sectors, pending sectors, etc. are the useful ones. When those add up to N, it's time to replace, and you can calibrate N based on your monitoring cycle and backup needs. For a look-every-once-in-a-while, single-copy use case, I'd replace at around 10 sectors; for daily monitoring with multiple copies, I'd replace towards 100 sectors. You probably won't get warranty coverage at those numbers, though.
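
A minimal sketch of that threshold rule in Python, assuming the attribute table comes from `smartctl -A` on an ATA drive. The attribute names below are the ones smartmontools commonly prints, but they vary by vendor (and NVMe drives report different fields). The function names and sample text are made up for illustration.

```python
# Assumption: these two attribute names match what the drive reports.
# Add or swap names to cover the "etc." for a particular model.
SECTOR_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}


def bad_sector_total(smartctl_output: str) -> int:
    """Sum the raw values of the sector attributes in `smartctl -A` text."""
    total = 0
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID#, name, flag, value,
        # worst, thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in SECTOR_ATTRS:
            raw = fields[9]
            if raw.isdigit():
                total += int(raw)
    return total


def should_replace(smartctl_output: str, threshold: int) -> bool:
    """True once the combined sector count reaches the chosen threshold N."""
    return bad_sector_total(smartctl_output) >= threshold


# Illustrative sample only, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       52341
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(bad_sector_total(SAMPLE))     # 8 reallocated + 3 pending
    print(should_replace(SAMPLE, 10))   # occasional-look, single-copy threshold
    print(should_replace(SAMPLE, 100))  # daily-monitoring, multi-copy threshold
```

With the sample above, the drive would already be due for replacement under the 10-sector rule but not under the 100-sector one.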

Mostly I've only seen the SMART status warning fire for too many power-on hours, which isn't very useful. Power-on hours isn't a good indicator of impending doom (unless there's a firmware error at specific values, which can happen for SSDs or spinners).

> Spinners fail more often than SSDs, but I don't remember any sudden failures with spinners

I've had a fair number of HDDs throughout the years. My first one, well my dad's, was a massive 20 MB. I've had a 6+ disk ZFS pool going 24/7 since 2007. The oldest disks had over 7 years of on-time according to SMART data; I replaced them due to capacity.

Out of all that I've only had one HDD go poof gone. The infamous IBM Deathstar[1].

I've had some develop a few bad blocks and that's it, and one that just got worse and worse. But only one died a sudden death.

Meanwhile I've had multiple SSDs which just stopped working suddenly. Articles write about them going into read-only mode but the ones I've had that went bad just stopped working.

[1]: https://en.wikipedia.org/wiki/Deskstar#IBM_Deskstar_75GXP_fa...

My experience has been the same. Hard drives fail more gracefully than SSDs.

> The vast majority of my SSD failures have been disappear from the bus; lots of people say they should fail read only, but I've not seen it. If you don't have backups, your data is all gone.

I just recovered data a couple weeks ago from my boss's SATA SSD that gave out and went read-only.