by cs702 5106 days ago
Very useful -- I will take this analysis into account when it's time to upgrade my current personal machine or configure the next one! Thank you for posting this here.

The only thing I would have wanted to see but didn't in this analysis is how failure rates vary for different types of disk subsystem -- specifically, traditional hard drives versus the newer solid-state devices. I suspect, but don't know for sure, that the latter have much, much lower real-world failure rates in the first 30 days of total accumulated CPU time (TACT).

The authors openly suggest that the sharp difference in failure rates between desktop and laptop machines may be due in part to their disk subsystems: "Laptops are between 25% and 60% less likely than desktop machines to crash from a hardware fault over the first 30 days of observed TACT. We hypothesize that the durability features built into laptops (such as motion-robust hard drives) make these machines more robust to failures in general." Alas, the authors don't delve any further into it.

I'd like to see hard data comparing the real-world failure rates of both desktops and laptops using traditional versus solid-state disk subsystems.


The numbers I've seen so far put SSD failure rates in the same ballpark as spinning rust, but most SSDs have only been reliable for slightly over a year. Many, many old models were absolutely terrible. Hence I think it may still be too early to draw reliable conclusions.
wazoox: thanks. Do you recall the source(s) for those numbers?
So far, the best source I can recall is this study: http://www.tomshardware.com/reviews/ssd-reliability-failure-...
That study is Intel-only, but it seems to jibe with the return rates Anandtech mentioned: http://www.anandtech.com/show/4202/the-intel-ssd-510-review/...

There are also numbers for hard drives: http://forums.anandtech.com/showthread.php?t=2147063

Basically, Intel SSDs from a year ago are more reliable than all hard drives. And SSDs in general are more reliable than any 2TB hard drive.

The data isn't ideal, but it's better than anecdotes. Return rates should correlate pretty well with failure rates. If anything, return rates should favor hard drives, since people are less likely to return a faulty cheap hard drive than a faulty expensive SSD.

AngryParsley: the Tom's Hardware article wazoox mentioned above also has those return-rate stats (page 3): "...returns can occur for a multitude of reasons. This presents a challenge because we don’t have any additional information on the returned drives—were they dead-on-arrival, did they stop working over time, or was there simply an incompatibility that prevented the customer from using the [device]? ... If online purchases account for the majority of hard drives sold, poor packaging and carrier mishandling can have a real effect on return rates. Furthermore, we also have no way of normalizing how customers used these drives. The large variance in hard drive return rates [between data sets] underlines this problem. For example, the Seagate Barracuda LP rises from 2.1% to 4.1%, while the Western Digital Caviar Green WD10EARS drops from 2.4% to 1.2%..."[1]

In short, the available return-rate data is too noisy and inconsistent to be a good proxy for failure rates.

[1] http://www.tomshardware.com/reviews/ssd-reliability-failure-...
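To put a number on how much those two quoted return rates move between data sets, here is a minimal Python sketch. It uses only the four percentages quoted above; the helper name `relative_change` is made up for illustration, and nothing here says anything about actual failure rates.

```python
# Return rates (in percent) quoted from the Tom's Hardware article above:
# (first data set, second data set). No other figures are assumed.
rates = {
    "Seagate Barracuda LP": (2.1, 4.1),
    "Western Digital Caviar Green WD10EARS": (2.4, 1.2),
}


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


# Relative change in return rate for each drive between the two data sets.
changes = {name: relative_change(b, a) for name, (b, a) in rates.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.0%}")
```

One model's return rate nearly doubles while the other's halves, which is the kind of swing that makes these numbers hard to read as failure rates.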

wazoox: thanks for that. Just read it.

My take: no one has enough consistent, across-the-board data at the moment to reach a conclusion on the matter, but the anecdotal evidence presented in that article suggests that Intel SSDs probably have lower failure rates than most traditional and solid-state alternatives. I will keep that in mind next time I buy an SSD.

Since the database they used is from "a period of 8 months in 2008" (Section 4, "Measuring hardware failure rates"), I doubt they had any significant number of solid-state disks in their data.
cs702, this is the article I looked at before buying my SSD recently: http://www.tomshardware.com/reviews/ssd-reliability-failure-...

It's talking specifically about disk failures, though, not comparing whole systems.

thechut: thanks. See my response here: http://news.ycombinator.com/item?id=4162362
Funny, I suspect that SSDs have much higher real-world failure rates. (My personal, limited, anecdotal evidence is that my 64 GB Crucial M4 SSD lasted about a year as the root drive in a busy Linux desktop, while I have a stack of about a dozen hard drives that have been retired due to being too small or too slow while still working fine.)

Lack of moving parts is great, but flash allows a finite number of write cycles.

How heavily were you using the desktop? Did the drive fail from running out of write cycles or from something else?

Some people over on xtremesystems have done endurance testing, and the 64 GB m4 took over 700 TB of writes before it failed, and 172 TB of writes to reduce the MWI to 0.

In a little over a year I have only written 4.1TB to my SSD in my desktop. Write cycles are very unlikely to run out for me before I replace the drive.

http://www.xtremesystems.org/forums/showthread.php?271063-SS...
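The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough estimate only: it treats "a little over a year" as exactly one year, assumes a constant write rate, and assumes the drive in question has endurance similar to the tested 64 GB m4, none of which the comment confirms.

```python
# Figures quoted in the comment above (decimal TB).
WRITTEN_TB_PER_YEAR = 4.1   # "a little over a year" treated as 1 year (assumption)
MWI_ZERO_TB = 172.0         # writes needed to bring the tested m4's MWI to 0
FAILURE_TB = 700.0          # "over 700TB" written before the tested m4 failed


def years_to_reach(total_tb: float, tb_per_year: float) -> float:
    """Years needed to write `total_tb` at a constant `tb_per_year`."""
    return total_tb / tb_per_year


years_to_mwi_zero = years_to_reach(MWI_ZERO_TB, WRITTEN_TB_PER_YEAR)
years_to_failure = years_to_reach(FAILURE_TB, WRITTEN_TB_PER_YEAR)

print(f"MWI reaches 0 after roughly {years_to_mwi_zero:.0f} years")
print(f"Write-induced failure after roughly {years_to_failure:.0f} years")
```

At that write rate, both figures come out at several decades or more, so write exhaustion would not be the limiting factor under these assumptions. It says nothing about other failure modes such as controller or firmware faults.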

I don't have actual numbers, but it's my primary desktop at home, and it saw everyday use.
dripton: I hear you, but note that I'm not interested in write-cycle limits or MTBF stats obtained in a lab setting; I'd just like to know the real-world rate of hardware failure of SSDs.
I think everyone would. Unfortunately there are financial reasons for SSD manufacturers to keep that information secret and contractual reasons why retailers cannot release it.

I think Intel released some information on the reliability of their SSDs a few years ago but that was likely because they knew they were doing best and their enterprise customers are very interested in that for their data-centre rollouts.

The very limited information I've seen suggests to me that a few years ago SSDs had a much higher failure rate than HDDs (double in the first 6 months) but that has been falling very quickly with each new generation of SSDs (and as the profit margin grows and manufacturers have to work on reputation to justify the markup).