Hacker News
by tke248 1234 days ago
I have a theory that solar flares are the cause of some hard drive failures. It would be interesting to see if a few lead-shielded cases would reduce the number of failures. I used to manage a large fleet of computers, and any time we got radio interference from solar flares we would have 3-5 hard drive failures that day.
3 comments

Any sufficiently large cluster is effectively a cosmic ray detector with terrible sensitivity.
What were the failure modes? Was it corrupted data or were the drives permanently fried?
Corrupt data that would compromise the operating systems. These were Dell computers with hard drives from multiple different brands. Dell would have us run their hardware diagnostic tool, which would put the drives in the range to receive a free replacement. We didn't have to send back the old ones under the contract we had with them. They would still work when reformatted but were less reliable after that.
So far I had always assumed that, when people talk about HDD failure rates, they were considering typical mechanical failures. I never considered that a failure could be declared due to some corrupted data, although it would be reasonable for a datacenter to do so.
Mechanical failures were more prevalent in my experience in the 90s; most of what I usually see with recent drives is controller failures. I rarely hear any head-crash clicking like in the old days.
If you didn't disable the write cache on those drives, flares could have caused bit flips in the cache memory before it was flushed to disk.
Out of what fleet size?
Around 1,000
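For a rough sense of how unusual 3 or more failures in one day would be in a fleet of about 1,000 drives, here is a minimal back-of-the-envelope sketch. It assumes failures are independent and Poisson-distributed, and the 2% annualized failure rate is purely an illustrative assumption, not a figure from this thread. The names `poisson_tail`, `FLEET_SIZE`, and `ASSUMED_AFR` are made up for the example.

```python
import math


def poisson_tail(k_min: int, mean: float) -> float:
    """Return P(X >= k_min) for X ~ Poisson(mean)."""
    # Sum the probabilities of 0 .. k_min-1 events, then take the complement.
    below = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_min)
    )
    return 1.0 - below


FLEET_SIZE = 1000    # fleet size mentioned in the thread
ASSUMED_AFR = 0.02   # ASSUMPTION: 2% annualized failure rate, not from the thread

# Expected number of drive failures on any single day under the assumption.
daily_mean = FLEET_SIZE * ASSUMED_AFR / 365

# Chance of seeing 3 or more failures on one day by coincidence alone.
p_three_or_more = poisson_tail(3, daily_mean)

print(f"expected failures/day: {daily_mean:.4f}")
print(f"P(>=3 failures in one day): {p_three_or_more:.2e}")
```

Under these assumptions a day with 3 or more failures would be very rare by chance, so a cluster like that points to some shared cause. The calculation alone cannot say whether that cause is solar activity or something else the machines had in common.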