by lazyjones 2240 days ago
http://cr.yp.to/hardware/ecc.html

Without ECC, modern systems (>= a few GB RAM) will have bits flipping pretty much daily.

https://news.ycombinator.com/item?id=1109401

2 comments

This does not seem to be true in practice. I use ECC on my systems, and they are configured to log every ECC error. I almost never see any. ECC errors mostly start appearing on very old systems, when some piece of hardware has degraded with age (even just oxidized contacts). I am not sure I've ever seen a truly random ECC error.
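
For anyone who wants to check their own logs the same way, here is a minimal sketch. It assumes a Linux machine with an EDAC driver loaded, which exposes corrected and uncorrected error counters under /sys/devices/system/edac/mc. The function name `read_edac_counts` is made up for illustration, and the comment above does not say which OS or logging setup its author uses.

```python
from pathlib import Path

# Assumed location of the Linux EDAC counters. The directory only exists
# when an EDAC driver for the memory controller is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict:
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    A value of None means the counter file was missing or unreadable.
    """
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("No EDAC memory controllers found (no ECC or no driver loaded).")
    for mc, c in results.items():
        print(f"{mc}: corrected={c['ce_count']} uncorrected={c['ue_count']}")
```

The counters reset on reboot, so a zero only covers the current uptime.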
What do you mean by "very old"? This study, which is the only comprehensive public data of which I am aware, says that onset of DRAM errors occurs after 10-18 months.

https://storage.googleapis.com/pub-tools-public-publication-...

I've read such studies with great interest, and I prefer ECC whenever possible, but I think they don't necessarily apply to desktops, at least not fully.

I think the environment in a rack is harsh on nodes of any form factor, be it from EMF interference, power distribution issues, vibration, or temperature. A single system at home, built well and running on stable power, doesn't face that, or at least faces it to a lesser degree.

Very old -- like 10+ years old.
It's not quite fair to just extrapolate numbers for 256MB of RAM up to a modern system. If that were true, you'd be seeing OS crashes daily. (If you run hundreds of servers without ECC you will see mysterious crashes every day, but for a single server it might be once a year.)

Ultimately these flips are caused by charged particles hitting the memory module (a so-called 'single event upset', or SEU), and the number of charged particles hitting a module has not increased with density. Denser modules are more sensitive to SEUs, but that is a smaller effect.
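
The fleet-versus-single-server arithmetic in this comment can be sketched roughly as follows. It takes the "once a year per server" figure from the comment at face value and assumes crashes are independent and Poisson-distributed, which is a simplification. The function names are illustrative only.

```python
import math


def fleet_crashes_per_day(servers: int, crashes_per_server_year: float) -> float:
    """Expected number of crashes per day across the whole fleet."""
    return servers * crashes_per_server_year / 365.0


def prob_at_least_one(rate_per_day: float) -> float:
    """Chance of seeing one or more crashes on a given day (Poisson model)."""
    return 1.0 - math.exp(-rate_per_day)


if __name__ == "__main__":
    # One crash per server per year is the rough figure from the comment.
    for n in (1, 100, 365, 1000):
        rate = fleet_crashes_per_day(n, 1.0)
        print(f"{n:5d} servers: {rate:.3f} crashes/day expected, "
              f"P(at least one today) = {prob_at_least_one(rate):.1%}")
```

Under these assumptions, the rate that is invisible on one machine becomes a daily event once the fleet reaches a few hundred machines.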

To cause a crash, a bit flip has to land in a very small subset of the available memory. It's much more likely to just corrupt the file cache.
Well, it depends on what you're doing. If your memory is full of hash tables and linked lists (say, a web backend), you'll likely get a crash. If you're a fileserver and it's all cache, you won't.
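
A back-of-the-envelope version of this disagreement, as a sketch: assume each flip lands uniformly at random in RAM, and that some fraction of RAM is "crash-critical" (pointers, hash table buckets, and so on). The fractions and flip counts below are made-up illustrations, not measurements.

```python
def crash_probability(critical_fraction: float, flips: int) -> float:
    """Chance that at least one of `flips` random bit flips lands in
    crash-critical memory, assuming flips are uniform and independent."""
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be between 0 and 1")
    return 1.0 - (1.0 - critical_fraction) ** flips


if __name__ == "__main__":
    # Hypothetical workloads: the fractions are assumptions for illustration.
    workloads = {
        "fileserver (mostly cache)": 0.01,
        "web backend (pointer-heavy)": 0.50,
    }
    for name, fraction in workloads.items():
        for flips in (1, 10):
            p = crash_probability(fraction, flips)
            print(f"{name}: {flips} flip(s) -> P(crash) = {p:.1%}")
```

A flip that misses the critical region is not harmless in this model; it just shows up as silently corrupted data instead of a crash.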