Hacker News
by sliverstorm 4950 days ago
I, for one, replace memory modules as soon as they trigger more than one ECC event.

I thought ECC events were triggered by environment, rather than hardware faults? Or you just figure some sticks are by chance more susceptible?

ECC events are triggered by any memory error, whether it comes from the occasional cosmic ray or from a marginal memory module.

It isn't difficult to tell these two possibilities apart. Sometimes I get an ECC event on some server and it never happens again (or it happens in a different module), which doesn't warrant a replacement. But if the same module triggers another event, what's the chance of "cosmic rays" hitting that one module twice and flipping a bit each time? It's better to just replace it (that's covered by warranty or maintenance contracts, so it costs us nothing extra).
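To make that reasoning concrete, here is a rough Python sketch of the "replace on the second event" rule, plus a back-of-the-envelope estimate of how unlikely a repeat hit is if events really are random and independent. The module labels, the event rate, and the function names are all made up for illustration, and the Poisson model is my assumption, not something measured on real hardware.

```python
from collections import Counter
from math import exp


def modules_to_replace(events, threshold=1):
    """Return the module IDs that logged more than `threshold` ECC events.

    `events` is an iterable of module IDs, one entry per logged ECC event.
    """
    counts = Counter(events)
    return sorted(m for m, n in counts.items() if n > threshold)


def prob_repeat_hit(rate_per_module, window):
    """Probability that one given module sees two or more random events.

    Assumes events arrive independently at a constant rate (a Poisson
    process), which is only a model for environment-induced errors.
    `rate_per_module` and `window` must use the same time unit.
    """
    lam = rate_per_module * window
    # P(N >= 2) = 1 - P(N = 0) - P(N = 1)
    return 1.0 - exp(-lam) * (1.0 + lam)


if __name__ == "__main__":
    # Hypothetical event log: module labels are invented.
    log = ["dimm_a1", "dimm_b2", "dimm_a1", "dimm_c3"]
    print(modules_to_replace(log))

    # Hypothetical rate: 0.01 random events per module per year.
    print(prob_repeat_hit(rate_per_module=0.01, window=1.0))
```

With any small per-module rate, the chance of two random hits on the same module comes out far lower than the chance of one, which is why a repeat on the same stick points at the hardware rather than the environment.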

Manufacturing memory from silicon wafers is similar to baking cookies. Some cookies are great, some turn out OK, and some are burnt, depending on the characteristics of the ingredients, the oven, and the chaotic thermodynamic properties of the system.

So, yes, yield varies.