by AnthonyMouse
1206 days ago
ECC is error correcting. If one bit gets flipped, it not only detects the flip but fixes it. If two bits get flipped, it can at least detect that and panic the machine immediately instead of corrupting your data. Without it the corruption is silent, and then this kind of thing happens: https://news.ycombinator.com/item?id=35026440 That is another reason not to solder the storage either. Suppose you have a system board with bad soldered memory and you want to copy your data off it onto a new one. The memory is flipping random bits as it copies, but the flash chips are permanently attached to the same board as the bad memory. With removable storage it would have been just a support ticket; now it's something worse.
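To make the "fix one flip, detect two" behavior concrete, here is a minimal sketch of a SECDED code (single error correction, double error detection). It is a toy extended Hamming code over 4 data bits, not what any particular memory controller implements; real ECC DIMMs typically protect 64 data bits with 8 check bits. The names `encode` and `decode` are made up for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of 0/1 values).

    Index 0 holds an overall parity bit; indexes 1..7 are Hamming(7,4)
    positions, with check bits at the powers of two (1, 2, 4).
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (data, status); data is None when the error is uncorrectable."""
    # XOR of the positions of all set bits: zero for a valid codeword,
    # and equal to the flipped position after a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and undo it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code = list(code)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two bits flipped.
        # The error is detected, but there is no way to tell which two.
        return None, "uncorrectable"

    data = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)

    one_flip = list(word)
    one_flip[5] ^= 1
    print(decode(one_flip))   # single flip is repaired

    two_flips = list(one_flip)
    two_flips[2] ^= 1
    print(decode(two_flips))  # double flip is reported, not repaired
```

The "uncorrectable" branch is the point where a real system would raise a machine check rather than hand the data back; without the check bits, both corrupted words would simply be read as valid data.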
|
I did neglect to mention that ECC, by definition, can correct errors, but I wonder whether what's upsetting people about my comment is the implication that ECC can't detect all errors.
But it's true: ECC can't detect every bit flip, and at least one study[1] suggests that quite a lot of memory errors go entirely undetected even with ECC.
So silent corruption does occur even with ECC, and it may not even be particularly rare, although it is rarer than ordinary single- and double-bit flips. Of course, the majority of desktops use non-ECC RAM and are mostly fine, so I assume this only really matters for production workloads, and exactly how much impact it has is hard to gauge.
[1]: https://pages.cs.wisc.edu/~remzi/Classes/739/Fall2018/Papers...
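A rough way to see why some flips get past ECC: a SECDED code only guarantees a minimum distance of 4 bits between valid codewords, so three or more flips in one word can land on, or right next to, a different valid codeword. The sketch below enumerates every flip pattern for a toy 8-bit code with 4 data bits. It is an illustration only, the names `encode_word`, `nearest_decode` and `outcomes` are made up, and the proportions for the wider codes used on real DIMMs will differ.

```python
from itertools import combinations


def encode_word(nibble):
    """Toy (8,4) extended Hamming encoder; returns the codeword as an int."""
    d0, d1, d2, d3 = ((nibble >> i) & 1 for i in range(4))
    p1 = d0 ^ d1 ^ d3
    p2 = d0 ^ d2 ^ d3
    p4 = d1 ^ d2 ^ d3
    bits = [p1, p2, d0, p4, d1, d2, d3]
    p0 = sum(bits) % 2  # overall parity bit
    word = 0
    for i, b in enumerate([p0] + bits):
        word |= b << i
    return word


# All 16 valid codewords, mapped back to the data they carry.
CODEBOOK = {encode_word(n): n for n in range(16)}


def nearest_decode(word):
    """Accept exact matches, repair anything one bit from a codeword."""
    if word in CODEBOOK:
        return CODEBOOK[word], "ok"
    for bit in range(8):
        candidate = word ^ (1 << bit)
        if candidate in CODEBOOK:
            return CODEBOOK[candidate], "corrected"
    return None, "detected"


def outcomes(num_flips):
    """Tally what the decoder does for every pattern of num_flips flips."""
    counts = {"right": 0, "detected": 0, "silently wrong": 0}
    for nibble in range(16):
        for positions in combinations(range(8), num_flips):
            word = encode_word(nibble)
            for p in positions:
                word ^= 1 << p
            data, status = nearest_decode(word)
            if status == "detected":
                counts["detected"] += 1
            elif data == nibble:
                counts["right"] += 1
            else:
                counts["silently wrong"] += 1
    return counts


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, "flips:", outcomes(k))
```

In this toy code every 1-bit pattern comes back right and every 2-bit pattern is detected, but every 3-bit pattern is "corrected" into the wrong data, and some 4-bit patterns land exactly on another valid codeword and pass as clean. How often multi-bit errors actually happen in the field is a separate, empirical question that this sketch says nothing about.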