by uiohnuipb
6105 days ago
And try to convince a programmer that their program's memory can be wrong. They understand it in theory but refuse to code for the possibility. Especially when you get into HPC, where there are clusters of 50-60 machines with 4 GB each, the chance of not having corrupt memory somewhere is almost 0.

Because the hardware can still detect multi-bit errors, just not transparently correct them. So you shut the machine down automatically until you get new DRAM installed.
Programmers _are_ coding for machines-will-fail-temporarily, but coding to "handle" random memory errors instead of buying the right ECC hardware would be insane.