by grue_some 1586 days ago
DDR5 contains two forms of ECC. The first is standard ECC, which is used to correct bit flips in transmission. The second, on-die ECC, is used to correct bit flips on the die, hence the name. The world has already accepted that standard ECC on high-speed interfaces is a good idea, so why would on-die ECC be a bad idea? Yes, they correct different error types, but they both attempt to correct corrupted bits and they do so in a mathematically similar way.

All that said, there are still ECC (with a separate ECC memory chip) and non-ECC DIMMs for DDR5. So if the on-die ECC is concerning for anyone, they can still get a DIMM with a separate ECC memory. But the ECC happening at the interface between the DIMM and the CPU will always exist, and you will have to trust it.

2 comments

Again, going back to the discussions over on RWT: some of the less robust forms of ECC that DRAM manufacturers typically implement can end up amplifying the problem by turning double-bit flips into silent multi-bit flips, which makes the memory controller's job much harder. DRAM manufacturing process tech is not optimized for logic the way CPU processes are, and those limitations really do constrain how much logic (and therefore how good an ECC) can be implemented on DRAM chips. I trust CPU manufacturers to get memory controllers right more than I trust DRAM manufacturers to get ECC right for one simple reason: row hammer.
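To make the amplification point concrete, here is a toy sketch of how a single-error-correcting code can turn a double-bit flip into a silent triple-bit error. It uses a textbook Hamming(7,4) code purely for illustration; that choice, and every function name below, is an assumption of this sketch and not the code any DRAM vendor actually ships.

```python
# Toy Hamming(7,4) single-error-correcting (SEC) code.
# Illustrative only: real on-die ECC uses much wider codewords.

def encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def syndrome(c):
    """XOR of the 1-based positions that hold a 1; 0 means 'looks valid'."""
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos
    return s

def correct(c):
    """SEC decoder: blindly flip whichever bit the syndrome points at."""
    c = list(c)
    s = syndrome(c)
    if s:
        c[s - 1] ^= 1
    return c

def flip(c, *positions):
    """Return a copy of c with the given 1-based positions inverted."""
    c = list(c)
    for p in positions:
        c[p - 1] ^= 1
    return c

def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

word = encode([1, 0, 1, 1])

# One flipped bit is repaired as intended.
fixed_single = correct(flip(word, 3))

# Two flipped bits: the syndrome points at a third, innocent bit,
# so the "correction" adds another error and the result looks valid.
fixed_double = correct(flip(word, 3, 5))

print(fixed_single == word)                  # True
print(hamming_distance(fixed_double, word))  # 3
print(syndrome(fixed_double))                # 0, i.e. no error is visible
```

With two flips at positions p and q, the syndrome is p XOR q, which is never p or q itself, so a pure SEC decoder always touches a third bit. Whether the memory controller's own ECC can still catch the result depends on how those bits are spread across its codewords, which this sketch does not model.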
The on-die ECC for DDR5 is typically:

* mandatory (a hypothetical DDR5 without it could have error rates so high it would basically not work)

* an implementation detail (if the raw error rate were not that high, there would be no on-die ECC)

* not reported to the CPU

It's a completely different beast from real ECC. It's not that it is bad or concerning; it is that it does not provide RAS services and, like ECC-less DDR4, should be reserved for consumer electronics used basically only for tasks like entertainment. Actually, in a better world most consumer electronics would have real ECC (instead of none at all, or only on-die ECC as an implementation detail) -- but sadly, for now, vendors do not do that.