Hacker News
by fortran77 1246 days ago
One of the main reasons I buy Xeon desktops is the ECC. With 128 GB of memory, and 1 bitflip/GB/year average error rate, it seems too risky to not use ECC for production work.
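
To put that figure in perspective, here is a rough back-of-the-envelope sketch. It takes the 128 GB and 1 bitflip/GB/year numbers from the comment at face value and assumes flips are independent and evenly spread over time, which is a simplification. The variable names are just for illustration.

```python
# Back-of-the-envelope: expected bit flips per year for one machine.
# Assumptions (from the comment, not measured): 128 GB of RAM and an
# average rate of 1 bitflip/GB/year, with flips spread evenly over time.
MEMORY_GB = 128
FLIPS_PER_GB_PER_YEAR = 1.0
DAYS_PER_YEAR = 365

flips_per_year = MEMORY_GB * FLIPS_PER_GB_PER_YEAR
days_between_flips = DAYS_PER_YEAR / flips_per_year

print(f"{flips_per_year:.0f} expected flips per year")
print(f"roughly one flip every {days_between_flips:.2f} days")
```

Under those assumptions that works out to about one flip every three days or so on a 128 GB box.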

Real-world numbers are closer to 1 bitflip/GB/hour than 1 bitflip/GB/year, because bit flips are highly correlated.

“A large-scale study based on Google's very large number of servers was presented at the SIGMETRICS/Performance '09 conference.[6] The actual error rate found was several orders of magnitude higher than the previous small-scale or laboratory studies, with between 25,000 (2.5 × 10⁻¹¹ error/bit·h) and 70,000 (7.0 × 10⁻¹¹ error/bit·h, or 1 bit error per gigabyte of RAM per 1.8 hours) errors per billion device hours per megabit. More than 8% of DIMM memory modules were affected by errors per year.” https://en.wikipedia.org/wiki/ECC_memory
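
The unit conversion in that quote is easy to check. The sketch below converts "errors per billion device hours per megabit" into errors per bit-hour and into hours per error for one gigabyte. It assumes a decimal gigabyte (8 × 10⁹ bits), which appears to be what reproduces the quoted 1.8 hours; the function names are made up for this example.

```python
# Convert "errors per billion device hours per megabit" into other units.
# Assumption: 1 GB = 8e9 bits (decimal gigabyte), 1 Mbit = 1e6 bits.
BITS_PER_MEGABIT = 1e6
BITS_PER_GB = 8e9
BILLION_HOURS = 1e9


def errors_per_bit_hour(errors_per_billion_hours_per_mbit):
    """Error rate per single bit per hour."""
    return errors_per_billion_hours_per_mbit / BILLION_HOURS / BITS_PER_MEGABIT


def hours_per_error_per_gb(errors_per_billion_hours_per_mbit):
    """Average hours between errors in one gigabyte of RAM."""
    per_gb_hour = errors_per_bit_hour(errors_per_billion_hours_per_mbit) * BITS_PER_GB
    return 1.0 / per_gb_hour


for rate in (25_000, 70_000):
    print(
        f"{rate} -> {errors_per_bit_hour(rate):.1e} error/bit·h, "
        f"one error per GB every {hours_per_error_per_gb(rate):.1f} h"
    )
```

With these assumptions the upper figure comes out at about 1.8 hours per error per gigabyte, and the lower one at about 5 hours.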

A random stick of non-ECC memory might be far above average or have several errors per minute, but you just don’t know.

That study is very old and is based on long-outdated DRAM tech. I suspect that DDR5 has much lower error rates.
I would welcome more recent data, but I doubt we are talking about a change of 4 orders of magnitude, which is what it would take to get from /hour to /year error rates.
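
For reference, the "4 orders of magnitude" is just the number of hours in a year. A quick sketch, assuming a 365-day year:

```python
import math

# Gap between a per-hour and a per-year rate, assuming a 365-day year.
HOURS_PER_YEAR = 365 * 24

orders_of_magnitude = math.log10(HOURS_PER_YEAR)

print(f"{HOURS_PER_YEAR} hours/year is about 10^{orders_of_magnitude:.2f}")
```

So going from 1 bitflip/GB/hour to 1 bitflip/GB/year means a reduction by a factor of 8,760, a little under 10⁴.
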
Actually, Samsung claimed a factor of a million lower error rate in DDR5 vs DDR4 due to the on-die ECC.

Source: https://www.anandtech.com/show/16900/samsung-teases-512-gb-d...

> The company details a 512 GB module of DDR5 memory, running at DDR5-7200, designed for server and enterprise use.

That just shows how useful ECC memory is, not that these bit flips didn’t occur.

If you scroll down a bit, the article shows a slide from Samsung claiming that DDR5 improves error rates by a factor of 10^6.