Hacker News
by cronz 2810 days ago
Most people don't have that amount of RAM. What are the chances that a soft bit flip will cause silent data corruption that matters for a user with, say, 8 GB of RAM?

The answer to your question depends heavily on the particular user and their work, of course. If the user is just playing games, who cares? But maybe they're doing financial calculations, or compiling software that lots of users run, or aircraft structural analysis.

In any case, bit flips are much more common than previously suspected: https://arstechnica.com/information-technology/2009/10/dram-...
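To put rough numbers on the original question, you can scale a per-megabit error rate by memory size and running time. This is a minimal back-of-the-envelope sketch that assumes errors arrive independently and uniformly (a Poisson model). Field data suggests real errors cluster on a minority of faulty modules, so don't read much into the precise output. `FIT_PER_MBIT` is a hypothetical placeholder, not a figure from the linked study, and the function names are illustrative.

```python
import math

# HYPOTHETICAL placeholder rate in FIT per Mbit (errors per 10^9 device-hours per Mbit).
# This is not a measured value; replace it with a figure from a study you trust.
FIT_PER_MBIT = 1.0

HOURS_PER_YEAR = 24 * 365


def expected_errors(ram_gib: float, hours: float, fit_per_mbit: float = FIT_PER_MBIT) -> float:
    """Expected bit errors over `hours`, assuming a uniform, independent error rate."""
    mbits = ram_gib * 8 * 1024  # 1 GiB = 8 * 1024 Mbit, using binary (2^20-bit) Mbit
    return fit_per_mbit * mbits * hours / 1e9


def p_at_least_one(expected: float) -> float:
    """Poisson probability of seeing one or more errors, given the expected count."""
    return 1.0 - math.exp(-expected)


# Usage: an 8 GiB machine running for one year under the placeholder rate.
lam = expected_errors(8, HOURS_PER_YEAR)
print(f"expected errors/year: {lam:.3f}, P(at least one): {p_at_least_one(lam):.3f}")
```

The model also can't tell you whether a given flip *matters*. That depends on what the affected memory holds.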

I believe strongly that ECC should be standard, because you can't safely assume that your users are doing worthless work. Apple got this right on (non-Mini) desktops a long time ago. Not yet on laptops, unfortunately.
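ECC works by storing extra check bits next to the data so that the memory controller can locate and correct a single flipped bit on read. The classic Hamming(7,4) code shows the principle in miniature: 3 parity bits protect 4 data bits. This is a toy sketch with hypothetical function names. Real ECC DIMMs typically use a wider SECDED code (8 check bits per 64 data bits) that can also detect, but not correct, double-bit errors. This sketch does not model that.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (index 0 = position 1)."""
    d0, d1, d2, d3 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d0 ^ d1 ^ d3  # parity over positions 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # parity over positions 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # parity over positions 5, 6, 7
    # Positions 1..7: p1, p2, d0, p4, d1, d2, d3
    return [p1, p2, d0, p4, d1, d2, d3]


def hamming74_decode(code: list[int]) -> tuple[int, int]:
    """Correct up to one flipped bit; return (data nibble, 1-based error position or 0)."""
    bits = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit if exactly one bit is wrong.
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the bad bit back
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome


# Usage: store 0b1011, flip one bit "in memory", and recover the original value.
word = hamming74_encode(0b1011)
word[4] ^= 1  # simulated soft error at position 5
print(hamming74_decode(word))  # (11, 5)
```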

If you’ve got files you care about, then care about bit rot. I want ECC so my ZFS volumes don’t silently corrupt.
That study is faulty. The intern who did that study didn't know that Google would buy DRAM chips that had failed the manufacturers' QA, put them on DIMMs themselves, and retest them at lower frequencies with ECC turned on. Because their scale already forces them to tolerate any node failing, they can play fast and loose with this sort of thing if it makes financial sense.

EDIT: At -3 so far; does anyone want to explain the downvotes? I saw the Google slides firsthand, and there are comments from 2009 in that article saying the same thing.

I didn't downvote it either, but "the intern" is a well-respected CS professor, and the paper had two Google authors whom one would expect to know about Google's oddities. https://ai.google/research/pubs/pub35162

Your comment provided no substantiation of your claim, merely hand-waving, while casting aspersions on someone else's work.

All RAM should be ECC. No one would accept this incorrectness BS if the precedent hadn’t been set by the monopolist.
It's really the memory manufacturers who are enforcing and profiting from the ECC shakedown though, AFAICS.
Intel probably gets more from selling a Xeon.
I didn't downvote, but I'd guess that they (a) want an authoritative reference to the story you're sharing, or (b) figure the point is moot because these errors still happen, even if less often than depicted by the study, and that ought to be enough to justify using ECC.

Like I said though, just a guess.

If you think bit flipping is rare, check out this write-up: http://dinaburg.org/bitsquatting.html

Bitsquatting: DNS Hijacking without exploitation

When bit-errors occur they can change memory content. Computer memory content has semantic meaning. Sometimes, that meaning will be a domain name. And applications utilizing that memory will use the wrong domain name.
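The bitsquatting idea can be sketched by listing every name that differs from a target domain by a single flipped bit in one character and still forms a valid hostname. This is a simplified illustration, not the tooling from the linked write-up. It considers only lowercase letters, digits, and hyphens, leaves the dots between labels alone, and ignores internationalized names. `bitsquat_variants` is a hypothetical name.

```python
import string

# Characters allowed in a hostname label. Lowercase only, since DNS names are
# case-insensitive and an uppercase flip resolves to the same name.
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(domain: str) -> list[str]:
    """Return valid domains that differ from `domain` by one flipped bit in one character."""
    domain = domain.lower()
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue  # keep the label structure intact in this sketch
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in ALLOWED:
                continue
            candidate = domain[:i] + flipped + domain[i + 1:]
            labels = candidate.split(".")
            if any(label.startswith("-") or label.endswith("-") for label in labels):
                continue  # hyphens may not begin or end a label
            variants.add(candidate)
    return sorted(variants)


# Usage: names one bit away from "cnn.com" (e.g. "bnn.com", "snn.com").
print(bitsquat_variants("cnn.com"))
```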

Nice point. As with anything that follows a probability distribution, given the large number of machines out there, this error will occur.
Run a filesystem that checksums data (e.g. ZFS) on a system without ECC RAM for a few weeks and then come back to us... I've done it, the results were surprising, and I replaced the RAM with ECC.
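Why a checksumming filesystem and ECC complement each other can be sketched in a few lines. A checksum catches corruption that happens after the checksum is computed (for example, on disk). But if a bit flips in non-ECC RAM before the checksum is computed, the filesystem faithfully checksums and stores the corrupted data. SHA-256 here stands in for whatever checksum the filesystem actually uses, and the block functions are hypothetical simulations, not ZFS internals.

```python
import hashlib


def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Simulated write: store the block together with a checksum of its contents."""
    return data, hashlib.sha256(data).digest()


def verify_block(data: bytes, checksum: bytes) -> bool:
    """Simulated read: recompute the checksum and compare it with the stored one."""
    return hashlib.sha256(data).digest() == checksum


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with a single bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"important payload"

# Case 1: bit rot on disk, after the checksum was stored -> the read detects it.
stored, checksum = write_block(original)
print("on-disk flip detected:", not verify_block(flip_bit(stored, 3), checksum))  # True

# Case 2: a flip in non-ECC RAM before the write -> the checksum matches the bad data.
stored, checksum = write_block(flip_bit(original, 3))
print("RAM flip detected:", not verify_block(stored, checksum))  # False
```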
> Most people don't have that amount of RAM.

Most people don't buy desktops with 16 or 32 or 64-thread processors designed to maximize throughput.

Those who do tend to want to max out how much RAM they can shove in their box.

As a counterexample, my desktop has had 128 GB of RAM for two years now. I've never needed that much; 64 GB would have been enough for the number of VMs and containers I run. But I couldn't pass up a good deal: I got the RAM for $600.
It's a real sign of the times that I had a tough time finding 128 GB RAM sets (on a short search; this should be easy).
A bit flip in the dirty cache that then gets written to disk? That depends on which data structure was corrupted...
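How much damage one flipped bit does depends on what it lands in. This is a minimal sketch with a hypothetical length-prefixed record format, not any real filesystem's layout. A flip in the payload silently changes one byte. A flip in the low bit of the length field silently truncates the record. A flip in the high bit produces an absurd length that the decoder can at least catch.

```python
import struct


def encode_record(payload: bytes) -> bytes:
    """Hypothetical on-disk record: 4-byte little-endian length prefix, then payload."""
    return struct.pack("<I", len(payload)) + payload


def decode_record(blob: bytes) -> bytes:
    """Decode a record, raising ValueError if the length field doesn't fit the blob."""
    (length,) = struct.unpack_from("<I", blob, 0)
    if 4 + length > len(blob):
        raise ValueError(f"length field says {length} bytes, only {len(blob) - 4} present")
    return blob[4:4 + length]


def flip_bit(blob: bytes, bit_index: int) -> bytes:
    """Return a copy of `blob` with a single bit inverted."""
    buf = bytearray(blob)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


record = encode_record(b"hello world")

print(decode_record(flip_bit(record, 32)))  # payload flip: b'iello world' (silent)
print(decode_record(flip_bit(record, 0)))   # low length bit: b'hello worl' (silent)
try:
    decode_record(flip_bit(record, 31))     # high length bit: caught
except ValueError as err:
    print("detected:", err)
```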