by ahelwer 1989 days ago
That photo is from him saying "fuck you" to NVIDIA back in 2012: https://arstechnica.com/information-technology/2012/06/linus...

FTA: "Bit flips can happen for many reasons, beginning with cosmic-ray impact or simple hardware failure. A large-scale study[0] of Google servers found that roughly 32 percent of all servers (and 8 percent of all DIMMs) in Google's fleet experience at least one memory error per year. But the vast majority of these are single-bit errors—and since Google is using server CPUs and ECC RAM, this means the machines in question keep right on trucking."

[0] http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
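For anyone wondering how ECC lets a machine "keep right on trucking" through a single-bit error, here is a minimal sketch using a textbook Hamming(7,4) code. This is an illustration only, not what any particular DIMM or Google's hardware implements: real ECC memory typically uses a wider code (commonly 64 data bits plus 8 check bits) that also detects double-bit errors. The function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); a single flipped bit is corrected.

    The syndrome is 0 when no error is seen, otherwise it is the
    1-based position of the bit that was flipped.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


# Simulate a "cosmic ray" flipping one stored bit.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # corrupt position 5
recovered, position = hamming74_decode(stored)
print(recovered, position)
```

The point is that the extra parity bits pin down which bit flipped, so the read returns the right value and the error just gets logged. Two flips in the same word would fool this small code, which is why real ECC adds one more check bit to at least detect that case.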

2 comments

Something that doesn't get said enough is that Google had a habit at the time of buying RAM chips that had failed manufacturer QA, sticking them on DIMMs themselves, and revalidating those DIMMs. They were really leaning into the whole "embrace failures if they're going to happen anyway and you can get cheaper servers out of it" approach. So those numbers need to be taken with a grain of salt.
Even so, roughly a one-in-three chance of a server seeing a memory error in a given year is a rate that I'm totally comfortable with.
Also FTA: "ECC RAM ... can generally stop Rowhammer attacks—in which rapidly flipping bits in one area of RAM cause bits in an adjacent area to change."