by erichocean 4950 days ago
Although we use ECC in our servers already, I've recently been experimenting with hashing object contents in memory using a CityHash variant. The hash is checked when the object moves on chip (into cache), and re-computed before the object is stored back into RAM when it's been updated.
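
For anyone wondering what that check-on-load, re-hash-on-store cycle might look like, here is a minimal sketch in C. It is not the code described above: FNV-1a stands in for the CityHash variant, and `guarded_obj`, `guarded_seal`, `guarded_check` and `guarded_demo` are hypothetical names made up for illustration. It also runs the check as an explicit function call, whereas the scheme above ties it to the object moving into cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in 64-bit hash (FNV-1a). Assumption: the post uses a CityHash
   variant; any reasonably fast 64-bit hash fits the same slot. */
static uint64_t fnv1a64(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical object layout: the payload plus the hash that guards it. */
typedef struct {
    uint64_t hash;
    unsigned char payload[64];
} guarded_obj;

/* Re-compute the hash after a legitimate update, before the object
   is written back to RAM. */
static void guarded_seal(guarded_obj *o) {
    o->hash = fnv1a64(o->payload, sizeof o->payload);
}

/* Verify the hash when the object is loaded.
   Returns 1 if the payload matches its stored hash, 0 otherwise. */
static int guarded_check(const guarded_obj *o) {
    return o->hash == fnv1a64(o->payload, sizeof o->payload);
}

/* Usage: seal, verify, simulate a wild write, verify again. */
static int guarded_demo(void) {
    guarded_obj o;
    memset(o.payload, 0, sizeof o.payload);
    memcpy(o.payload, "hello", 5);
    guarded_seal(&o);
    assert(guarded_check(&o));   /* intact right after sealing */
    o.payload[3] ^= 0x01;        /* one flipped bit, e.g. from a stray device write */
    return guarded_check(&o);    /* 0: the corruption is detected */
}
```

The hash only covers the payload, so a wild write that lands on the stored hash itself also shows up as a mismatch. It cannot tell you which of the two was hit, only that the object is no longer trustworthy.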

Although our production code is written in C, I'm not particularly worried about detecting wild writes, because we use pointer checking algorithms to detect/prevent them in the compiler. (Of course, that could be buggy too...)

What I'm trying to catch are wild writes from other devices that have access to RAM. Anyway, this is far from production code so far, but hashing has already been very successful at keeping data structures on disk consistent (a la ZFS, git), so applying the same approach to memory seems like the next step.

The speed hit is surprisingly low, 10-20%, and put that way, it's like running your software on a six-month-old computer. So much of the safety stuff we refuse to do "for performance" would be like running on top-of-the-line hardware from three years ago, but safely. That seems like a worthwhile trade to me...

P.S. Are people really not burning in their server hardware with memtest86? We run it for 7 days on all new hardware, and I figured that was pretty standard...

1 comment

1) Yes, lots of people don't run memtest86 at all.

2) Even those that do run it typically run it for no more than 24 hours.

3) Many people don't build their own hardware these days; it's a VPS or EC2.

4) If you've selected ECC RAM, then you know way more about memory failures than >99% of Redis users.