by valbu 1384 days ago
It's not often that I find a comment describing an experience so similar to mine. I had intermittent data corruption caused by a memory bit error in the very upper address range, which was almost always unused, so it went unnoticed for a long time. The ZFS backend PC was actually OK; the fault was on the main PC that used the share, but the point remains. After that, no more non-ECC memory on any computer for me (OK, except some laptops).

I'm only going to add a related anecdote that wasn't a failing of ECC vs non-ECC but rather of BIOS behavior.

Background: Lenovo ThinkPad T520 laptop, random crashes and data corruption.

Diagnosis: I eventually let memtest86+ run repeatedly for about a week, and it showed no errors. Just as I was about to give up, I pressed a key on the keyboard and errors immediately filled the screen. This suggested that the EC (embedded controller), or maybe some BIOS-controlled keyboard driver, was writing to memory it shouldn't have been.

Fix: I am a Linux user, and the kernel has an option to reserve low memory for a poorly behaved BIOS that writes where it shouldn't. CONFIG_X86_RESERVE_LOW should be set to at least 64 (the value is in KB) and increased up to 640 if the issues continue. There are some other options to scan for this misbehavior, but I honestly don't know how Linux currently handles it: https://lkml.org/lkml/2013/11/11/683
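
For anyone who wants to check what their running kernel was built with, here is a rough Python sketch. It only parses kernel config text and looks for CONFIG_X86_RESERVE_LOW. The function names (`reserve_low_kb`, `advice`) are made up for illustration, and the config path `/boot/config-<release>` is an assumption that holds on many distros but not all. On newer kernels that reserve the whole first 1 MB, the option may simply be absent.

```python
import os
import re


def reserve_low_kb(config_text):
    """Return the CONFIG_X86_RESERVE_LOW value (in KB) found in the
    given kernel config text, or None if the option is not set there."""
    match = re.search(r"^CONFIG_X86_RESERVE_LOW=(\d+)\s*$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def advice(kb):
    """Turn the parsed value into a short hint based on the 64/640 KB
    figures mentioned in the comment above (hypothetical helper)."""
    if kb is None:
        return "option not present in this config"
    if kb < 64:
        return "below 64 KB; consider raising it"
    if kb < 640:
        return "at least 64 KB; can go up to 640 KB if problems persist"
    return "already at 640 KB"


if __name__ == "__main__":
    # Assumed location; some distros expose /proc/config.gz instead.
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as fh:
            print(advice(reserve_low_kb(fh.read())))
    except OSError as exc:
        print("could not read", path, "-", exc)
```

If the file is missing, the script just reports that instead of guessing.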

Linux actually switched recently to leaving the whole bottom 1MB to the BIOS: https://lore.kernel.org/lkml/YLx%2FiA8xeRzwhXJn@zn.tnic/T/#u

Apparently Windows now does it too because too many BIOSes are buggy: https://bugzilla.kernel.org/show_bug.cgi?id=16661#c2