by dap
4950 days ago
Great post, showing admirable dedication to software reliability and a solid understanding of memory issues. One of the suggestions was that the kernel could do more. Solaris-based systems (illumos, SmartOS, OmniOS, etc.) do detect both correctable and uncorrectable memory errors. Such errors may still cause a process to crash, but they also raise faults that tell system administrators what happened, so you don't have to guess whether you experienced a DIMM failure. After such errors, the OS removes the faulty pages from service. None of this has any performance impact until an error occurs, and even then the impact is pretty minimal. There's a fuller explanation here:
https://blogs.oracle.com/relling/entry/analysis_of_memory_pa...