by sireat
3862 days ago
Exactly, you want to know when the error is due to memory. Intel deciding that consumers (including those buying Haswell-E CPUs) do not need ECC really irks me. Textbook market segmentation from a near monopoly. Currently you cannot have your cake and eat it: you cannot have the best single-thread performance (offered by overclocking the Haswell-E series or the Skylake 6700K) and have ECC. So if you are building the ultimate workstation, you have a hard choice: do you go with the X99 chipset (no ECC, but can overclock), or do you go with server motherboards with C610 chipsets, which are quite limited as far as consumer interests go? Also interesting are the Intel mobile Xeons, which now provide an avenue for ECC on a laptop.
|
Generally, if you are willing to give up a single clock bin in exchange for ECC, you end up with a cheaper (and cooler) system that's more reliable. And if you want the cheapest 4c/8t CPU, it's a Xeon, NOT an i7.
I don't feel particularly artificially segmented. Additionally, the high-end desktop motherboards tend to be more expensive than the server boards. Often I find a nice server board at $180, and the nice desktop boards are often another $100. Sure, they are marketed to gamers, but I really just want nice, reliable power and cooling, and it's not clear which of the cheaper desktop boards are really going to last 24/7 for 5 years.
Today I'd buy the E3-1270 for $339 over the $350 i7-6700K. Keep in mind the K chips carry a premium AND they don't come with a fan like the non-K chips do. Sure, it's 3.6 to 4.0 GHz instead of 4.0 to 4.2 GHz, but that's not a particularly noticeable difference, especially since both thermally throttle as needed.
I think ECC is well justified because it doesn't just detect DIMM errors, but also motherboard errors, CPU errors, and socket (DIMM or CPU) errors. If a node randomly crashes or hangs, it's very hard to track down why... unless you have ECC, which will often help you pin it down. I'd much rather see something strange show up in mcelog than wait for a hang, or worse, corruption.
Most of my "ECC" errors have actually been motherboards, sockets, or (in AMD's case) CPUs. When I look at larger samples, some DIMMs are WAY less reliable than others, which strongly implies it's not high-energy particles but something out of spec.
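For anyone who wants to watch for this without waiting for a hang, here is a rough sketch of polling the corrected/uncorrected error counters that the Linux EDAC driver exposes through sysfs. It assumes the usual `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count` layout, which depends on your kernel and on an EDAC driver being loaded for your memory controller. The function names and the threshold are made up for illustration; this is not how mcelog itself works.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs.

    Assumes each memory controller directory (mc0, mc1, ...) contains
    plain-text 'ce_count' and 'ue_count' files. Returns an empty dict
    when the directory is missing (no EDAC driver, or no ECC memory).
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


def noisy_controllers(counts, ce_threshold=1):
    """Names of controllers with any uncorrected error, or with at least
    ce_threshold corrected errors (hypothetical threshold, tune to taste)."""
    return [
        name
        for name, (ce, ue) in sorted(counts.items())
        if ue > 0 or ce >= ce_threshold
    ]


if __name__ == "__main__":
    edac = read_edac_counts()
    for name, (ce, ue) in edac.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
    print("worth a look:", noisy_controllers(edac))
```

Run it from cron on every node and compare the counts across machines; a controller or DIMM that keeps racking up corrected errors while its neighbors stay at zero is the "out of spec" case, not cosmic rays.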