by sirn 683 days ago
I feel like the entire fiasco has been multiple issues lumped together as one, and muddied to the point that even a bluescreen from an attempt to run XMP at extremely high MT/s is now being claimed as degradation. From what I can make out of this mud, there seem to be (1) a failure caused by high current due to some boards unlocking IccMax/PL1/PL2 by default, and (2) high voltage during a single-core boost (TVB). The former is caused by overclocking, and the latter seems to be Intel's failure to validate the CPUs at low load/long periods of single-core boost, where IccMax/PL no longer matters as much (since single-core boost never exceeds PL1 anyway).

Most Raptor Lake "server boards" right now are W680 with client CPUs because the C266/Xeon E-2400 took a long time to come out. The ones intended for workstations typically have overclockable settings or are even overclocked by default, which means they're likely to get hit with failure (1). The ones intended for servers do have more conservative settings, but can still be hit with failure (2) under some conditions.

Buildzoid released a video a bit ago on Supermicro W680 blades that were having issues after running a single-core load 24x7, which is essentially 24x7 boost[1] (aka issue (2)). Xeon E-2400 _could_ be affected in this scenario, although even the highest-clocked E-2400 SKU (E-2488) only runs at 5.6 GHz without Thermal Velocity Boost, and most others range from 4.5 to 5.2 GHz boost (rather than the 5.8 to 6 GHz boost some client SKUs do). I feel like the actual B0 Xeon E-2400 would be a lot less prone to both failures (1) and (2) due to this (it could still happen, though there are no reports of such).

But then the conversation gets muddied enough that "even servers and Xeons are affected" becomes the common narrative (while the former is true, the circumstances need to be noted; and for Xeons, it's a _maybe_ at most, since right now there's no report of a Xeon E-2400 failing).

[1]: https://www.youtube.com/watch?v=yYfBxmBfq7k

1 comment

Looking around, I'm seeing reports of 1.4-1.5V core voltages using Intel's stock profiles, with some even going to 1.7V. That's insanely high for a 10nm process and I'm not surprised about the degradation. For comparison, in the 45/32nm days 1.2-1.3V was the norm, with some extreme overclockers (who don't expect CPUs to survive for more than a few minutes, using liquid nitrogen etc.) hitting ~1.5V, and 1.4V was a commonly quoted safe upper limit for 24x7 operation.
This is why I think it's going to be much harder for Xeons to be affected by this, as they normally run at more conservative voltage settings. I don't have a Xeon E-2400 to look at, but Raptor Lake should be able to do 5.6 GHz at 1.3-1.4V-ish, which should be within a safe voltage range. (Even the "power hungry" w9-3495X only runs at ~1.25V during 4.8 GHz TVB, and ~1.15V at non-TVB 4.6 GHz boost.)