by nullindividual 703 days ago
> It seems AMD had an issue supporting ECC with the current chipsets.

AMD has the advantage with regard to ECC. Intel doesn't support ECC at all on consumer chips; you need to go Xeon. AMD supports it on all chips, but it is up to the motherboard vendor to (correctly) implement it. You can get consumer-class AM4/AM5 boards that have ECC support.


> AMD supports [ECC RAM] on all chips

There was a strange happening with AMD laptop CPUs (“APUs”): the non-soldered DDR5 variants of the 7x40s were advertised as supporting ECC RAM on AMD’s website up until a couple of months before any actual laptops were sold, then that was silently changed and ECC is only on the PRO models now. I still don’t know if this is a straightforward manufacturing or chipset issue of some kind or a sign of market segmentation to come.

(I’m quite salty I couldn’t get my Framework 13 with ECC RAM because of this.)

> AMD supports it on all chips

Unfortunately not. I can't say for current gen, but the 5000 series APUs like the 5600G do not support ECC. I know, I tried...

But yes, most Ryzen CPUs do have ECC functionality, and have had it since the 1000 series, even if not officially supported. Official support for ECC is only on Ryzen PRO parts.

You need W680 boards (starting at around 500 bucks) for ECC on desktop Intel chips.

I was seeing them around $400 (still expensive).

Actually some of the 13th and 14th gen Intel Core processors support ECC.

Intel has always had random ECC support on desktop CPUs. Sometimes it was just a few low end SKUs, sometimes higher end SKUs. On 14th gen it appears the i9s and i7s do; I didn't check the i5s, but the i3s did not.

My understanding is that it's screwed up for multiple vendors and chipsets. The boards might say they support it, but then there are updates saying they don't. It seemed extremely hard to find any that actually supported it. It was actually easier to find new Intel boards supporting ECC.

Yeah, Wendell put out a video a few weeks ago exploring a bunch of problems with ASRock Rack-branded server-market B650 motherboards, and basically the ECC situation was exactly what everyone warns about: the various BIOS versions wandered between "works, but doesn't forward the errors", "doesn't work, and doesn't forward the errors", and (excitingly) "doesn't work and doesn't even POST". We are a year and a half after Zen 4 launched and there are barely any server-branded boards to begin with, and even those boards don't work right.

https://youtu.be/RdYToqy05pI?t=503

I don't know how many times it has to be said, but "doesn't explicitly disable" is not the same thing as "support". There are lots of other enablement steps that are required to get ECC to work properly, and they really need to be explicitly tested with each release (and if it is merely "not explicitly disabled", it's not getting tested). Support means you can complain to someone when it doesn't work right.
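
For anyone who wants to check whether errors are actually being forwarded to the OS, here is a minimal sketch, assuming a Linux box with an EDAC driver loaded. It reads the corrected/uncorrected error counters that the kernel exposes under `/sys/devices/system/edac/mc`. The function name `read_edac_counts` is just something made up for illustration, not part of any tool.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} per EDAC memory controller.

    Hypothetical helper for illustration. An empty dict means no memory
    controller was registered under edac_root.
    """
    root = Path(edac_root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        # ce_count = corrected errors, ue_count = uncorrected errors
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found; ECC reporting is probably not active.")
    for name, c in counts.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Caveat: an empty result can also just mean the EDAC module isn't loaded, and counters sitting at zero don't prove the board forwards errors. That is exactly the "works, but doesn't forward the errors" case, so this is a first check and not validation.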

AMD churns AGESA really, really hard and it breaks all the time. Partners have to try and chase the upstream, and sometimes it works and sometimes it doesn't. Elmor (Asus's BIOS guy) talked about this on Overclock.net back around 2017-2018 when AMD was launching X399, including some of the troubles there and with AM4.

That said, the current situation has seemingly lit a fire under the board partners. With Intel out of commission and all these customers desperate for an alternative to their W680/Raptor Lake systems (which do support ECC officially, btw) in these performance-sensitive niches or power-limited datacenter layouts, they are finally cleaning up the mess, like, within the last 3 weeks or so. They've very quickly gone from not caring about these boards to seeing a big market opportunity.

https://www.youtube.com/watch?v=n1tXJ8HZcj4

can't believe how many times I've explained in the last month that yes, people do actually run 13700Ks in the datacenter... with ECC... and actually it's probably some pretty big names in fact. A previous video dropped the tidbit that one of the major affected customers is Citadel Capital - and yeah, those are the guys who used to get special EVEREST and BLACK OPS skus from intel for the same thing. Client platform is better at that, the very best sapphire rapids or epyc -F or -X3D sku is going to be like 75% of the performance at best. It's also the fastest thing available for serving NVMe flash storage (and Intel specifically targeted this, the Xeon E-2400 series with the C266 chipset can talk NVMe SAS natively on its chipset with up to 4 slimsas ports...)

it's somewhere in this one I think: https://www.youtube.com/watch?v=5KHCLBqRrnY

The new EPYC processors for AM5 look like they'll be ok for ECC RAM though, at least from the coming months onwards.

Yeah I think that’s the bright spot: now that there’s a branded offering for server-flavored Ryzen, maybe there is a permanent justification for doing proper validation.

I just feel vindicated lol, it always comes up that “well works fine for me!” and the reality is it’s a total crapshoot, with even server-branded boards often not working. There is zero chance your Gigabyte UD3 or whatever is going to be consistently supported across BIOS versions, and often it will not be.

And AMD is really really tied to AGESA releases, so it’s fairly important on that side. Although I guess maybe we’re seeing now what happens if you let too much be abstracted away… but on the other hand partners were blowing up AMD chips last year too.

If you’re comfortable always testing, and always having the possibility of there being some big AGESA problem and ecc being broken on the new versions… ok I guess.

There is a reason the i3 chips were perennial favorites for edge servers and NASs. And I think it's really, really hard to overstate the long-term damage from reputation loss here. Intel, meltdown aside, was always no-drama in terms of reliability. Other than C2000/C3000, I guess.

...and puma and i-225V chipsets.

or at least... maybe on the CPU side they were no-drama. Other than C2000/C3000. Granted the powervr graphics on the atoms way back did suck... and meltdown... and avx-512 being rolled back... /phillip j fry counting on his fingers

maybe "blue-chip coded" is a better way to express it ig

but like, there is a notable decline in the quality of execution of intel overall, pretty much across the board, and cpu was always their core vertical, right? That was their business redoubt. intel is blue chip chips, especially CPUs. And now it's falling - really it's been falling for a while. Meltdown I can generally excuse (yes, shush), nobody appreciated sidechannels back then even if they were theoretically known. C2000/C3000 is another fuckup. yeah it's the super-io/serial bus controller... technically not their IP but it happens to be in a critical path, on their node, killing their processor. They fucked up the validation there, evidently.

I-225V had three steppings and I-226V is still not fully fixed (Windows/Linux have just turned off the EEE/802.3az feature instead). Puma was a god damned mess.

Sapphire rapids was late, still a huge mess, and actually the -W platform had not only insane power draw, but also insaner transients. 750W average, spiking up to 1500W under load, with pretty steep holdup requirements. And actually that was locked behind a "water cooled" bios option, the processor just "refused to all-core turbo" otherwise. And Intel didn't wanna actually say that the "water cooled" behavior was the spec or intentional turbo limits etc. In hindsight hmmm, that all took a bit of a different tone, didn't it?

Supposedly there is going to be a SPR-W refresh with a new stepping to fix this... emerald rapids is also very power-hungry and there were some unconfirmed murmurs suggesting it might have the same crash problems.

(yes, yes, please just listen to the guest here.) https://www.youtube.com/watch?v=_HJu5xt43iQ&t=3603s

https://wccftech.com/intel-xeon-w-3500-w-2500-sapphire-rapid...

Intel's in some real danger especially with AMD ascendant like this. Like it doesn't take very long of this real damage to customers etc and that "we're blue-chip!" thing will cease to be, and that is the last prop keeping intel's finances above the water here. Sure, it will take a while to fully wind down but... this is a great example of how intel's fuckups are driving their clients literally into the arms of the competition. A month or two ago, Asrock Rack didn't give a shit about the B650-2L2T or whatever. Guess what? Now Epyc Mini exists and oems are going to be paying attention to that. Oops.

> I-226V is still not fully fixed

Damn, didn't realise that was still being problematic too. :(

And yeah, Intel's current stumble with 13th/14th gen CPUs seems like the worst possible timing for such an extreme fuck up. That's not going to go well for future planning/purchase decisions by business customers.

ECC support wasn't good initially on AM5, but there are now Epyc branded chips for the AM5 socket which officially support ECC DDR5. They come in the same flavors as the Ryzen 7xx0 chips, but are branded as Epyc.