by zigzag312 705 days ago
I've picked AMD over Intel too, but I've had so many issues with it that I partly regret it. Memory stability issues, extremely long boot times, too high voltage, iGPU driver timeouts. Most of the issues have been fixed, but not all. After months of dealing with an annoying memory leak, I've just recently been able to confirm that it is caused by a Zen 4 iGPU driver.
3 comments

I would never buy an AMD machine again after my last Ryzen 3600X. So many issues. It had to be power cycled 2-3 times to get it to boot. Memory corruption issues and stability issues galore. Not overclocked. Stock configuration. Decent quality board and power supply. Just hell.

Swapped board out assuming it was that. Same problem. Turned out to be the CPU which was a pain in the ass getting a warranty replacement for.

Ended up buying a new open box Intel 12400 Lenovo lump off eBay and using that.

I had similar issues with Zen of a few different generations, and with various boards. As a result, I built a new machine around an Intel 12400 as well. I did have to buy a Thermaltake socket reinforcement bracket to mitigate the bending issue.

Oddly, this Intel build somewhat restored my faith in humans to build hardware and software as the thing seems to work quite well.

An issue with these parts was that the OOB config wasn’t very good - even if you knew to turn on the XMP profiles it still threw a ridiculous amount of voltage at the chip in pursuit of a few percent performance increase.
> Decent quality board

Which board was it?

Tried an MSI B550 initially. Think the second one was an Asus B550. The CPU swap did work OK on the original board!

But at that point I was using the Lenovo box. So I just sold all the crap on eBay for the next victim.

Interesting. MSI doesn't really have a fantastic reputation for boards, and apparently ASUS's quality isn't that good any more either. :(

For my Ryzen 5000 series build (a while ago now) I went with an ASRock board for ECC support, and also ECC ram.

It's been mostly flawless, though as I'm undervolting the ram it does let me know about an ECC corrected error once every 6-9 months or so. ;)
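
For anyone wanting to check for corrected ECC errors on Linux, here is a minimal sketch. It assumes the kernel's EDAC driver is loaded and exposes its counters under /sys/devices/system/edac/mc; the function name `edac_ce_total` is made up for illustration.

```shell
# Sum the corrected-error counters under an EDAC sysfs tree.
# $1 = base directory (defaults to the usual Linux location; assumption:
# the EDAC driver for your memory controller is loaded).
edac_ce_total() {
    base="${1:-/sys/devices/system/edac/mc}"
    total=0
    for f in "$base"/mc*/ce_count; do
        # Skip the unexpanded glob when no memory controller is listed.
        [ -r "$f" ] || continue
        n=$(cat "$f")
        total=$((total + n))
    done
    echo "$total"
}

echo "corrected ECC errors reported by EDAC: $(edac_ce_total)"
```

A count of 0 can also mean that EDAC is not active on the machine, so it is worth confirming that the directory exists before trusting the number.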

I don't think there's a lot in it to be honest between vendors. They are all cheap garbage with lurid ass chunks of metal and artwork designed by a 5 year old stuck all over them.

And there's one thing you can NEVER trust and that is objectivity from gamers when looking at failure and reliability statistics. It's one huge cargo cult.

Notably my kids both have Ryzen 5600G + MSI B550 boards with no problems.

I have been using Gigabyte for a very long time and had no problems. ASUS was OK for me too, but MSI boards were the worst due to stability, driver and cooling curve problems. Don’t buy MSI.
The B550 series is a power-reduced, cost-cutting version of the X570 boards. They are only meant for the 6-core chips and the 65W versions. You need to pick your components carefully.
VRM is the component that you need to be looking at regarding the power delivery for the CPU. There are many motherboards that combine a lower-tier chipset and a high-end VRM.
B550 was that limited initially. Even the Ryzen 9 5950X runs on B550 series motherboards today. B550 is a bit scaled down, e.g. no PCIe 4.x lanes, just 3.x, but that's OK with me.

My motherboard is an ASUS ROG Strix variety with 4x32GB ECC RAM and the Ryzen 9 5950X works just fine.

The chipset doesn't deliver power. So this is wrong. It has fewer PCIe lanes and that's about it. I don't need them so I didn't buy them :)
I built an Intel workstation for the first time in two decades when the 13700K was released. It hasn't been a bed of roses, starting with thermal throttling from the LGA1700 socket bending the IHS so badly that the heatsink only contacted it in a strip down the middle, needing to physically reseat the onboard HDMI for the display signal to resume after the monitor is disconnected, a generally boiling TDP, DDR5 quirks like 5-minute training times (no blame here, just didn't expect my servers to boot faster), and generally having goofier names for UEFI options designed around overclocking. I still don't know how to use XTU.

Couple that with the underwhelming software support for AI/ML on their own hardware for about a year after CPU and GPU launch, and I wish I'd just stuck to AMD.

I don't think either are perfect, but it's the devil you know, and I've grown to trust that even when AMD cocks something up, they'll listen to customers, coordinate engineering efforts with OEMs, and handle it. Intel are either too high and mighty or don't empower their engineers to treat partners like partners without layers of management getting involved to be able to do something similar.

> Couple that with the underwhelming software support for AI/ML on their own hardware for about a year after CPU and GPU launch, and I wish I'd just stuck to AMD.

What support did AMD have?

Choosing Intel brought no advantage over AMD. What support did AMD need to overcome that?
Seems like a strange way to express that point? Why mention underwhelming support for AI/ML if it’s the same on both? (If we’re talking about desktop chips, I don’t even understand what that’s supposed to mean.)
Sounds like bad ram (clean contacts, re-seat, and test) or temperature issues (the main reason we still use mobile i7-12700H was cheap ddr4 64GB ram stick kit, Iris media gpu drivers, and rtx CUDA gpu.)

Intel has its own issues, Gigabyte told me to pound sand when asking to unlock the bios on my own equipment to disable IME.

There is no greener grass on the fence line... just a different set of issues =3

>Sounds like bad ram (clean contacts, re-seat, and test)

Since he's talking about iGPU issues, he most likely has a laptop APU, so no RAM to reseat. I'm also having similar issues on my Ryzen 7000 laptop. Kinda regret upgrading from the Ryzen 5000 laptop which AMD obsoleted just 2 years after I bought it, as at least that had no issues. Hopefully new drivers in the future will fix stability but you never know.

What I do know, is that this will most likely be my last AMD machine if Intel shows improvement to match AMD, since their Linux driver support is just top notch.

Desktop Ryzen 7950X.

Increasing the VRAM size (UMA size) to 4 GB fixed the frequent driver timeouts for me.

Reverting to an older driver (driver cleaner -> driver v23.11.1) fixed the memory leak. This memory leak is weird since PoolMon doesn't show anything unusual. Nothing shows as using too much memory anywhere, except that the committed memory size grows to over 100GB after a few days of uptime and RamMap shows a large amount of unused-active memory.

GPUs have the most complex drivers in the whole system, we're talking tens of millions of lines of code, so it is absolutely not surprising that you're having issues like that given how recent AMD's investment into APUs is. I wouldn't use them for a few more years; get a cheap discrete GPU from NVIDIA or maybe even from Intel.
Hm? AMD's investing in APUs is not a new thing, that's going back to the FX days with their FM1 socket. Since Ryzen 1 they have their G APUs, and their integrated graphics power the Steam Deck and many other mobile handhelds. Plus, Intel's integrated graphics are known for their driver issues (and so is Arc, for now), so I'd disagree with that recommendation.
APU is not only not a new thing, it’s a marketing term AMD themselves invented over 10 years ago pushing the entire concept of having an iGPU.
The RTX 3090 is an Ampere GPU, and will apparently be supported in the new open NVIDIA driver release.

Should get interesting soon =)

In Nova? Or just the in-kernel component?
I have a similar CPU, and I also get frequent iGPU crashes, but only when opening multiple tabs (6+) with video.

I also increased UMA to 4 GB, it reduced the crash frequency, but it still happens.

The discrete NVIDIA GPU I use at the same time is fine.

Please post the cpu-z (win) or cpu-x (linux) chip make/model for other users to compare/search.

If there is enough data here, we may be able to see a common key detail emerge. i.e. if the anecdotal problem(s) remain overtly random, then a solution from the community or OEM may prove impossible.

Thanks in advance, =3

I initially got somewhat frequent hangs on Fedora with a Radeon 680M iGPU (in a Ryzen 7 PRO 6850U APU). The hangs stopped when I added amdgpu.dcdebugmask=0x10 to kernel boot options, based on some comments in an AMD Linux driver bug report [1]. That seems to disable panel self-refresh so it would seem to be related to that somehow.

Stability has been fine since. The bug report has since been closed but I haven't tested in a while to see if disabling PSR is still needed or if the issue has actually been fixed.

I haven't seen significant stability issues on Windows, although I don't use it much on the AMD device.

[1] https://gitlab.freedesktop.org/drm/amd/-/issues/2443
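
For anyone trying the same workaround, a rough sketch of how to check whether the option is actually on the running kernel's command line. The helper name `has_karg` is made up for illustration, and the grubby line in the comment assumes a Fedora-style setup where grubby manages the boot entries.

```shell
# Return success if a kernel parameter appears as a whole word
# in the given command-line string.
# $1 = parameter to look for, $2 = kernel command line
has_karg() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the boot options of the running kernel (Linux only).
if [ -r /proc/cmdline ]; then
    if has_karg "amdgpu.dcdebugmask=0x10" "$(cat /proc/cmdline)"; then
        echo "PSR workaround is active"
    else
        echo "PSR workaround is not set"
        # One way to add it persistently on Fedora (assumption: grubby is in use):
        #   sudo grubby --update-kernel=ALL --args="amdgpu.dcdebugmask=0x10"
    fi
fi
```

The option only takes effect after a reboot, so the check is worth repeating once the machine has been restarted.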

Please pull the chip maker/model and ram details off your rig:

sudo apt-get install cpu-x

sudo cpu-x

I think comparing your specifications may help other users narrow down if a manufacturing or software defect is present.

Thanks in advance =3
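
On distros without apt-get (Fedora, for instance), roughly the same identification details can be pulled with stock tools. This is only a sketch: `cpu_id_fields` is a made-up helper name, lscpu is assumed to be present (it ships with util-linux), and the dmidecode line assumes that package is installed and run as root.

```shell
# Keep only the identification fields from lscpu-style "Key: value" text.
# Leading whitespace is allowed because newer lscpu versions indent some fields.
cpu_id_fields() {
    grep -E '^[[:space:]]*(Vendor ID|Model name|CPU family|Model|Stepping):'
}

# Print the CPU identification of the current machine, if lscpu is available.
if command -v lscpu >/dev/null 2>&1; then
    LC_ALL=C lscpu | cpu_id_fields
fi

# RAM module details need root and dmidecode (assumed to be installed):
#   sudo dmidecode --type memory | grep -E 'Manufacturer|Part Number|Speed|Size'
```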

Depends on the failure mode, as it is common for specs to drift around under load (also, temperature cycling stresses PCB, and can shear BGA connections.)

I'd try a slower cheap set of lower-bandwidth/higher-latency ram sticks to see if it stops glitching up. If you are using low latency sticks (iGPU means this is usually recommended), then dropping the performance a bit may stabilize your specific equipment.

Of course, I'm not that smart... so YMMV... =3

There are no sticks in my laptop. I was talking about soldered RAM, as is the norm on recent high speed LPDDR5X laptops.
Please pull the chip maker/model off your rig:

sudo apt-get install cpu-x

sudo cpu-x

We may still be able to use this information to compare with other users' glitches to see if there is some underlying similarity.

Unfortunately, if it is a thermal stress/warping on the PCB cracking open RAM BGA balls on chips or shifting traces... One won't really be able to completely identify the intermittent issue.

We were actually looking at buying a similar economy model earlier this year (ended up with a few classic Lenovo models instead)... so please be verbose with the make/model to help future searchers =3

Can't be thermal, I checked.
I ran a ~12h RAM test a few times and it always passed successfully (except when I was testing the EXPO profile on an early BIOS version).

I also did Prime95 CPU stress testing a few times without issues.

All issues seem to be related to either BIOS or drivers.

Please join the branch discussing the idea of using slower/cheaper RAM.

What is your current ram chip model, maker, and configuration on your machine?

sudo apt-get install cpu-x

sudo cpu-x

Cheers, =3

Corsair Vengeance 64GB (2x32GB) 5600MHz C36. Module Part Number: CMH64GX5M2B5600C36. DRAM manufactured by Samsung.

Running RAM at default speeds (4800MHz) or using XMP profile 5600MHz C36 doesn't affect these issues (they are no more or less frequent).

EDIT: XMP profile, not EXPO.

Thanks for helping the other users =3
Some more info if it helps anyone:

CPU Ryzen 9 7950X. Family: F (ext.: 19), Model: 1 (ext.: 61), Stepping: 2, Revision: RPL-B2.

iGPU: Raphael, revision: C1.

MB: ASUS TUF Gaming X670E-PLUS WiFi. Rev 1.xx. Southbridge rev.: 51.
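
In case the family/model notation above is confusing when comparing against lscpu or other tools: as far as I understand it, the "ext." values are the combined family and model in hex, built from the raw CPUID signature. A minimal sketch of the arithmetic, following the AMD rule where the extended fields only apply when the base family is 0xF. The raw value 0x00A60F12 is an assumption chosen to match the fields listed above, and `decode_cpuid` is a made-up name.

```shell
# Decode a CPUID Fn0000_0001 EAX signature into family/model/stepping.
# $1 = raw EAX value, e.g. 0x00A60F12
decode_cpuid() {
    eax=$(( $1 ))
    stepping=$(( eax & 0xF ))
    base_model=$(( (eax >> 4) & 0xF ))
    base_family=$(( (eax >> 8) & 0xF ))
    ext_model=$(( (eax >> 16) & 0xF ))
    ext_family=$(( (eax >> 20) & 0xFF ))
    family=$base_family
    model=$base_model
    # AMD rule: extended fields are only added when the base family is 0xF.
    if [ "$base_family" -eq 15 ]; then
        family=$(( base_family + ext_family ))
        model=$(( (ext_model << 4) | base_model ))
    fi
    printf 'family=%Xh model=%Xh stepping=%d\n' "$family" "$model" "$stepping"
}

decode_cpuid 0x00A60F12   # prints: family=19h model=61h stepping=2
```

So "Family: F (ext.: 19)" and "Model: 1 (ext.: 61)" describe the same part that Linux tools list as family 25, model 97 in decimal.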