by arp242 1118 days ago
It also has a fairly easy solution: disable the CC6 sleep state. The practical effects from that will most likely be minimal or non-existent for most users of these CPUs.
2 comments

> disable the CC6 sleep state.

This is now the second time AMD has screwed up the C6 state. First-gen Ryzen would hang daily for me due to a similar bug.

I don't understand the nature of the relationship between a motherboard manufacturer and AMD, but when I got my MSI Tomahawk board for my Ryzen I really thought I was losing my mind. USB devices would stop working at the most random of times, and some of them would continually cycle between connected and not connected.

A motherboard update from MSI applied something from AMD and that fixed the issue.

They’ve improved by three decimal orders of magnitude since then. How much more can we ask of them?
I can already see the pain for the myriad compliance people (covering all the energy reduction directives, at least in the EU) getting strangely obtuse notes from their sw/hw/platform teams saying, in essence, 'errrrr, we need to amend our already thick justification folder to disable a specific sleep state'. I feel a migraine (or a kind of sketch) coming. 'Oh, and BTW, we're field upgrading the whole fleet'.

I guess fighting tooth and nail to disable any and all of these sleep states from the get go is worth it...

Would this qualify as more CPU errata?
There's errata and errata...

As a systems seller you get most of the markup but also most of the responsibility, so handwaving 'sorry, AMD fucked up' won't do it. You now have an installed base that might crash every 1024 days, which for unattended systems is long but not that long. Worse, if you have hardware redundancy, there's still a chance the redundant units all booted around the same time and so will crash around the same time.

Customers will be proactive and follow the intelligent periodic reboot schedule you propose for a time (see the 787 overflow bug stories), while asking for a fix. The fix still needs to be OK with all the specs you sold. If one of these specs depends on sleep states, you'll have to find a solution around it and deploy it fleetwide. If a microcode update fixes it, yay. If the problem can't be winked away with a software patch, the blast radius is now bigger, and you're still supposed to do as much as possible to use the least energy possible in most idle states...
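
To make the "intelligent periodic reboot schedule" idea concrete, here is a rough sketch of how one might plan it. It treats the 1024 days mentioned above as a hard deadline after boot, which is a simplification. The host names, the 30-day safety margin, and the one-day spacing are all made-up values for illustration, and the function names are hypothetical rather than taken from any vendor tool.

```python
from datetime import datetime, timedelta

# Interval quoted in the thread; assumed here to be a hard deadline after boot.
CRASH_AFTER = timedelta(days=1024)


def crash_deadline(boot_time: datetime) -> datetime:
    """Moment at which the uptime reaches the 1024-day mark."""
    return boot_time + CRASH_AFTER


def staggered_reboots(boot_times, margin=timedelta(days=30), spacing=timedelta(days=1)):
    """Plan one reboot per host, ahead of its deadline, at least `spacing` apart.

    boot_times: dict mapping a (hypothetical) host name to its last boot time.
    Returns a dict mapping host name to its planned reboot time.
    """
    plan = {}
    previous = None
    # Walk from the latest deadline to the earliest, and move a reboot
    # earlier whenever it would land too close to the one already planned.
    for host, boot in sorted(boot_times.items(), key=lambda kv: kv[1], reverse=True):
        slot = crash_deadline(boot) - margin
        if previous is not None and previous - slot < spacing:
            slot = previous - spacing
        plan[host] = slot
        previous = slot
    return plan


if __name__ == "__main__":
    # Two redundant units powered on together: the failure mode described above.
    same_day = datetime(2023, 1, 1)
    for host, when in staggered_reboots({"node-a": same_day, "node-b": same_day}).items():
        print(host, when.date())
```

Reboots are only ever moved earlier, never later, so redundant units that booted together end up rebooting on different days without any of them getting closer to the deadline than the margin allows. The sketch does not check whether a planned slot is already in the past, which a real fleet tool would have to handle.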