HN Mirror


by raisin_churn 1260 days ago

> so code would have to find a way to switch modes it runs in as it is shuffled between core

It's even worse than that. Initially you could disable the E-cores in the BIOS to get the system to report AVX-512 as available, but Intel released a microcode update that removed this workaround[0]. Intel also stated that it started fusing off AVX-512 in silicon on later production Alder Lake chips[1]. Also compare the Ark entries for the Rocket Lake[2], Alder Lake[3], and Raptor Lake[4] flagships: only the 11900K lists AVX-512 as an available Instruction Set Extension. So it's reasonable to say that AVX-512 is dead on Intel's consumer lines for now, whereas AMD has just introduced it in the Ryzen 7000 series.

[0] https://www.tomshardware.com/news/intel-reportedly-kills-avx...

[1] https://www.intel.com/content/www/us/en/support/articles/000...

[2] https://ark.intel.com/content/www/us/en/ark/products/212325/...

[3] https://ark.intel.com/content/www/us/en/ark/products/134599/...

[4] https://ark.intel.com/content/www/us/en/ark/products/230496/...


herpderperator 1260 days ago

Does anyone know why they would do this? If AVX-512 works fine on the P-cores, and some people are willing to disable the E-cores to get it, why stop them? Why go to such extreme lengths to disable something?


formerly_proven 1260 days ago

"The glibc problem"

You can schedule across heterogeneous cores; that's not really the problem. You simply add another bit for "task used AVX-512" and start every task with AVX-512 disabled, so it faults the first time it tries to use it. The same thing is done (or used to be done) for AVX: if you know a task doesn't use AVX, you don't need to preserve all those registers.
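A toy Python model of that first-use scheme, for illustration only: every task starts with AVX-512 disabled, the first AVX-512 instruction faults, and the handler sets the per-task bit and narrows the task's allowed cores to the P-cores. The core IDs, `Task`, `on_avx512_fault`, and the "migrate to the lowest-numbered P-core" policy are all hypothetical; a real kernel would do this in its fault handler and scheduler, not like this.

```python
from dataclasses import dataclass

# Hypothetical hybrid topology; the core IDs are illustrative, not a real part.
P_CORES = frozenset({0, 1, 2, 3})
E_CORES = frozenset({4, 5})
ALL_CORES = P_CORES | E_CORES


@dataclass
class Task:
    name: str
    used_avx512: bool = False       # the extra "task used AVX-512" bit
    allowed: frozenset = ALL_CORES  # cores the scheduler may place this task on


def on_avx512_fault(task: Task, core: int) -> int:
    """Model the first-use fault: AVX-512 starts disabled for every task.

    Record the bit, restrict the task to P-cores, and return the core it
    should resume on (with AVX-512 now enabled for it).
    """
    task.used_avx512 = True
    task.allowed = P_CORES
    if core in P_CORES:
        return core          # already on a P-core: just enable and resume
    return min(P_CORES)      # on an E-core: migrate (placeholder policy)


def e_core_eligible(tasks):
    """Names of tasks the scheduler may still place on an E-core."""
    return [t.name for t in tasks if t.allowed & E_CORES]


tasks = [Task("editor"), Task("video-encoder")]
resume_on = on_avx512_fault(tasks[1], core=5)  # encoder hits AVX-512 on E-core 5
print(resume_on, e_core_eligible(tasks))       # → 0 ['editor']
```

The bit is sticky by design here: once a task has touched AVX-512 it stays off the E-cores, which is exactly what makes the next point a problem.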

The issue is that eventually someone will find that memcpy* is 4.79% faster on average with AVX-512 and will put that into glibc, and approximately five minutes later every process ends up hitting AVX-512 instructions. Since each of them trips the "used AVX-512" bit, zero processes can be scheduled on the E-cores, making them completely pointless.

* It doesn't have to be memcpy or glibc, it's sufficient if some reasonably commonly used library ends up adopting AVX-512 when available.
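To make the "glibc problem" concrete, here is a toy Python model that assumes the per-task first-use bit described in the comment above. A memcpy implementation is chosen once from the reported CPU features, loosely in the spirit of glibc's IFUNC resolvers (the selection rule here is made up and is not glibc's actual logic). Once the AVX-512 variant is selected, every process that copies memory trips the bit and loses E-core eligibility. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    used_avx512: bool = False  # set by the first-use fault, as described above


def memcpy_baseline(proc, dst, src):
    dst[:] = src               # stands in for an SSE2/AVX2 copy loop


def memcpy_avx512(proc, dst, src):
    proc.used_avx512 = True    # executing AVX-512 trips the per-task bit
    dst[:] = src


def resolve_memcpy(cpu_features):
    """Pick one implementation up front, loosely like an IFUNC resolver."""
    return memcpy_avx512 if "avx512f" in cpu_features else memcpy_baseline


def fraction_e_core_eligible(cpu_features, names):
    memcpy = resolve_memcpy(cpu_features)  # resolved once, shared by everyone
    procs = [Process(n) for n in names]
    for p in procs:
        buf = bytearray(4)
        memcpy(p, buf, b"abcd")            # every process copies memory eventually
    return sum(not p.used_avx512 for p in procs) / len(procs)


names = ["shell", "browser", "sshd", "compiler"]
print(fraction_e_core_eligible({"avx2"}, names))             # → 1.0
print(fraction_e_core_eligible({"avx2", "avx512f"}, names))  # → 0.0
```

The point is that the choice is made in one shared place: a single feature check in a ubiquitous library decides for every process at once.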


Aardwolf 1260 days ago

> and zero processes can be scheduled on the E cores, making them completely pointless.

So because AVX-512 is fast but E-cores are slow, we should keep everything slow and block the adoption of fast AVX-512 just so the E-cores don't become pointless?


thfuran 1259 days ago

Well, Intel is in the business of selling E-cores.


paulmd 1260 days ago

Nobody really knows for sure.

The immediate problem is that CPUID is not deterministic for naive software: if you don't set affinity masks, you don't know whether you will be scheduled onto P-cores or E-cores, so the result you get can vary.

More generally, software doesn't know what configuration of threads to launch: you want to launch as many AVX-512 threads as you have AVX-512-capable logical cores, but not more, because they won't run on the E-cores.

Software could potentially run a CPUID instruction pinned to each logical core, though, and collate the results; all you need to know is "16 logical cores with AVX-512 and 4 without".
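A minimal Python sketch of that per-core survey, assuming a Linux host for the real pinning path. Python's standard library cannot execute CPUID, so `probe` is a stand-in: in practice it would be a small native helper (hypothetical here) that checks the AVX512F bit, CPUID leaf 7, subleaf 0, EBX bit 16. `survey_cores` and `summarize` are illustrative names. The simulated run injects a fake `pin` and `probe` so it works anywhere; the 16/4 split just mirrors the example in the comment.

```python
import os
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def survey_cores(cpus: Iterable[int],
                 probe: Callable[[], bool],
                 pin: Optional[Callable[[int], None]] = None) -> Dict[int, bool]:
    """Run `probe` once while pinned to each logical CPU and record the answer."""
    restore = None
    if pin is None:
        # Default: real pinning via the Linux-only sched_setaffinity API.
        saved = os.sched_getaffinity(0)
        pin = lambda cpu: os.sched_setaffinity(0, {cpu})
        restore = lambda: os.sched_setaffinity(0, saved)
    results = {}
    try:
        for cpu in cpus:
            pin(cpu)                 # from here on, the probe runs on `cpu` only
            results[cpu] = probe()
    finally:
        if restore is not None:
            restore()                # put the original affinity mask back
    return results


def summarize(results: Dict[int, bool]) -> Tuple[List[int], List[int]]:
    """Split CPUs into (AVX-512 capable, not capable)."""
    capable = sorted(c for c, ok in results.items() if ok)
    other = sorted(c for c, ok in results.items() if not ok)
    return capable, other


# Simulated run: CPUs 0-15 report AVX-512, CPUs 16-19 do not.
current = {"cpu": None}


def fake_pin(cpu):
    current["cpu"] = cpu


def fake_probe():
    return current["cpu"] < 16


capable, other = summarize(survey_cores(range(20), fake_probe, pin=fake_pin))
print(f"{len(capable)} logical cores with AVX-512 and {len(other)} without")
# → 16 logical cores with AVX-512 and 4 without
avx512_threads = len(capable)  # launch this many AVX-512 workers, pinned to `capable`
```

The length of `capable` doubles as the answer to the thread-count question: launch that many AVX-512 workers and pin each to one of those CPUs.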

And software that isn't AVX-512-aware doesn't need to worry about it at all, since it never issues AVX-512 instructions. I guess the long tail is the stuff written for Skylake-SP in the meantime, but how much adoption is there really? It's the narrow gap between "regular stuff that never adopted AVX-512 because it wasn't on the consumer platform" and "stuff that isn't HPC enough to be really custom", which must also be "stuff that won't receive an update". How much software can that really describe, especially given the backlash against Skylake-SP's clockdowns in mixed AVX-512 and non-AVX workloads?

And such software can just launch AVX-512 threads; if they end up on the E-cores, you trap the first AVX-512 instruction and pin the thread to the P-cores. Linux already has support for this because it doesn't save AVX registers for a task that has never used AVX instructions, so that first AVX-512 instruction would just become another kind of trap. Linus has commented that this is perfectly feasible and that he, too, is puzzled why they aren't doing it.

Nobody knows what the fuck is going on, and no plan has been communicated to anyone outside the company about what exactly the problem is or whether they're looking at a fix going forward. It's a complete mystery; nobody even knows whether it's something critical or whether everything is just too on fire to care about right now.

(And if it wasn't on fire before, it probably is now. Nobody you want to retain is hanging around after a 20% pay cut off the top and truly insulting retention bonuses, ranging as high as $200 for a senior principal (no, that is not missing a "K"). Oh, and we paid $4b in dividends, and you need to move to Ohio if you want to keep your job; yes, the Ohio with the cancer cloud. Intel is fucked.)


eyegor 1260 days ago

Perhaps market segmentation, or perhaps they learned of a vulnerability in their implementation that they couldn't patch (hence the microcode update). Intel loves market segmentation (server-specific AVX extensions, bfloat16, ECC, overclocking), and I wouldn't be shocked to see them sell AVX-512 support as a "DLC" microcode update down the road.
