| HN Mirror



	by gslin 1516 days ago
	One problem is that the CPU frequency drops significantly when AVX-512 is involved (e.g. https://en.wikichip.org/wiki/intel/xeon_gold/6262v), which usually cancels out the benefit in the Real World (tm).

6 comments

PragmaticPulp 1516 days ago

This was massively exaggerated by journalists when AVX-512 was first announced.

It is true that randomly applied AVX-512 instructions can cause a slight clock speed reduction, but the proper way to use libraries like this is within specific hot loops, where the mild clock speed reduction is more than offset by the huge increase in parallelism.

This doesn’t make sense if you’re a consumer multitasking while a background process keeps triggering the AVX-512 penalty, but it usually would make sense in a server scenario.
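The trade-off described above can be put into a rough back-of-the-envelope model. This is only a sketch: the function names, the pessimistic assumption that the reduced clock applies to the whole program, and all input numbers are illustrative, not measurements from any real CPU.

```python
def net_speedup(vector_fraction, vector_speedup, clock_ratio):
    """Estimated speedup versus running everything at full clock without AVX-512.

    vector_fraction: share of baseline runtime spent in code that can use AVX-512 (0..1)
    vector_speedup:  per-clock speedup of that code when using AVX-512 (> 0)
    clock_ratio:     reduced frequency / baseline frequency (0 < ratio <= 1),
                     assumed here to apply to the whole program
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_speedup <= 0 or not 0.0 < clock_ratio <= 1.0:
        raise ValueError("vector_speedup must be > 0 and clock_ratio in (0, 1]")
    # Baseline runtime is normalised to 1.0; vectorised code needs fewer cycles.
    cycles = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Every cycle takes longer at the reduced clock.
    new_time = cycles / clock_ratio
    return 1.0 / new_time


def break_even_fraction(vector_speedup, clock_ratio):
    """Smallest vector_fraction at which AVX-512 stops being a net loss."""
    if vector_speedup <= 1.0:
        raise ValueError("no break-even point unless vector_speedup > 1")
    return (1.0 - clock_ratio) / (1.0 - 1.0 / vector_speedup)


# Hypothetical inputs: 2x per-clock speedup, clock drops to 85% of baseline.
hot_loop = net_speedup(0.90, 2.0, 0.85)    # most of the runtime is vectorised
sprinkled = net_speedup(0.05, 2.0, 0.85)   # only a few stray wide instructions
```

With these made-up inputs, the hot-loop case comes out ahead and the sprinkled case comes out behind, which is the shape of the argument in this comment; real results depend on the specific CPU and workload.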


adgjlsfhk1 1516 days ago

the thing I never understood about this is why Intel didn't just add latency to the avx512 instructions instead? that seems much easier than downclocking the whole cpu


janwas 1516 days ago

I believe they do actually do something like this - until power and voltage delivery change, wide instructions are throttled independently of frequency changes (which on SKX involved a short halt).


pclmulqdq 1516 days ago

Intel has been trying to reduce the penalty for AVX-512, and barring that, advertise that there is no penalty. Most things on Ice Lake run fine with 256 bit vectors, but Skylake and earlier really needed 128 bit or narrower if you weren't doing serious vector math.

Forget about 512 bit vectors or FMAs.


alksjdalkj 1516 days ago

I think this is less of a problem on newer CPUs: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-fre...


pclmulqdq 1516 days ago

Those are client CPUs, which behave very differently from server parts when it comes to power management. However, AVX downclocking has mostly gone away with Ice Lake, and hopefully Sapphire Rapids does away with it permanently (except on 512-bit vectors).


mhh__ 1516 days ago

Unless someone has data for the latest Intel chips (i.e. sapphire rapids) showing the opposite I'm inclined to think this is a meme from 2016/7 that needs to go the way of the dodo.


Twirrim 1516 days ago

It was largely wrong then, too. Cloudflare, who really kicked off a large amount of the fuss, had "Bronze" class Xeon chips, that weren't designed or marketed for what they were attempting to use them for. They were only ever intended for small business stuff. Not large scale high performance operations. Their performance downclock for AVX-512 is way, way higher on Bronze.


NavinF 1516 days ago

Weren’t those chips $10k each back then? Hardly anyone got gold Xeons.


Twirrim 1516 days ago

Not even close. The blog post was 2017.

Actually, I stand corrected, after double checking, Cloudflare were using Silver. Entry level data centre chips, instead of small business chips. Still not the kind of chips you'd buy for high performance infrastructure, and not intended to be used for such.

Xeon Silver 4116s hit the market at $1,002.00. The Golds were $1,221.00. The performance differences are quite significant. For something that'll be in service for ~3-5 years, $200 is absolutely trivial by way of a per-chip increase. It's firmly in the "false economy" territory to be skimping on your chip costs. It's a bit more understandable in smaller businesses, but you just don't do it when you're operating at scale.
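For reference, the arithmetic behind the "$200" figure can be spelled out from the two list prices quoted above. This is a small sketch only: the function and constant names are made up for illustration, and it ignores discounts, power, and every other part of TCO.

```python
SILVER_4116_LIST_PRICE = 1002.00  # USD, as quoted in this comment
GOLD_LIST_PRICE = 1221.00         # USD, as quoted in this comment


def premium(silver_price, gold_price):
    """Extra up-front cost per chip of choosing Gold over Silver."""
    return gold_price - silver_price


def premium_per_year(silver_price, gold_price, service_years):
    """The same premium spread evenly over the chip's time in service."""
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    return premium(silver_price, gold_price) / service_years


extra = premium(SILVER_4116_LIST_PRICE, GOLD_LIST_PRICE)
for years in (3, 5):
    per_year = premium_per_year(SILVER_4116_LIST_PRICE, GOLD_LIST_PRICE, years)
    print(f"{years} years in service: ${per_year:.2f} extra per chip per year")
```

At list price the gap is $219 per chip, which works out to somewhere between roughly $44 and $73 per chip per year over a 3-5 year service life.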

Also remember: at the scales that Cloudflare are purchasing at, they won't be paying RRP. They'll be getting tidy discounts.


NavinF 1515 days ago

I’m not familiar with the model numbers. What’s the gold equivalent to the Xeon Silver 4116?

Anyway I’m sure they compared the TCO of buying more low-end chips vs fewer high-end chips.


janwas 1516 days ago

I would love to see an example of reasonable code not seeing any benefit. On first generation SKX, we saw 1.5x speedups vs AVX2, and that was IIRC even without taking much advantage of AVX3-only instructions.


SemanticStrengh 1516 days ago

Please stop spreading this fallacy. While downclocking can happen, the benefit is usually still strong and superior to 256-bit AVX. Even 256-bit vectors can induce downclocking. AVX-512, when properly utilized, simply demolishes non-AVX-512 CPUs.


vlovich123 1516 days ago

On that one task. The challenge is when the AVX-512 pieces aren’t the bottleneck in every single concurrent workload you run. It’s fine if the most important thing you’re running on them is code optimized for AVX-512. Realistically though, is that the case for the target market of CPUs capable of AVX-512, since consumer use cases aren’t? The predominant workload would be cloud, right? Where you’re running heterogeneous workloads, right? You’d have to get real smart by coalescing AVX-512 and non-AVX-512 workloads onto separate machines and disabling it on the machines that don’t need it. That’s very complicated work, because you’d have to annotate each workload by hand (memcpy is optimized to use AVX-512 when available, so the presence of AVX-512 in the code is insufficient).

The more generous interpretation is that Intel fixed that issue a while back although the CPUs with that problem are still in rotation and you have to think about that when compiling your code.
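One way to "think about that" is to pick the vector width at run time from what the CPU reports, with a switch to hold back 512-bit code on older parts. The sketch below assumes x86 Linux, where `/proc/cpuinfo` lists feature flags such as `avx2` and `avx512f` on a `flags` line; the function names and the `allow_512` knob are hypothetical, and the sample text is made up rather than copied from a real machine.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. not x86 Linux)


def choose_vector_bits(flags, allow_512=True):
    """Pick the widest vector width to dispatch to; 0 means a scalar fallback.

    allow_512 is a hypothetical policy switch: set it to False on parts where
    the downclocking penalty is considered not worth it.
    """
    if allow_512 and "avx512f" in flags:
        return 512
    if "avx2" in flags:
        return 256
    if "sse2" in flags:
        return 128
    return 0


# Made-up sample standing in for the contents of /proc/cpuinfo.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f\n"

flags = parse_cpu_flags(SAMPLE_CPUINFO)
width_default = choose_vector_bits(flags)
width_conservative = choose_vector_bits(flags, allow_512=False)
```

On a real Linux machine the text would come from reading `/proc/cpuinfo`. Note that this only covers code you dispatch yourself; as the comment points out, library routines such as memcpy may pick AVX-512 on their own.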
