by slackerIII 1480 days ago
The paper goes into a lot more detail: https://arxiv.org/pdf/2205.05982.pdf

I don’t see any mention in the paper of thermal clock throttling concerns, which can really neuter performance of tools that sustain use of AVX operations over a period of time. For the quick benchmarks presented in the paper, of course it will be faster. What if I’m continuously hammering my CPU with AVX operations? I expect it to severely downclock.
On Ice Lake Xeon the penalty for using the AVX-512 features on a single core is -100MHz. If we pessimistically use the slowest part Intel sells, that is a 5% performance penalty (2% on their fastest parts). The speedup from this work is 40-60% compared to AVX2. So you'd be a fool to take the side of the folk myth. AVX-512 works.

By the way, the performance penalty for using AVX-512 on multiple cores, when those cores were already active, is zero. There is no penalty in most server scenarios.
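As a rough sanity check on the arithmetic above, here is a minimal sketch. It assumes throughput scales linearly with clock frequency, which is a simplification. The 2.0 GHz clock is a hypothetical value picked so that 100 MHz works out to the 5% quoted above, and the function names are made up for illustration.

```python
def net_speedup(simd_speedup: float, base_mhz: float, penalty_mhz: float) -> float:
    """Speedup that remains after a fixed frequency penalty.

    Assumes throughput scales linearly with clock frequency (a simplification).
    """
    return simd_speedup * (base_mhz - penalty_mhz) / base_mhz


def break_even_drop(simd_speedup: float) -> float:
    """Fractional clock drop at which the SIMD speedup is fully cancelled."""
    return 1.0 - 1.0 / simd_speedup


if __name__ == "__main__":
    # Hypothetical 2.0 GHz part, so a 100 MHz drop is 5%.
    for claimed in (1.40, 1.60):
        kept = net_speedup(claimed, 2000.0, 100.0)
        limit = break_even_drop(claimed)
        print(f"{claimed:.2f}x claimed, {kept:.2f}x after penalty, "
              f"break-even at a {limit:.1%} clock drop")
```

Under those assumptions the 40-60% speedup shrinks to roughly 33-52%, and the clock would have to drop by about 29-38% before the gain disappears entirely.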

>On Ice Lake Xeon the penalty for using the AVX-512 features on a single core is -100MHz.

That is a penalty due to licensing [0], not thermal throttling. As I wrote elsewhere, I’ve seen my clock speed get cut in half across all cores on a physical die when running AVX-heavy operations for a sustained period of time, due to thermal throttling.

[0] https://travisdowns.github.io/blog/2020/08/19/icl-avx512-fre...

The default AVX offset for Ice Lake is indeed only 100MHz (and it doesn't exist starting with Rocket Lake), but 512-bit SIMD instructions use a lot of power and, as a result, generate a lot of heat, so they certainly can cause thermal throttling or throttling due to power limits.
It's the transition that kills you. Are you doing this full time?
My full-time thing is more search-y and takes tens of milliseconds, so I'm not really sweating power state transitions that take a few microseconds.
That might be an interesting benchmark, but assuming good cooling isn't exactly unreasonable either.
In datacenter blade servers (i.e. on a cloud VM), I’ve noticed up to 50% downclocking due to thermal throttling when running sustained frequent AVX operations.

I’m sure an exotic watercooled setup will fare much better, but those aren’t generally what we run in production.

My laptop has been throttling itself for a while, I recently discovered. I had been trying to benchmark some code changes and have given up and am letting the CI machine run them, because my numbers are all over the place and go down with each run.
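One quick way to confirm that kind of drift is to time the same workload repeatedly and compare the early runs with the late ones. This is only a sketch: `workload` is a stand-in CPU-bound loop, the helper names are hypothetical, and what ratio counts as "throttling" is a judgment call rather than any standard.

```python
import statistics
import time


def workload(n: int = 200_000) -> int:
    # Stand-in CPU-bound loop; swap in the code you actually want to benchmark.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_runs(fn, runs: int = 10) -> list[float]:
    """Run fn repeatedly and return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def slowdown_ratio(timings: list[float]) -> float:
    """Median of the last third of runs divided by median of the first third."""
    third = max(1, len(timings) // 3)
    return statistics.median(timings[-third:]) / statistics.median(timings[:third])


if __name__ == "__main__":
    ratio = slowdown_ratio(time_runs(workload))
    print(f"late/early median time ratio: {ratio:.2f}")
```

A ratio that keeps climbing well above 1.0 as you add runs suggests the machine is slowing down under load, though background processes can produce the same symptom.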
One option would be to go into the BIOS and see if there's some way of just locking your CPU to one of the lower clock speeds. This will give lower benchmarking numbers of course, but at least they should be fairly stable. (In Linux, it is also often possible to tinker with frequencies while the system is running.)

Even on a desktop this sort of thing is sometimes necessary. For example, my CPU has different clock speeds depending on how many cores are active, so I have to lock it to the all-core clock if I want to see proper parallel speedups.

This might be annoying for day-to-day usage (although CPUs really are insanely performant nowadays, so maybe it will not be too bad).
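For the Linux case, a minimal sketch of inspecting the current frequency limits through the cpufreq sysfs files. It assumes a kernel with a cpufreq driver loaded and only looks at cpu0; the helper names are hypothetical, and on systems without these files the reads simply return None.

```python
from pathlib import Path

# Standard cpufreq sysfs directory for the first CPU; values are in kHz.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(name: str, base: Path = CPUFREQ):
    """Return the stripped contents of a cpufreq sysfs file, or None if unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        return None


def khz_to_ghz(value):
    """Convert a kHz string from sysfs to GHz, passing None through."""
    return None if value is None else int(value) / 1_000_000


if __name__ == "__main__":
    for name in ("scaling_min_freq", "scaling_cur_freq", "scaling_max_freq"):
        print(name, khz_to_ghz(read_cpufreq(name)), "GHz")
    print("governor", read_cpufreq("scaling_governor"))
```

Writing a lower value to `scaling_max_freq` as root is one common way to cap the clock before benchmarking, though whether the cap holds depends on the driver and firmware.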