by lorenzhs 2806 days ago
Heavy use of complex AVX2 operations causes downclocking, too, but typically less so than AVX-512. More details are documented at https://en.wikichip.org/wiki/intel/frequency_behavior; see also e.g. https://en.wikichip.org/wiki/intel/xeon_gold/6138#Frequencie... for an example of how the frequencies differ depending on the number of active cores.

I think the reason for reducing clock speed when vector units are in heavy use is to keep power usage in check.

You might also find https://blog.cloudflare.com/on-the-dangers-of-intels-frequen... helpful, which goes into detail about a specific case where dynamic frequency scaling resulted in AVX-512 code running slower than AVX2 code.
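
To make that outcome less surprising, here is a back-of-the-envelope model, not a measurement. It assumes the whole program runs at the reduced clock while the wide vector units are in use, which is a simplification. The function name, parameters and all numbers are hypothetical and chosen only for illustration.

```python
def relative_runtime(vector_fraction, vector_speedup, clock_ratio):
    """Runtime relative to a baseline build running at the full clock.

    vector_fraction: share of baseline runtime spent in the vectorised part
    vector_speedup:  how much faster that part gets with wider vectors
    clock_ratio:     reduced clock divided by baseline clock (assumed to
                     apply to the whole program, a simplification)
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0 or clock_ratio <= 0:
        raise ValueError("vector_speedup and clock_ratio must be positive")
    # Work left after speeding up only the vectorised part.
    scaled_work = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Everything then runs slower by the clock reduction.
    return scaled_work / clock_ratio


# Hypothetical mixed workload: 10% vector code, 2x faster, 15% lower clock.
print(round(relative_runtime(0.10, 2.0, 0.85), 3))  # 1.118, i.e. slower overall
# Hypothetical vector-heavy workload: 90% vector code, same other numbers.
print(round(relative_runtime(0.90, 2.0, 0.85), 3))  # 0.647, i.e. faster overall
```

With a small vectorised share, the clock reduction on the remaining code outweighs the gain, which matches the kind of mixed workload the linked post describes.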

4 comments

It's worth noting that the Cloudflare test was done on a Xeon Silver, which behaves worse around frequency changes than the Gold or Platinum. If you're on either Gold or Platinum, you're less likely to suffer the problems that Cloudflare did with mixed workloads.

This seems like an optimisation nightmare. Your program needs to be aware both of which instructions the chip supports and of which model it is within a family, in order to decide whether or not to use certain vector instructions.
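
A rough sketch of what such a two-part decision might look like. The function name and the policy are hypothetical: skipping AVX-512 on "Silver" parts is only an illustration of the concern raised in this thread, not a recommendation. The flag names `avx2` and `avx512f` are the ones Linux reports in /proc/cpuinfo; the model strings below are simplified examples.

```python
def choose_vector_path(cpu_flags, model_name):
    """Pick a code path from CPU feature flags and the model name.

    Illustrative policy only: prefer AVX-512 when available, unless the
    model name suggests a part with a larger frequency penalty.
    """
    flags = set(cpu_flags)
    has_avx512 = "avx512f" in flags
    has_avx2 = "avx2" in flags
    # Hypothetical heuristic: treat Xeon Silver as a poor fit for AVX-512.
    penalised = "silver" in model_name.lower()
    if has_avx512 and not penalised:
        return "avx512"
    if has_avx2:
        return "avx2"
    return "scalar"


print(choose_vector_path({"avx2", "avx512f"}, "Xeon Gold 6138"))  # avx512
print(choose_vector_path({"avx2", "avx512f"}, "Xeon Silver"))     # avx2
print(choose_vector_path({"sse2"}, "unknown"))                    # scalar
```

Even this toy version shows the problem: the capability check is mechanical, but the model-name rule is a guess that has to be maintained per CPU generation.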

The downclocking does not apply at all to simple 256-bit bit-juggling operations. The code in question should run at full speed. It doesn't do anything harder than a saturating subtract.
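
For anyone unfamiliar with the term, here is a scalar model of a lane-wise saturating subtract. The code in question isn't shown here, so the lane width is an assumption: this models unsigned 8-bit lanes, where a result that would go below zero is clamped to zero instead of wrapping around. The function name is made up for the example.

```python
def saturating_sub_u8(a, b):
    """Lane-wise unsigned 8-bit saturating subtract: max(a - b, 0) per lane."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of lanes")
    out = []
    for x, y in zip(a, b):
        if not (0 <= x <= 255 and 0 <= y <= 255):
            raise ValueError("lanes must fit in an unsigned byte")
        # Clamp at zero instead of wrapping around modulo 256.
        out.append(x - y if x > y else 0)
    return out


print(saturating_sub_u8([200, 10, 0, 255], [50, 20, 1, 255]))  # [150, 0, 0, 0]
```

A 256-bit register holds 32 such byte lanes, and the vector instruction applies this same clamp to all of them at once.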