


	by kissiel 2806 days ago
	I wonder about the Joules per byte. AFAIK AVX units are quite expensive energy-wise.
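A minimal sketch of how Joules per byte could be computed, assuming you can take two readings of a cumulative energy counter in microjoules around the workload. The function name and parameters are hypothetical, and the wraparound handling assumes the counter simply wraps at `max_range_uj`.

```python
def joules_per_byte(start_uj, end_uj, bytes_processed, max_range_uj=None):
    """Energy per byte from two cumulative microjoule counter readings.

    start_uj / end_uj: counter values before and after the workload.
    max_range_uj: value at which the counter wraps (assumption: simple wrap).
    """
    if bytes_processed <= 0:
        raise ValueError("bytes_processed must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped between the two readings.
        if max_range_uj is None:
            raise ValueError("counter wrapped; max_range_uj is required")
        delta_uj += max_range_uj
    # Convert microjoules to joules, then normalise by the bytes processed.
    return delta_uj / 1e6 / bytes_processed
```

On Linux machines that expose the powercap interface, such readings might come from `/sys/class/powercap/intel-rapl:0/energy_uj`, with the wrap value in `max_energy_range_uj` next to it. Whether that file exists and is readable on a given system is an assumption, and it reports energy for the whole package rather than for the AVX units alone.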

2 comments

masklinn 2806 days ago

Don't they also tend to work at a lower clock due to their higher energy requirements?

edit: though this is AVX2 ("AVX-256") rather than AVX-512, and Lemire has covered AVX and the possibility of throttling (with or without AVX) in the past. So they're probably aware of the potential issue and consider that either the throttling won't be triggered or the gain is good enough to compensate for the lower frequency.


kissiel 2806 days ago

Nice. So I understand that AVX2 is not bringing the CPU's clock down.

Got any sources for power consumption figures/comparisons of those AVX units?


lorenzhs 2806 days ago

Heavy use of complex AVX2 operations causes downclocking, too, but typically less so than AVX-512. More details are documented in https://en.wikichip.org/wiki/intel/frequency_behavior -- also see e.g. https://en.wikichip.org/wiki/intel/xeon_gold/6138#Frequencie... for an example of how the frequencies differ depending on the number of active cores.

I think the reason for reducing clock speed when vector units are in heavy use is to keep power usage in check.

You might also find https://blog.cloudflare.com/on-the-dangers-of-intels-frequen... helpful, which goes into detail about a specific case where dynamic frequency scaling resulted in AVX-512 code running slower than AVX2 code.


Twirrim 2805 days ago

It's worth noting that the Cloudflare test was done on a Xeon Silver, which has worse properties around the frequency changes than the Gold or Platinum. If you're on either Gold or Platinum, you're less likely to suffer the problems that Cloudflare did with mixed workloads.

This seems like an optimisation nightmare. Your program needs to be aware both of which instructions the chip supports and of which model it is within a family, in order to decide whether or not you want to use certain vector instructions.
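To make the two-step decision concrete, here is a rough sketch assuming a Linux x86 system where `/proc/cpuinfo` lists feature flags such as `avx2` and `avx512f`. The function names, the `allow_avx512` switch, and the selection policy are all hypothetical; a real program would also need some way to judge the specific chip model.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. a non-x86 system)


def pick_kernel(flags, allow_avx512=False):
    """Hypothetical policy: use AVX-512 only when the caller opts in.

    The opt-in stands for the "which chip within the family" decision,
    which capability flags alone cannot answer.
    """
    if allow_avx512 and "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"
```

Usage would look like `pick_kernel(cpu_flags(open("/proc/cpuinfo").read()))`. The capability check is the easy half; the opt-in flag is where per-model knowledge (Silver vs. Gold vs. Platinum) would have to come from.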


masklinn 2806 days ago

And here are some of Lemire's own posts on the subject:

* https://lemire.me/blog/2018/04/19/by-how-much-does-avx-512-s...

* https://lemire.me/blog/2018/08/13/the-dangers-of-avx-512-thr...

* https://lemire.me/blog/2018/08/15/the-dangers-of-avx-512-thr...

* https://lemire.me/blog/2018/08/24/trying-harder-to-make-avx-...

* https://lemire.me/blog/2018/08/25/avx-512-throttling-heavy-i...

* https://lemire.me/blog/2018/09/04/per-core-frequency-scaling...

* https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-us...


the8472 2806 days ago

The downclocking does not apply at all to simple 256-bit bit-juggling operations. The code in question should run at full speed.


kwillets 2805 days ago

This doesn't do anything harder than a saturating subtract.
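For readers unfamiliar with the operation, here is a scalar model of an unsigned 8-bit saturating subtract, written in Python purely for illustration. AVX2 exposes the same per-lane operation through the `_mm256_subs_epu8` intrinsic, which handles 32 byte lanes at once. The function name below is hypothetical and this is not the code under discussion.

```python
def saturating_sub_u8(a, b):
    """Lane-wise unsigned 8-bit saturating subtract: max(a - b, 0) per lane."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of lanes")
    # Where a plain subtract would wrap below zero, the result clamps to 0.
    return bytes(max(x - y, 0) for x, y in zip(a, b))
```

One common use is a threshold test: the result is non-zero exactly in the lanes where `a` is greater than `b`. It is a simple integer operation, not the kind of heavy floating-point work usually associated with frequency reduction.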


twtw 2805 days ago

The energy per byte could well be lower than with a scalar approach. SIMD units like AVX are power hungry, but a greater fraction of that power goes to relevant computation rather than to control, scheduling, etc. Ideally, the constant per-instruction overhead of getting it executing on a functional unit is amortized over the width of the vector.
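The amortization argument can be written as a toy model: a fixed per-instruction overhead plus a per-lane compute cost, divided by the number of lanes. The function name and the numbers in the usage note are made up for illustration and are not measurements of any real CPU.

```python
def energy_per_element(overhead_pj, lane_pj, lanes):
    """Toy model: energy per element for one instruction covering `lanes` elements.

    overhead_pj: fixed cost of fetch/decode/schedule per instruction (assumed).
    lane_pj: compute cost per element (assumed equal for scalar and SIMD).
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    # The overhead is paid once per instruction and shared across all lanes.
    return overhead_pj / lanes + lane_pj
```

With a hypothetical overhead of 100 units and a per-lane cost of 5, `energy_per_element(100.0, 5.0, 1)` gives 105.0 for the scalar case, while `energy_per_element(100.0, 5.0, 32)` gives 8.125 for 32 byte lanes. The model ignores any frequency reduction or extra leakage in wide units, so it only shows the shape of the argument.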
