by gridlockd 2172 days ago
AVX is not a "special purpose block", it's Intel's answer to not adding special purpose blocks on customer demand, like you can do with ARM.

Crypto or video decoding comes to mind, those would be much faster with dedicated silicon, but more general AVX instructions can get you halfway there. Well, maybe a quarter. People point out that AVX uses a lot of power, but they ignore that the same algorithm running instead on more but simpler cores would use even more power.
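To make the "general instructions get you part of the way" idea concrete, here is a rough sketch in C, not taken from any real crypto library. It XORs a keystream into a buffer, which is the final step of a typical stream cipher, using only a general-purpose AVX2 XOR. The function names (`xor_scalar`, `xor_wide`, `xor_paths_agree`) are made up for illustration, and the vector path assumes a GCC or Clang style build with `-mavx2`; without it the code falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar reference: XOR a keystream into a buffer one byte at a time. */
static void xor_scalar(uint8_t *dst, const uint8_t *key, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] ^= key[i];
    }
}

/* Same operation, 32 bytes per step when AVX2 is enabled at compile time.
 * The tail (and the whole buffer on non-AVX2 builds) uses the scalar loop. */
static void xor_wide(uint8_t *dst, const uint8_t *key, size_t n) {
    assert(dst != NULL && key != NULL);
    size_t i = 0;
#if defined(__AVX2__)
    for (; i + 32 <= n; i += 32) {
        __m256i d = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i k = _mm256_loadu_si256((const __m256i *)(key + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(d, k));
    }
#endif
    xor_scalar(dst + i, key + i, n - i);
}

/* Hypothetical self-check: returns 1 if both paths produce the same bytes. */
int xor_paths_agree(void) {
    uint8_t a[100], b[100], key[100];
    for (size_t i = 0; i < sizeof a; i++) {
        a[i] = b[i] = (uint8_t)(i * 7u);
        key[i] = (uint8_t)(i * 13u + 1u);
    }
    xor_scalar(a, key, sizeof a);
    xor_wide(b, key, sizeof b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Nothing here is specific to crypto: the same wide XOR serves any bulk bitwise job. A dedicated block would implement a whole cipher round in hardware instead.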


> but more general AVX instructions can get you halfway there

Maybe I misunderstand you, but there are some fairly non-general ops for crypto encryption/decryption:

https://en.wikipedia.org/wiki/AVX-512#VAES

They exist today, but they were added after AVX. Every year we figure out how to cram more transistors into the same die area, and once the low-hanging fruit had been added and we knew how to fit even more transistors, we decided to start adding more and more specialized functions.

That is Linus's point. He would have preferred to use that increase in transistor count for other things, like more cache.

More cache has diminishing returns, because cache wants to be as close as possible to the core logic. And modern CPUs are mostly cache anyway. Special-purpose blocks for common compute tasks are quite cheap.

> And modern CPUs are mostly cache anyway.

Skylake is less than 30% cache. However, its internal data paths are 512 bits wide thanks to AVX-512, which could be considered suboptimal.

Still unsupported by Valgrind. Not sure about QEMU. Don't use it.