by Twirrim 2114 days ago
> I would have thought that some instructions could be slower than others (especially on x86) so that using more faster individual instructions could be faster than 1 slower instruction..

There are some fun cases where that is definitely true, to wit: pdep / pext on Zen-based architectures. https://dolphin-emu.org/blog/2020/02/07/dolphin-progress-rep...

https://twitter.com/uops_info/status/1202950247900684290

> I just ran some tests: the performance seems to depend heavily on the value in the last operand; this is also the case for the register variants. If the last operand is set to -1 (i.e., all bits are 1), the instr. has 518 uops and needs more than 289 cycles!
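
For anyone who hasn't met these two instructions: pext gathers the bits of a source value selected by a mask into the low bits of the result, and pdep does the reverse, scattering low bits out to the mask positions. Below is a rough portable sketch in C of what they compute, built from plain cheap instructions. The names pext_soft and pdep_soft are made up for illustration, and this is not a claim about what Zen's microcode actually does. It does show one way the cost can end up depending on the mask: the loop runs once per set mask bit, so an all-ones mask is the worst case.

```c
#include <stdint.h>

/* Hypothetical software PEXT: collect the bits of src that sit at the
   positions of the set bits in mask, and pack them into the low bits. */
static uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    /* out_bit walks the low bits of the result, one per set mask bit. */
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit */
        if (src & lowest)
            result |= out_bit;
        mask &= mask - 1;                     /* clear lowest set bit */
    }
    return result;
}

/* Hypothetical software PDEP: take the low bits of src and deposit them
   at the positions of the set bits in mask. */
static uint64_t pdep_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    /* in_bit walks the low bits of src, one per set mask bit. */
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit */
        if (src & in_bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear lowest set bit */
    }
    return result;
}
```

With a sparse mask the loops finish after a handful of iterations, which is the kind of case where several simple instructions could plausibly beat one slow microcoded one.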


From what I can tell, Zen-based architectures have a slow, compatibility-only emulation of those two instructions because the fast implementation is patented. The patent is held by a university that has some kind of deal with Intel involving co-development of the instruction, rather than by Intel itself, so AMD's patent cross-license doesn't cover it.