|
|
|
|
|
by Twirrim
2114 days ago
|
|
> I would have thought that some instructions could be slower than others (especially on x86) so that using more faster individual instructions could be faster than 1 slower instruction.. There are some fun cases where that is definitely true, to whit pdep / pexp on Zen based architectures. https://dolphin-emu.org/blog/2020/02/07/dolphin-progress-rep... https://twitter.com/uops_info/status/1202950247900684290 > I just ran some tests: the performance seems to depend heavily on the value in the last operand; this is also the case for the register variants. If the last operand is set to -1 (i.e., all bits are 1), the instr. has 518 uops and needs more than 289 cycles! |
|