by ckennelly 1664 days ago
As mentioned in that Stack Overflow post, though, things change again with FSRM (Fast Short REP MOV).

While rep movsb still has startup costs, the overhead of calling a function (especially via a PLT) and of the instruction cache misses that come with it is hard to demonstrate in a microbenchmark, and rep movsb encodes more compactly than many flavors of call. In an actual application, though, the "slower" but smaller implementation can often win (see https://research.google/pubs/pub50338.pdf and https://research.google/pubs/pub48320.pdf).
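
To make the size argument concrete, here is a minimal sketch of an inlined copy built on rep movsb, assuming GCC or Clang inline assembly on x86. The name copy_small is hypothetical and not from either paper, and the non-x86 branch is only a portability fallback to memcpy. Whether this beats a call to memcpy depends on the CPU (FSRM or not) and on the surrounding code, so treat it as an illustration rather than a recommendation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst and return dst.
 * On x86 this inlines a single "rep movsb" (2 bytes: F3 A4), versus
 * 5 bytes for a direct "call rel32" or 6 for an indirect call via the GOT. */
static inline void *copy_small(void *dst, const void *src, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    void *ret = dst;
    /* rep movsb copies (r/e)cx bytes from [(r/e)si] to [(r/e)di].
     * It relies on the direction flag being clear, which the ABI
     * guarantees at function entry. All three registers are modified,
     * so they are declared as read-write operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    /* Assumption: on other architectures, just defer to libc. */
    return memcpy(dst, src, n);
#endif
}
```

Measuring this in isolation mostly shows the startup cost of rep movsb; the effect on instruction cache pressure only shows up when the whole application is profiled.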