by ckennelly
1664 days ago
As mentioned in that Stack Overflow post, though, things change again with FSRM (Fast Short REP MOV). There are still startup costs, but the overhead of calling a function (especially via a PLT) and of incurring instruction cache misses is hard to demonstrate in a microbenchmark, and rep movsb encodes more compactly than many flavors of call. In an actual application, though, the "slower" but smaller implementation can often win (https://research.google/pubs/pub50338.pdf and https://research.google/pubs/pub48320.pdf).
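
To make the code-size point concrete, here is a rough sketch of what "just emit rep movsb inline" could look like with GCC/Clang extended asm. The helper name copy_rep_movsb is hypothetical (it is not from any library or from the papers above), and the memcpy fallback for non-x86 targets is only there so the snippet builds everywhere. Whether this actually beats a call to memcpy depends on the CPU (FSRM or not) and the size distribution, so treat it as an illustration, not a recommendation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst and return dst.
 * On x86 it emits a single `rep movsb` at the call site instead of a
 * call (possibly through the PLT) into a larger memcpy implementation. */
static inline void *copy_rep_movsb(void *dst, const void *src, size_t n) {
    void *ret = dst;
#if defined(__x86_64__) || defined(__i386__)
    /* rep movsb copies (r/e)cx bytes from [(r/e)si] to [(r/e)di].
     * The ABI guarantees the direction flag is clear on function entry,
     * so the copy runs forward. All three registers are updated by the
     * instruction, hence the read-write ("+") constraints. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Assumption: on other architectures, just defer to memcpy. */
    memcpy(dst, src, n);
#endif
    return ret;
}
```

Like memcpy, this assumes the two buffers do not overlap.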