by gpderetta
2935 days ago
>You'll notice that I get the return address into RAX as soon as possible. Believe it or not, this makes a real difference in performance. It allows the CPU to start fetching instructions after the JMP/RET even sooner than if you have it at the bottom of the function as you do.

The "execution" of the jump happens well before the pop rax is executed. The real difference is using a jump instead of a ret. The latter will always be mispredicted, as the CPU return address predictor (the stack engine) will always get it wrong, while the indirect predictor used by jmp has a chance to get it right.
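
To make the ret-vs-jmp point concrete, here is a toy model, not a description of any real CPU. It assumes two coroutines ping-ponging through one shared switch routine, a return predictor that simply pops the address pushed by the most recent call, and an indirect predictor that remembers which target followed the previous target. The resume addresses and the function names are made up for illustration.

```python
# Hypothetical resume addresses for two coroutines, A and B.
RESUME = {"A": 0x1000, "B": 0x2000}


def simulate_ret(switches):
    """Mispredictions when the switch routine ends in RET (toy model)."""
    return_stack = []  # toy return address predictor
    current, misses = "A", 0
    for _ in range(switches):
        other = "B" if current == "A" else "A"
        return_stack.append(RESUME[current])  # CALL switch pushes the caller's return address
        predicted = return_stack.pop()        # RET is predicted to go back to that caller
        actual = RESUME[other]                # but the stack was swapped: we resume the other coroutine
        misses += predicted != actual
        current = other
    return misses


def simulate_jmp(switches):
    """Mispredictions when the switch routine ends in POP + JMP (toy model)."""
    table = {}          # toy indirect predictor: previous target -> predicted next target
    prev_target = None
    current, misses = "A", 0
    for _ in range(switches):
        other = "B" if current == "A" else "A"
        actual = RESUME[other]
        predicted = table.get(prev_target)    # None means "no prediction yet"
        misses += predicted != actual
        table[prev_target] = actual           # learn from the resolved jump
        prev_target = actual
        current = other
    return misses


print(simulate_ret(1000), simulate_jmp(1000))
```

In this model the ret version mispredicts on every switch, because the predicted address is always the caller's own, while the jmp version only misses during warm-up. The model ignores that the unmatched call still leaves a stale entry in the return predictor, and real indirect predictors use much richer history, so treat the numbers as an illustration of the argument only.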