by mibsl 1589 days ago
Just a guess, but it sounds like machine-code loop alignment could be the cause. Modern CPUs really like their jump targets 32-byte aligned.

Aligned branch and jump targets are just for the sake of maximising instruction cache hit-rate. It will always remain a micro-optimisation and will not make a difference in this and most other cases.
To maximize the instruction cache (I$) hit rate you'd actually want to disable loop alignment, so that padding doesn't inflate code size.
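
For a rough sense of that padding cost, here is a minimal Python sketch. It assumes 32-byte alignment, and the function name `padding_for` and the loop-head addresses are hypothetical, made up purely for illustration. Each address is treated independently, so it ignores the fact that real padding shifts everything that follows.

```python
def padding_for(addr: int, align: int = 32) -> int:
    """Bytes of padding needed to move `addr` up to the next `align`-byte boundary."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    # For a power-of-two alignment, -addr & (align - 1) is the distance
    # to the next boundary (0 if addr is already aligned).
    return -addr & (align - 1)


# Hypothetical loop-head addresses, not taken from any real binary.
loop_heads = [0x1000, 0x1005, 0x103F]

per_loop = [padding_for(a) for a in loop_heads]
total_padding = sum(per_loop)

for addr, pad in zip(loop_heads, per_loop):
    print(f"{addr:#x}: {pad} padding byte(s)")
print(f"total: {total_padding} byte(s)")
```

Worst case is 31 wasted bytes per aligned target, which is the code-size inflation being traded against more predictable fetch behaviour.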

Here's an example of unstable benchmark performance caused by the lack of code alignment in .NET's JIT: https://github.com/dotnet/runtime/issues/43227