by CoolGuySteve
2744 days ago
The CPU still has to pull code in through instruction cache line fetches, and while a fetch is outstanding that core isn't doing much. The compiler alleviates this somewhat by placing the hot path directly after the branch instruction, so the fetch that brings in the branch also brings in the start of the hot path as part of the same cache line. It sounds minor, but if that line has been evicted from L2 cache after a long period of inactivity, the fetch can take upwards of 100ns, which starts to add up in HFT.
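For anyone who wants to see what that looks like in source, here is a rough sketch using the GCC/Clang `__builtin_expect` builtin and the `cold` attribute. The `msg_t` type and the `handle`/`handle_rare` functions are made up for illustration, and the hints are only requests: the compiler is free to lay the code out differently, so check the generated assembly before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang branch hints; on other compilers these fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD_PATH   __attribute__((noinline, cold))
#else
#define UNLIKELY(x) (x)
#define COLD_PATH
#endif

/* Hypothetical message type, for illustration only. */
typedef struct {
    uint32_t kind;   /* 0 = common case, anything else = rare case */
    int64_t  price;
} msg_t;

/* Rare path: kept out of line and marked cold so the compiler can
 * place it away from the hot code. */
COLD_PATH static int64_t handle_rare(const msg_t *m) {
    return -m->price;
}

/* Hot path: the hint asks the compiler to make the common case the
 * fall-through code right after the branch. */
int64_t handle(const msg_t *m) {
    assert(m != NULL);
    if (UNLIKELY(m->kind != 0)) {
        return handle_rare(m);
    }
    return m->price * 2;
}
```

Calling `handle` with `kind == 0` takes the fall-through path and anything else jumps to `handle_rare`. Profile-guided optimization can make the same layout decisions from measured data instead of hand-written hints.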