| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by BeeOnRope 2563 days ago

I was also confused by this, but my reading is this is entirely about branch prediction nothing about caching. In that context L1 and L2 simply refer to "first" and "second" level branch prediction strategies, and are not related to the L1 and L2 cache (in the same way that L1 and L2 BTB and L1 and L2 TLB are not related to L1 and L2 cache).

The way this works is there a fast predictor (L1) that can make a prediction every cycle, or at worst every two cycles, which initially steers the front end. At the same time, the slow (L2) predictor is also working on a prediction, but it takes longer: either throughput limit (e.g., one prediction every 4 cycles) or with a long latency (e.g., takes 4 cycles from the last update to make a new one). If the slow predictor ends up disagreeing with the fast one, the front end if "re-steered", i.e., repointed to the new path predicted by the slow predictor.

This happens only in a few cycles so it is much better than a branch misprediction: the new instructions haven't started executing yet, so it is possible the bubble is entirely hidden, especially if IPC isn't close to the max (as it usually is not).

Just a guess though - performance counter events indicate that Intel may use a similar fast/slow mechanism.