by Sirened
1353 days ago
I am terrified to think what AMD's predictor structure looks like if it's easier for them to do _this_ than to simply add privilege tags to their predictors. I don't personally buy this explanation anyway; trying to optimize retpolines in hardware would be an absolute pain in the ass and would require an insane amount of synchronization with the backend, since retpolines always trash the RAS (return address stack).

The branch predictor is one of the most highly optimized pieces of the CPU core. There has been a lot of discussion about how the Arm architecture's frontend is simpler, so, for example, Apple's chips have way more execution units. Intel's and AMD's latest designs have also expanded the number of execution units, but frontend instruction decode and dispatch is the "serial" part of the process, the part that reads the incoming instruction stream. And the x86 instruction set is hard to decode, because the number of bytes per instruction varies a lot. So just knowing that a branch is coming up is a "hard problem" for the instruction decoder, and then it still has to predict which way the branch will go.
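
To make the "serial" point concrete, here is a rough Python sketch. It is a toy model, not real hardware: the variable-length encoding (length stored in the low nibble of the first byte) and both function names are made up for illustration and are not actual x86 or Arm encodings. With a fixed width, every instruction start is known up front; with variable lengths, each start depends on having sized the previous instruction.

```python
# Toy model only. The encoding here is hypothetical: the low nibble of the
# first byte of each "instruction" holds its total length in bytes.
# Real x86 length decoding is far more involved.

def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Fixed-width ISA: starts are 0, width, 2*width, ... with no
    dependency between them, so they can all be found at once."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes) -> list[int]:
    """Variable-length ISA: the next start is only known after the
    current instruction's length has been decoded (a serial chain)."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc] & 0x0F  # hypothetical length field
        if length == 0:
            raise ValueError(f"invalid length at offset {pc}")
        pc += length
    return starts


if __name__ == "__main__":
    # Four toy instructions of lengths 2, 3, 1, and 4 bytes.
    stream = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
    print(variable_length_starts(stream))
    print(fixed_width_starts(bytes(16)))
```

Real decoders have ways to soften this dependency, which the sketch ignores; it only shows why finding the next instruction (and therefore the next branch) is cheap in one case and a chain of dependent steps in the other.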