> When hinting branches gives you a bigger L1 cache footprint, it has a high cost.

It was the same size on PPC, and on x86 using recommended branch directions (but not prefixes).

> Compilers nowadays use code motion to implement branch hinting, which does not burn L1 cache. (Maybe code motion is what you mean by "hot/cold splitting"?)

Hot/cold splitting is not just sinking unlikely basic blocks, it's when you move them to the end of the program entirely. That doesn't hint branches anymore, though; Intel hasn't recommended any particular branch layout since 2006.

> How does using exceptions defeat return address prediction? You are explicitly not returning, so any prediction would be wrong anyway.

Anything that never returns is a mispredict there; most things return. What it does instead (read the DWARF tables, find the catch block, indirect jump) is harder to predict too since it has a lot of dependent memory reads.

Machines do still charge an extra cycle for a taken branch versus a not-taken one, so it matters which way you expect a branch to go.
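For concreteness, here is a minimal sketch of what stating that expectation looks like in source. It assumes GCC or Clang, since `__builtin_expect` and `__attribute__((cold))` are compiler extensions; `UNLIKELY`, `report_negative`, and `sum_non_negative` are made-up names for illustration. The hint emits no prefix or extra instruction; it typically only influences which path the compiler lays out as the fall-through and where the cold function ends up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// GCC/Clang extension: tell the compiler which way a condition usually goes.
// (Hypothetical macro name; assumes a compiler that has __builtin_expect.)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical error path. `cold` asks the compiler to optimize this function
// for size and, on many targets, to group it with other rarely executed code;
// `noinline` keeps its body out of the hot caller.
__attribute__((cold, noinline))
static long report_negative(std::size_t index) {
    std::fprintf(stderr, "negative value at index %zu\n", index);
    return -1;
}

// Sums the values in `data`; returns -1 at the first negative value.
long sum_non_negative(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Expected to be false, so the compiler can keep the loop body as the
        // straight-line path and branch away only in the rare case.
        if (UNLIKELY(data[i] < 0)) {
            return report_negative(i);
        }
        total += data[i];
    }
    return total;
}
```

Whether this changes the generated code at all depends on the compiler, the target, and the optimization level, so compare the assembly before relying on it.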
Negligibly few things never return; most of those abort. Performance of those absolutely does not matter.
Why should anyone care about predicting the catch block a throw will land in after all the right destructor calls have finished? We have already established that throwing costs multiple L3 cache misses, if not actual page faults.