Hacker News
by Veliladon 1467 days ago
One thing that Intel and AMD do better than any other player in the industry is branch prediction. An absolutely stupefying amount of die area is dedicated to it on x86. Combine this with massive speculative execution resources and you can get decent ILP even out of code that's ridiculously hostile to ILP.

Modern CPU cores have hundreds of instructions in flight at any one moment because of how deep their OoO execution goes. You can only go that deep on OoO if branch prediction is accurate enough not to choke it.
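
A back-of-the-envelope sketch of why accuracy matters so much at that depth. The numbers here are assumptions for illustration, not measurements: one branch every 5 instructions, a 200-instruction window, and independent predictions. Under those assumptions the chance that a full window contains no mispredicted branch is simply accuracy raised to the number of branches in flight.

```python
def p_window_clean(accuracy: float, in_flight: int, instrs_per_branch: int = 5) -> float:
    """Probability that every branch in the in-flight window was predicted correctly.

    Assumes independent predictions and one branch per `instrs_per_branch`
    instructions. Both are simplifications picked for illustration.
    """
    branches = in_flight // instrs_per_branch
    return accuracy ** branches


if __name__ == "__main__":
    for acc in (0.95, 0.99, 0.997):
        clean = p_window_clean(acc, in_flight=200)
        print(f"{acc:.1%} accurate -> {clean:.1%} of 200-instruction windows have no mispredict")
```

With these assumed figures, a 95% predictor leaves only a small minority of windows free of mispredicts, while a predictor in the high 99s keeps the large majority of them clean.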

2 comments

> An absolutely stupefying amount of die area is dedicated to it on x86.

Yep. For example, on this die shot of a Skylake-X core,[0] you can see the branch predictor is about the same area as a single vector execution port (about 8% of the non-cache area).

[0]: https://twitter.com/GPUsAreMagic/status/1256866465577394181

> One thing that Intel and AMD do better than any other player in the industry is branch prediction. An absolutely stupefying amount of die area is dedicated to it on x86.

Zen 2 in particular combines an L1 perceptron predictor with an L2 TAGE[0] predictor[1]. TAGE requires an immense amount of silicon, but it has something like 99.7% prediction accuracy, which is... crazy. The perceptron predictor is almost as good: 99.5%.
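
For anyone who hasn't seen one, here is a minimal software sketch of a perceptron predictor in the style of Jiménez and Lin's original paper. It is a toy, not AMD's design: the table size, history length, class name, and test trace are all made up for illustration, and real hardware uses small saturating weights rather than unbounded integers.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor. Illustrative only, not Zen's actual design."""

    def __init__(self, history_len: int = 16, table_size: int = 256):
        self.table_size = table_size
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history as +1 (taken) / -1 (not taken), most recent last.
        self.history = [-1] * history_len
        # Training threshold heuristic from the Jimenez & Lin paper.
        self.threshold = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        # Non-negative dot product means "predict taken".
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history, start=1):
                w[i] += t * hi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


def accuracy(predictor, trace):
    """Fraction of branches in `trace` predicted correctly, training as we go."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: one loop branch, taken 7 times and then not taken, repeated.
trace = [(0x400, i % 8 != 7) for i in range(4000)]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(PerceptronPredictor(), trace):.3f}")
```

On a regular pattern like this the history is long enough to cover a full loop period, so the predictor should settle after a short warm-up. Real workloads are far less friendly.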

I wrote a software TAGE predictor, but unfortunately it didn't perform as well as predicted (heh) by the authors of the paper.

[0]: https://doi.org/10.1145/2155620.2155635

[1]: https://fuse.wikichip.org/news/2458/a-look-at-the-amd-zen-2-...