by titzer 1483 days ago
It's not so secret, TBH. Usually the Intel microarchitecture manuals are detailed enough to describe how many execution ports there are and of what type, how many stages are in the pipeline, the size of the reorder buffer, the latency of most u-ops, and any frontend hazards. The super-secret stuff is things like the design of the branch predictors and memory disambiguation, the low-level tricks to optimize each of these down to the fewest gate delays (for high clock speeds), and where and how they figure out placement.

The front end (the decoder stage and branch predictors) is what would theoretically be important for compilers, as it's the bottleneck. But Intel’s optimization advice doesn’t say much about branches anymore; they pretty much want you to rely on them to take care of it.

That’s partly secrecy and partly to give them freedom to change it. It is of course somewhat described in their patents.

There are sometimes vague hints about things to avoid, e.g. putting too many branches on the same cache line, and they usually publish the size of their tables, typically 4K or 8K entries these days? But the actual predictors are wicked devils; they are clearly using tournament predictors, tiny ML modules (perceptrons), and god knows what else. I studied this carefully when trying to make good Spectre gadgets, but it is very very difficult to 100% trick (or utilize!) a branch predictor these days. They just learn in interesting ways...and entries alias :-)
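
To make the "perceptrons" and "entries alias" remarks a bit more concrete, here is a toy sketch of a textbook-style perceptron predictor. It is not how any real Intel predictor works; the class name, table size, history length, threshold, and the modulo index function are all assumptions picked for illustration.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor; every size here is an illustrative assumption."""

    def __init__(self, table_size=16, history_len=8, threshold=20):
        self.table_size = table_size
        self.threshold = threshold  # keep training while |output| is at or below this
        # One weight vector per table entry: a bias weight plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes, encoded as +1 (taken) or -1 (not taken).
        self.history = [-1] * history_len

    def _index(self, pc):
        # Many branch addresses share one entry, so unrelated branches can alias.
        return pc % self.table_size

    def _output(self, pc):
        w = self.weights[self._index(pc)]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the global history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the prediction was not confident enough.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[self._index(pc)]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


def run(predictor, pc, outcomes):
    """Feed outcomes for one branch; return how many predictions were correct."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


if __name__ == "__main__":
    p = PerceptronPredictor()
    alternating = [i % 2 == 0 for i in range(200)]  # T, N, T, N, ...
    print(run(p, 0x40, alternating), "of", len(alternating), "predicted correctly")
```

Even this toy version picks up a strictly alternating branch after a short warm-up, because the outcome is a linear function of the most recent history bit. Aliasing falls out of the index function: with 16 entries, branches at addresses 3 and 19 train the same weights.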

I honestly don't know if it's worth it to try to optimize branch prediction in compilers these days, beyond the obvious step of putting the highest probability target next (for fallthrough prediction) and generally laying out hot parts of the code together. TurboFan and most other dynamically-optimizing compilers put rare code at the end of functions, and that's a huge boost.
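
As a rough illustration of that layout step, here is a minimal greedy sketch: follow the most likely successor so it becomes the fallthrough, and push blocks marked as rare to the end of the function. This is not TurboFan's actual algorithm; the function name, the `successors` map, the `cold` set, and the block names are all hypothetical.

```python
def layout_blocks(entry, successors, cold=frozenset()):
    """Greedy basic-block ordering (a simplified sketch, not a production pass).

    successors: dict mapping block -> list of (successor, probability) pairs
    cold:       set of blocks considered rare; they are emitted after all hot blocks
    """
    hot_order, cold_order = [], []
    seen = set()
    pending = [entry]  # stack of blocks that still need a place
    while pending:
        block = pending.pop()
        # Extend a chain by repeatedly placing the likeliest hot successor next.
        while block is not None and block not in seen:
            seen.add(block)
            (cold_order if block in cold else hot_order).append(block)
            ranked = sorted(successors.get(block, []), key=lambda e: e[1], reverse=True)
            unplaced = [s for s, _ in ranked if s not in seen]
            # The likeliest non-cold successor becomes the fallthrough.
            nxt = next((s for s in unplaced if s not in cold), None)
            # Everything else is placed later; the likeliest leftover pops first.
            pending.extend(s for s in reversed(unplaced) if s != nxt)
            block = nxt
    return hot_order + cold_order


if __name__ == "__main__":
    cfg = {
        "entry": [("check", 1.0)],
        "check": [("fast", 0.99), ("slow", 0.01)],
        "fast": [("exit", 1.0)],
        "slow": [("exit", 1.0)],
    }
    print(layout_blocks("entry", cfg, cold={"slow"}))
```

In this example the hot path (entry, check, fast, exit) ends up contiguous and the rare `slow` block lands last, which is the shape described above.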