Hacker News new | ask | show | jobs
by scott00 2744 days ago
I was with you until you mentioned branch prediction... isn't branch prediction a hardware feature? How do you trick the HW branch predictor into predicting the unlikely case?
2 comments

The cpu still needs to load code in via instruction cacheline fetches. For every instruction fetch, that core isn't doing much.

The compiler alleviates this somewhat by putting the hot path right under the branch instruction so that the fetch that grabs the branch also grabs the start of the hot path as part of the same cacheline.

It sounds minimal, but if that fetch is swapped out of L2 cache due to long periods of inactivity, it can take upwards of 100ns, which starts to add up in HFT.

Yes it is a hardware feature. However the hardware can be given hints as to which branch is more likely. This is generally documented by the manufacturer, in one of those technical documents aimed at compiler writers.

With profile guided optimization it is possible for the compiler to have much better information about branches than the CPU can guess. Java applies profile guided optimization in real time, with C++ it much more complex to apply.

> However the hardware can be given hints as to which branch is more likely.

I don't think that is the case for modern (last 8 years or so) Intel processors. For example, I'm under the impression that gcc's __builtin_expect only affects the layout of the generated code. However I'd love to learn something new here; do you have a source or any additional info you could share?

You are correct.

The hints are purely for the compiler. When branch probabilities are available (either via heuristics, annotations or profile data), it will optimize hot paths differently from cold paths. For example it might be more aggressive with inlinig or vice versa optimize for size. Also will attempt to put cold code in separate pages so that it doesn't get pollute the cache. Also non taken braches are marginally "faster" than taken so it is worth, when possible to put hot code in the non taken branch.

I'm not a compiler writer, I'm sure there is more.

The source for this on Intel processors is the Intel Optimization Reference Manual, section 3.4.1: https://www.intel.com/content/dam/www/public/us/en/documents...
To me, that section says that you can emit machine code that will allow the branch predictor to do a better job, not that you can control what the branch predictor does.
To my mind those amount to the same thing.
See my other comment: https://news.ycombinator.com/item?id=18735053. I really should have consolidated them somehow.
My understanding is the layout is the hint. Modern CPUs would not want to use cache space for any hints, not to mention all the silicon to decode the hint.

Compiler writers are smart people and can layout code to give the CPU the right hints, so long as they know what the CPU does. CPU manufactures want the compilers writers to do this as the compiler is a significant factor in making one CPU faster than the competition in benchmarks.

It seems to me that it's misleading to call that a hint. Rather, certain code paths create hard-to-predict branches, and others create predictable branches. A hint would be some kind of metadata that says "you'll want to predict this branch this way based on your algorithm, but please don't!"
s/code paths/code patterns/
I think you're discussing two different phases of optimization. PGO and Java's JIT use branch information to emit different machine code. Hardware branch prediction takes machine code and determines which branches in the machine code are taken, and speculates based on that information. There's an underlying pattern that both follow, but they're very different.
PGO and JIT both use their information to change the machine code to suggest which branch is more likely.

PGO and JIT also do a lot of other things that are unrelated to this discussion. Some of those things can have a much larger gain than branch prediction.