| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by scott00 2744 days ago
	I was with you until you mentioned branch prediction... isn't branch prediction a hardware feature? How do you trick the HW branch predictor into predicting the unlikely case?

2 comments

CoolGuySteve 2744 days ago

The cpu still needs to load code in via instruction cacheline fetches. For every instruction fetch, that core isn't doing much.

The compiler alleviates this somewhat by putting the hot path right under the branch instruction so that the fetch that grabs the branch also grabs the start of the hot path as part of the same cacheline.

It sounds minimal, but if that fetch is swapped out of L2 cache due to long periods of inactivity, it can take upwards of 100ns, which starts to add up in HFT.

link

bluGill 2744 days ago

Yes it is a hardware feature. However the hardware can be given hints as to which branch is more likely. This is generally documented by the manufacturer, in one of those technical documents aimed at compiler writers.

With profile guided optimization it is possible for the compiler to have much better information about branches than the CPU can guess. Java applies profile guided optimization in real time, with C++ it much more complex to apply.

link

usefulcat 2744 days ago

> However the hardware can be given hints as to which branch is more likely.

I don't think that is the case for modern (last 8 years or so) Intel processors. For example, I'm under the impression that gcc's __builtin_expect only affects the layout of the generated code. However I'd love to learn something new here; do you have a source or any additional info you could share?

link

gpderetta 2744 days ago

You are correct.

The hints are purely for the compiler. When branch probabilities are available (either via heuristics, annotations or profile data), it will optimize hot paths differently from cold paths. For example it might be more aggressive with inlinig or vice versa optimize for size. Also will attempt to put cold code in separate pages so that it doesn't get pollute the cache. Also non taken braches are marginally "faster" than taken so it is worth, when possible to put hot code in the non taken branch.

I'm not a compiler writer, I'm sure there is more.

link

scott00 2744 days ago

The source for this on Intel processors is the Intel Optimization Reference Manual, section 3.4.1: https://www.intel.com/content/dam/www/public/us/en/documents...

link

hyperpape 2744 days ago

To me, that section says that you can emit machine code that will allow the branch predictor to do a better job, not that you can control what the branch predictor does.

link

bluGill 2744 days ago

To my mind those amount to the same thing.

link

hyperpape 2744 days ago

See my other comment: https://news.ycombinator.com/item?id=18735053. I really should have consolidated them somehow.

link

bluGill 2744 days ago

My understanding is the layout is the hint. Modern CPUs would not want to use cache space for any hints, not to mention all the silicon to decode the hint.

Compiler writers are smart people and can layout code to give the CPU the right hints, so long as they know what the CPU does. CPU manufactures want the compilers writers to do this as the compiler is a significant factor in making one CPU faster than the competition in benchmarks.

link

hyperpape 2744 days ago

It seems to me that it's misleading to call that a hint. Rather, certain code paths create hard-to-predict branches, and others create predictable branches. A hint would be some kind of metadata that says "you'll want to predict this branch this way based on your algorithm, but please don't!"

link

hyperpape 2744 days ago

s/code paths/code patterns/

link

hyperpape 2744 days ago

I think you're discussing two different phases of optimization. PGO and Java's JIT use branch information to emit different machine code. Hardware branch prediction takes machine code and determines which branches in the machine code are taken, and speculates based on that information. There's an underlying pattern that both follow, but they're very different.

link

bluGill 2744 days ago

PGO and JIT both use their information to change the machine code to suggest which branch is more likely.

PGO and JIT also do a lot of other things that are unrelated to this discussion. Some of those things can have a much larger gain than branch prediction.

link