	by cliffbean 4394 days ago
	There probably is close to zero overhead in a small benchmark. Hardware branch prediction is indeed good. However, it's a finite resource. In a large application, having lots of needless branches everywhere leaves fewer branch prediction resources for the branches that matter, which means more mispredictions. The branches also take up space in the icache, iTLB, etc. It's not at all obvious that the overhead would be close to zero in context.
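To make the "finite resource" point concrete, here is a toy model in C. It is hypothetical and not how any real CPU is built: a table of 2-bit saturating counters indexed by branch address modulo the table size. A hot branch that is always taken and an overflow check that is never taken get separate slots in a large enough table, but when they alias to the same slot they keep flipping each other's counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy predictor (hypothetical, not a real CPU): 2-bit saturating counters,
   values 0-1 predict "not taken", 2-3 predict "taken". */
enum { MAX_ENTRIES = 64 };

typedef struct {
    uint8_t counter[MAX_ENTRIES];
    size_t size; /* entries in use, must be 1..MAX_ENTRIES */
} predictor;

static void predictor_init(predictor *p, size_t size) {
    p->size = size;
    for (size_t i = 0; i < size; i++)
        p->counter[i] = 1; /* start "weakly not taken" */
}

/* Predict one branch, train its counter, return 1 on a misprediction. */
static int predict_and_update(predictor *p, uintptr_t addr, int taken) {
    uint8_t *c = &p->counter[addr % p->size]; /* index by address only */
    int predicted_taken = *c >= 2;
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
    return predicted_taken != taken;
}

/* Interleave two branches for `rounds` iterations; return total misses. */
int simulate(size_t table_size, int rounds) {
    predictor p;
    predictor_init(&p, table_size);
    int misses = 0;
    for (int r = 0; r < rounds; r++) {
        misses += predict_and_update(&p, 0x0, 1); /* hot branch, always taken */
        misses += predict_and_update(&p, 0x4, 0); /* overflow check, never taken */
    }
    return misses;
}
```

With an 8-entry table the two branches land in different slots and `simulate(8, 100)` reports a single misprediction (the hot branch's warm-up); with a 4-entry table they alias and `simulate(4, 100)` reports 200, one per branch execution. Real predictors use larger tables, tags and global history, so the effect is far less extreme, but the pressure is the same in kind.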

2 comments

tomp 4394 days ago

Would it be possible to mark the JO instruction (jump-if-overflow) as unlikely, so that the CPU would always predict the branch to not be taken, without consuming one branch prediction slot?


jcalvinowens 4393 days ago

> so that the CPU would always predict the branch to not be taken

AFAIK there isn't any notion of likely/unlikely in object code on most CPU architectures. I'm only really familiar with x86 and ARM though.

__builtin_expect() in GCC (the "likely()" and "unlikely()" macros in the Linux kernel use it) can do a lot to optimize branches, like laying out code so the expected path is the fall-through side of the jump (which helps instruction fetch and prefetch), but it can't actually emit instructions that directly tell the CPU "always predict this" AFAIK.
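For reference, the kernel macros are defined roughly as below. The `checked_add` helper is a hypothetical example, assuming GCC or Clang: `__builtin_add_overflow` is a compiler builtin that on x86 usually compiles to an `add` followed by a `jo` or `seto`. Wrapping it in `unlikely()` only steers the compiler's block layout; it does not emit any prediction hint the CPU is bound to follow.

```c
#include <limits.h>  /* INT_MAX, for callers exercising the overflow path */
#include <stdbool.h>

/* Kernel-style hint macros: !! normalizes any nonzero value to 1 so it can
   be compared with the expected value passed to __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: stores a + b in *out and returns false on signed
   overflow. unlikely() tells the compiler to keep the overflow path out of
   the straight-line code; the CPU's predictor is not told anything. */
static bool checked_add(int a, int b, int *out) {
    if (unlikely(__builtin_add_overflow(a, b, out)))
        return false;
    return true;
}
```

For example, `checked_add(1, 2, &r)` returns true with `r == 3`, while `checked_add(INT_MAX, 1, &r)` returns false.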


jcalvinowens 4393 days ago

> It's not at all obvious that the overhead would be close to zero in context.

That's a fair point, although I think using a single function for all of your checked additions, like the one above, would go a long way towards mitigating the resource waste you mention.
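A sketch of the "single function" idea, assuming GCC or Clang; `add_or_die`, `total_bytes` and `next_offset` are hypothetical names. `noinline` keeps exactly one copy of the overflow branch in the binary, so in a predictor indexed only by branch address every caller trains the same entry instead of each call site occupying its own.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical shared helper: noinline keeps one copy of the overflow
   branch, so all callers execute the same branch instruction. */
__attribute__((noinline))
long add_or_die(long a, long b) {
    long sum;
    if (__builtin_expect(__builtin_add_overflow(a, b, &sum), 0)) {
        fprintf(stderr, "integer overflow in add_or_die\n");
        abort();
    }
    return sum;
}

/* Two hypothetical, unrelated call sites sharing that single branch. */
long total_bytes(long header, long payload) { return add_or_die(header, payload); }
long next_offset(long offset, long stride) { return add_or_die(offset, stride); }
```

The trade-off is a real call and return for every addition, which has its own cost, so this only pays off if branch predictor pressure is actually the bottleneck.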

Maybe that's a naive assumption on my part: I suppose you could construct a branch predictor that maintains state based on the function call chain, as opposed to simply using the address of the branch... but that seems like it would be prohibitively complex.
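A toy illustration of why call-chain indexing would cut against the shared-function trick. This is a hypothetical model: the one-level "branch address XOR caller address" index is an assumption for illustration, not a description of real hardware. The same branch reached from three call sites lands in one slot when indexed by address alone, but can spread over three slots once the caller is mixed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predictor table size; slots are picked by the low bits. */
enum { ENTRIES = 64 };

/* Slot for a branch: address only, or address XOR caller (toy "call chain"). */
static uintptr_t slot(uintptr_t branch, uintptr_t caller, bool use_caller) {
    return (use_caller ? branch ^ caller : branch) % ENTRIES;
}

/* Count the distinct slots one shared branch occupies across n callers. */
int slots_used(uintptr_t branch, const uintptr_t *callers, int n, bool use_caller) {
    bool used[ENTRIES] = { false };
    int count = 0;
    for (int i = 0; i < n; i++) {
        uintptr_t s = slot(branch, callers[i], use_caller);
        if (!used[s]) {
            used[s] = true;
            count++;
        }
    }
    return count;
}
```

With a branch at 0x40 and callers at 0x1004, 0x2010 and 0x3020, `slots_used` returns 1 for address-only indexing and 3 with the caller mixed in, i.e. the saving from sharing one function disappears under this kind of indexing.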
