	by grundprinzip 4173 days ago
	I'm not sure, but the assembly code generated when using __builtin_expect looks almost identical.

3 comments

ot 4173 days ago

There's actually a huge difference: if you use __builtin_expect the slow path is still part of the function.

If the fast path is small enough, you probably want to inline the function, but the slow path makes it large enough that the compiler doesn't inline it (for good reasons).

If you force the slow path into a separate function, then your function becomes fast path + a call instruction, and it can be inlined.
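
A minimal sketch of that split, assuming GCC or Clang. The SmallStack type and both function names are made up for illustration and do not come from the linked article:

```cpp
#include <cstddef>

// Hypothetical fixed-capacity stack, used only to illustrate the split.
struct SmallStack {
    int data[8];
    std::size_t size = 0;
    int overflow_count = 0;
};

// Slow path: noinline keeps it out of the caller's body, cold hints that it
// is rarely executed. Both attributes are GCC/Clang extensions.
__attribute__((noinline, cold))
bool push_slow(SmallStack& s, int value) {
    (void)value;
    ++s.overflow_count;  // stand-in for real error handling
    return false;
}

// Fast path: a compare, a store, and (rarely) a call. Small enough that the
// compiler has a good chance of inlining it at every call site.
inline bool push(SmallStack& s, int value) {
    if (__builtin_expect(s.size == 8, 0))
        return push_slow(s, value);
    s.data[s.size++] = value;
    return true;
}
```

Whether push() really gets inlined still depends on the compiler's heuristics, so it is worth checking the generated assembly.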

I've often done this manually, and this is a very cool trick that I'm going to adopt immediately.

EDIT: plus, it's also beneficial for instruction cache/iTLB, as cornstalks pointed out.

Scaevolus 4173 days ago

Yes, unlikely()/likely() is a much better way to give the compiler hints about block linearization.

Here it is on godbolt: http://goo.gl/iPCiSy

abdulla 4164 days ago

I'm guessing the reason for the !! in the macro is to force a boolean conversion. Is there a better explanation?
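
For reference, a sketch of the usual definitions, assuming GCC or Clang. The macro in the linked example may differ, and count_nonnull is a hypothetical helper. __builtin_expect takes long arguments and compares the first against the expected value, so !! is what lets a pointer or an arbitrary nonzero integer match the expected 1:

```cpp
// Upper-case names are used here to avoid clashing with the C++20
// [[likely]] / [[unlikely]] attributes; the thread spells them in lower case.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical helper: counts the non-null entries in an array of strings.
int count_nonnull(const char* const* items, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        // items[i] is a pointer. Without !! it could not be passed as a long
        // in C++, and a nonzero integer such as 42 would never equal the
        // expected value 1, so the hint would point the wrong way.
        if (LIKELY(items[i]))
            ++count;
    }
    return count;
}
```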

gaze 4173 days ago

This is so strange! If something is unlikely to execute, it's unlikely because of a condition, and that condition is what you mark as unlikely for the branch predictor and as out of line for the code layout. Why would something be unconditionally unlikely to execute?! I don't understand what a lambda buys you here.

nkurz 4173 days ago

In theory, the lambda combined with the noinline attribute forces the compiler to use a call/ret to a function rather than making a local jump. Since the function usually will have alignment requirements, this often will mean the error handling is in a different 64-byte cache line. If this error never occurs, this cache line will never occupy space in the instruction cache (i-cache), and useful instructions can be held there instead.
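
Roughly what that looks like, as a sketch assuming GCC or Clang. Putting an attribute on a lambda this way is a compiler extension, and checked_div is a made-up function:

```cpp
// Hypothetical example: divide, counting division-by-zero as an error.
int checked_div(int a, int b, int& error_count) {
    if (b == 0) {
        // Immediately invoked lambda. noinline forces a real call/ret rather
        // than a local jump; cold hints that the body is rarely executed.
        return [&]() __attribute__((noinline, cold)) {
            ++error_count;
            return 0;
        }();
    }
    return a / b;
}
```

The [&] capture gives the error path access to the surrounding locals without threading them through as parameters by hand.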

In practice, I'd be surprised if you can come up with a case where this makes a significant difference. If you are on a hot enough path for this to matter, on a modern processor you are probably running out of the even lower-level decoded µop cache, which doesn't cache µops for branches that are not taken. If you aren't running out of that cache, your efforts are probably better spent getting the hot path to fit in it.

Edit: ot's comment about how this affects the size of the parent function and whether it will be inlined is a good point, and might well make a measurable difference in the cases where it is true.

mattgodbolt 4172 days ago

Author here. I use this trick for a general 'bail out with error code' throw macro. By definition it's the exceptional case and so I'm happy to take the hit of going out of line. It's a macro used all over a very large codebase and so it made sense to do this. And yes, in some cases it helps the compiler inline more aggressively where it does matter.
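
The actual macro isn't shown in the thread, so the following is only a guess at the general shape, assuming GCC or Clang. THROW_ERROR and parse_digit are hypothetical names:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical 'bail out' macro, not the author's real one. The message
// expression is evaluated inside the lambda, so building the string also
// happens out of line.
#define THROW_ERROR(msg)                          \
    [&]() __attribute__((noinline, cold)) {       \
        throw std::runtime_error(msg);            \
    }()

// Hypothetical caller: the hot path is a range check and a subtraction.
int parse_digit(char c) {
    if (c < '0' || c > '9')
        THROW_ERROR(std::string("not a digit: ") + c);
    return c - '0';
}
```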

cornstalks 4173 days ago

It's similar, but the whole point of the trick here is to move the generated code so it doesn't "pollute" the instruction cache in the CPU. __builtin_expect still puts the error-handling instructions right next to the rest of the code. The lambda puts the error-handling instructions elsewhere.

Whether this actually affects performance depends on the workload, and it needs careful profiling to find out. But I can imagine some situations where keeping the "hot" instructions in the cache and the "cold" (error-handling) instructions out of the cache could be beneficial.

rlpb 4172 days ago

> __builtin_expect still puts the error-handling instructions right next to the rest of the code. The lambda puts the error-handling instructions elsewhere.

Surely that's a decision for the optimizer to make, in the case of __builtin_expect?
