Alternative way, coming from linux kernel (https://docs.kernel.org/staging/static-keys.html) is to leverage knowledge about branches direction and literally change the instructions via run-time code patching to avoid the potential branching overhead all together.
When to go for such extreme solution? When performance really matters:
- `and` branches are known at compile-time
- `or` branches are not changing often at run-time
- `and/or` branches are expensive to compute/require memory access
- `and/or` branches are hard to learn by the hardware branch predictor due to their random nature
Some examples include logging, tracing, configuration, hot-path, algo, etc.
> https://github.com/boost-ext/sb - x86-64/linux/gcc/clang moves the solution to user space (for run-time) as well as to compile-time with C++20.
template<auto tag = []{}> // Note: `tag` is required to delay `fun`
// instantiation as get has to be dependent
auto fun() {
if constexpr (sb::const_branch<"compile-time branch">::get<tag>()) {
std::puts("taken");
} else {
std::puts("not taken");
}
}
int main() {
fun(); // not taken
static_assert(sb::const_branch<"compile-time branch">::set<true>());
fun(); // taken
static_assert(sb::const_branch<"compile-time branch">::set<false>());
fun(); // not taken
}
main: // $CXX -O3
lea rbx, [rip + .L.str.1]
mov rdi, rbx
call puts@PLT # not taken
lea rdi, [rip + .L.str.2]
call puts@PLT # taken
mov rdi, rbx
call puts@PLT # not taken
xor eax, eax # return 0
ret
.L.str.1: .asciz "not taken"
.L.str.2: .asciz "taken"
Branchless computing (https://www.youtube.com/watch?v=g-WPhYREFjk) is a technique which may help a lot with mitigating the overhead of branching.
Alternative way, coming from linux kernel (https://docs.kernel.org/staging/static-keys.html) is to leverage knowledge about branches direction and literally change the instructions via run-time code patching to avoid the potential branching overhead all together.
When to go for such extreme solution? When performance really matters:
Some examples include logging, tracing, configuration, hot-path, algo, etc.> https://github.com/boost-ext/sb - x86-64/linux/gcc/clang moves the solution to user space (for run-time) as well as to compile-time with C++20.
> static_branch - Minimal overhead run-time branch (https://godbolt.org/z/797WeYYGK) / (nop=not taken / jmp=taken)
> const_branch - Zero overhead compile-time branch (https://godbolt.org/z/xEboP8WYe) Acknowledgments- https://docs.kernel.org/staging/static-keys.html - https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html - https://www.intel.com/content/www/us/en/developer/articles/t... - https://www.agner.org/optimize/instruction_tables.pdf - https://www.felixcloutier.com/x86 - https://uops.info/table.html - https://arxiv.org/abs/2308.14185