Hacker News new | ask | show | jobs
by Joker_vD 261 days ago
Just so you know, the "return x < 0 ? -x : x;" compiles into

    abs_branch:
        mov     eax, edi
        neg     eax
        cmovs   eax, edi
        ret
on x64, and into

    abs_branch:
        srai    a5,a0,31
        xor     a0,a5,a0
        sub     a0,a0,a5
        ret
on RISC-V if you use a C compiler with a half-decent codegen. And "branchy" clamp() translates into

    clamp:
        cmp     edi, edx
        mov     eax, esi
        cmovle  edx, edi
        cmp     edi, esi
        cmovge  eax, edx
        ret
Seriously, the automatic transformation between ?: and if-then-else (in both directions) is quite well studied by now. And if you try to benchmark difference between branching and branchless implementations, please make sure that the branches you expect are actually there in the compiler's output.
3 comments

Yeah, if you care about branches for any reason then you need to at least verify the compiler's output, if not drop down to assembly completely.
The inverse problem is here too:

> int partition_branchless(int* arr, int low, int high) {... for (int j = low; j < high; j++) { ... }... }

That for loop is just a sugared while loop which is just a sugared cmp and jmp

The articleesays that it optimizes just the inside of the loop (the loop jump can be optimized via unrolling, which the compiler may do automatically).
In fact, I have seen gcc optimize clever hacks that tried to use multiplication, into conditional moves (on aarch32).

There is this common misconception that conditional moves == branching. On actually relevant software architectures, they very much are not. Replacing a p=0.5 branch into a conditional move is in itself a significant optimization.