
by torstenvl 1073 days ago

Better than with random input, but the jump version is still only about half as fast as the one using sete:

    [19:13:34 user@boxer ~/src/looptest] $ diff -u bench.c bench-alls.c      
    --- bench.c 2023-07-06 16:04:16.000000000 -0400
    +++ bench-alls.c 2023-07-06 19:13:34.000000000 -0400
    @@ -17,7 +17,7 @@
       int num_rand_calls = number / CHAR_BIT + 1;
       unsigned char *buffer = malloc(num_rand_calls * CHAR_BIT);
       for (int i = 0; i < num_rand_calls; i++) {
    -    buffer[i] = rand();
    +    buffer[i] = 's'; //rand();
       }
       return buffer;
     }
    [19:13:37 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop2.s -o l2
    [19:13:42 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop4.s -o l4
    [19:13:47 user@boxer ~/src/looptest] $ time ./l2 1000 1
    250001000
    ./l2 1000 1  0.69s user 0.00s system 99% cpu 0.699 total
    [19:13:55 user@boxer ~/src/looptest] $ time ./l4 1000 1
    250001000
    ./l4 1000 1  1.28s user 0.00s system 99% cpu 1.290 total

Jumps are slower.
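For readers without the assembly at hand, here is a rough C-level sketch of the two styles being compared. It is an illustration under assumptions, not the contents of loop2.s or loop4.s (which are not shown in this thread): the function names and the `target` parameter are hypothetical, and an optimizing compiler may well lower both functions to the same machine code, so the real comparison has to be made on hand-written or inspected assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "jump" style: the increment is guarded by a condition,
 * which a compiler may lower to a conditional branch. */
long count_branchy(const unsigned char *buf, size_t len, unsigned char target)
{
    assert(buf != NULL);
    long res = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == target) {
            res++;
        }
    }
    return res;
}

/* Hypothetical "sete" style: the 0/1 result of the comparison is added
 * directly, so no branch is needed to update res. On x86-64 this is the
 * kind of code that can map to a setcc instruction, but the compiler is
 * free to choose something else. */
long count_branchless(const unsigned char *buf, size_t len, unsigned char target)
{
    assert(buf != NULL);
    long res = 0;
    for (size_t i = 0; i < len; i++) {
        res += (buf[i] == target);
    }
    return res;
}
```

Both functions return the same count for any input; only the way the update is expressed differs. Checking the generated assembly (for example with `gcc -O3 -S`) is the only way to know whether a given build actually contains a jump.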

2 comments

Guvante 1073 days ago

Microbenchmarks are hard. You aren't doing any meaningful work that could benefit from speculative execution instead of stalling on the condition value.

Similarly, you might be hurting the pipeline by packing the jumps so close together.

Not saying your point is wrong, just saying your proof isn't super solid.


gpderetta 1072 days ago

In this benchmark the only loop-carried dependency is on the res variable (edit: and of course the index). The jump doesn't break these dependencies, so for this specific problem the additional latency of the cmov doesn't matter: it is always perfectly pipelined, and cmov will always come out on top. But if the input of the cmov depended on a previous value, then a branch could potentially be better given a high enough prediction rate.
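To make the second case concrete, here is a minimal C sketch of a loop where the selected value feeds the next iteration's condition. The function name `collatz_steps` is a hypothetical example, not something from the benchmark. If the compiler turns the ternary into a cmov, the cmov's latency sits on the loop-carried chain through `n`; if it emits a branch, a correct prediction lets the next iteration start speculatively. This only shows the shape of the dependency. Whether a branch actually wins depends on how predictable the condition is, and for this particular sequence it may not be.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the steps needed to reach 1 under the Collatz rule.
 * The new value of n depends on a select whose condition reads the
 * previous n, so the select is part of the loop-carried dependency.
 * Overflow of 3 * n + 1 is not handled; keep inputs small. */
unsigned collatz_steps(uint64_t n)
{
    assert(n >= 1);
    unsigned steps = 0;
    while (n != 1) {
        n = (n & 1) ? 3 * n + 1 : n / 2;
        steps++;
    }
    return steps;
}
```

Contrast this with the benchmark loop, where the compared byte comes from memory and never depends on the result of an earlier select.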
