by torstenvl
1073 days ago
Better than random input, but still only ~half as fast as using sete:

[19:13:34 user@boxer ~/src/looptest] $ diff -u bench.c bench-alls.c
--- bench.c 2023-07-06 16:04:16.000000000 -0400
+++ bench-alls.c 2023-07-06 19:13:34.000000000 -0400
@@ -17,7 +17,7 @@
     int num_rand_calls = number / CHAR_BIT + 1;
     unsigned char *buffer = malloc(num_rand_calls * CHAR_BIT);
     for (int i = 0; i < num_rand_calls; i++) {
-        buffer[i] = rand();
+        buffer[i] = 's'; //rand();
     }
     return buffer;
 }
[19:13:37 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop2.s -o l2
[19:13:42 user@boxer ~/src/looptest] $ gcc -O3 bench-alls.c loop4.s -o l4
[19:13:47 user@boxer ~/src/looptest] $ time ./l2 1000 1
250001000
./l2 1000 1 0.69s user 0.00s system 99% cpu 0.699 total
[19:13:55 user@boxer ~/src/looptest] $ time ./l4 1000 1
250001000
./l4 1000 1 1.28s user 0.00s system 99% cpu 1.290 total
Jumps are slower.

Similarly, you might be stalling the pipeline by packing the jumps so close together.
Not saying your point is wrong, just that your proof isn't super solid.
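
For anyone who wants to poke at this themselves, here is a rough C sketch of the two styles being compared. It is not the loop2.s/loop4.s from the thread (those aren't shown here); the function names and the "count matching bytes" task are assumptions picked only to illustrate a conditional jump versus a comparison result used as a value, which x86 compilers commonly lower to cmp + sete.

```c
#include <stddef.h>

/* Hypothetical illustration, not the thread's actual assembly.
   Branchy version: written with a conditional, so the compiler may
   emit a conditional jump per byte. */
long count_branchy(const unsigned char *buf, size_t n, unsigned char target) {
    long count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == target) {
            count++;
        }
    }
    return count;
}

/* Branchless version: the comparison yields 0 or 1 and is added
   directly, so no jump is needed for the data-dependent part. */
long count_branchless(const unsigned char *buf, size_t n, unsigned char target) {
    long count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (buf[i] == target);
    }
    return count;
}
```

Caveat: at -O3 a compiler may well turn both of these into the same branchless or vectorized code, so check the generated assembly (e.g. gcc -O3 -S) before timing anything. Running both on all-'s' input and on random input would help separate misprediction cost from the cost of the jumps themselves.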