|
|
|
|
|
by sfilmeyer
265 days ago
|
|
I enjoyed reading the article, but I'm pretty thrown by the benchmarks and conclusion. All of the times are reported to a single digit of precision, but then the summary is claiming that one function shows an improvement while the other two are described as negligible. When all the numbers presented are "~5ms" or "~6ms", it doesn't leave me confident that small changes to the benchmarking might have substantially changed that conclusion. |
|
At a 5ms baseline with millisecond precision, the smallest improvement you can measure is 20%. And you cannot distinguish a 20% speedup with a 20% slowdown that happened to get luck with clock ticks.
For what it is worth, I ran the provided test code on my machine with a 100x increase in iterations and got the following:
Which is not exactly encouraging (gcc 13.3.0, -ffast-math -march=native. I did not use the -fomit-this-entire-function flag, which my compiler does not understand).I had to drop down to O0 to see branchless be faster in any case: