by uecker
169 days ago
So here are my preliminary benchmarks with my own implementation on an AMD EPYC 9334 32-Core processor. I still need to double-check things, so take this with a grain of salt for now. Time is in seconds for 100000 iterations of manorboy(10). So far, the only implementation which clearly sucks is std::function<>. Even trampolines are surprisingly good (but I can imagine that they are much worse on other CPUs / architectures).
xgcc (GCC) 16.0.0 20260103 (experimental)
1.50 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack
1.11 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack -DREFARG
7.21 gcc -ftrampoline-impl=heap
7.34 gcc -ftrampoline-impl=heap -DREFARG
0.93 gcc -DWIDEPTR
1.38 gcc -DWIDEPTR -DREFARG
1.40 gcc -DDIRECT
1.05 gcc -xc++ -std=c++26 -DFUNCREF -DDEDUCING
19.68 gcc -xc++ -std=c++26 -DDEDUCING
20.73 gcc -xc++ -std=c++26
6.31 gcc -xc++ -std=c++26 -DDEDUCING -DREFARG
6.31 gcc -xc++ -std=c++26 -DREFARG
Debian clang version 16.0.6 (15~deb12u1)
21.11 clang -xc++
6.16 clang -xc++ -DREFARG
1.66 clang -fblocks
1.70 clang -fblocks -DREFARG
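For readers who have not seen the test being timed: below is a minimal sketch of Knuth's man-or-boy test written with std::function. It is not the benchmark code used for the numbers above, just the textbook formulation. The names `Thunk`, `A` and `manorboy` are chosen here for illustration, and passing the thunks by value is an assumption about what the non-REFARG variants do.

```cpp
#include <functional>

// A thunk is a parameterless callable returning int (ALGOL call-by-name).
using Thunk = std::function<int()>;

// Knuth's procedure A. The thunks are passed by value here, so every
// recursive call copies five std::function objects.
int A(int k, Thunk x1, Thunk x2, Thunk x3, Thunk x4, Thunk x5) {
    // B closes over this activation's k and over B itself by reference.
    // That is safe because B is only ever invoked while this frame is live.
    Thunk B = [&]() -> int {
        --k;
        return A(k, B, x1, x2, x3, x4);
    };
    if (k <= 0) {
        // Evaluate left to right explicitly; C++ leaves the operand
        // order of + unspecified.
        int a = x4();
        int b = x5();
        return a + b;
    }
    return B();
}

// Entry point with the standard constant thunks 1, -1, -1, 1, 0.
int manorboy(int k) {
    return A(k,
             [] { return 1; },
             [] { return -1; },
             [] { return -1; },
             [] { return 1; },
             [] { return 0; });
}
```

Calling `manorboy(10)` yields -67, the well-known reference value for k = 10. Changing the parameters of `A` to `const Thunk&` would avoid the per-call copies, which may be roughly what a by-reference variant measures, though that is a guess and not something the timings above confirm.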