by wallstprog
619 days ago
"I also managed to make contended nsync mutexes go 30% faster than nsync upstream on AARCH64, by porting it to use C11 atomics." Curious about this -- so what do C11 atomics use under the hood? At least on Linux, C++11 atomics use pthreads (not the other way around).

1. Weak CAS. nsync upstream always uses strong CAS, to keep its portability abstraction simpler. Where the caller already retries in a loop, using weak CAS avoids generating the additional inner retry loop that strong CAS needs on LL/SC targets.
2. Updating the &expected parameter. nsync upstream always manually does another relaxed load when a CAS fails. This isn't necessary with the C11 atomics API, because on failure it writes the value it observed back into expected, so you get that relaxed load for free.
Being able to exploit those two features resulted in a considerable improvement in nsync's mu_test.c benchmark for the contended mutex case, which I measured on RPI5.
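To make the two points concrete, here is a minimal sketch of both patterns on a toy lock word. This is not nsync's actual code: the names `LOCK_BIT`, `trylock_strong_reload`, and `trylock_weak`, and the bit layout, are made up for illustration. The only assumption is that other flag bits in the word can change concurrently, which is why a try-lock needs a retry loop at all.

```c
#include <stdatomic.h>

/* Hypothetical layout: bit 0 means "held"; other bits are unrelated flags. */
#define LOCK_BIT 1u

/* Pattern resembling what the reply describes for upstream: strong CAS,
 * followed by a separate relaxed load whenever the CAS fails. */
int trylock_strong_reload(atomic_uint *word) {
    unsigned old = atomic_load_explicit(word, memory_order_relaxed);
    while (!(old & LOCK_BIT)) {
        unsigned expected = old;
        if (atomic_compare_exchange_strong_explicit(
                word, &expected, old | LOCK_BIT,
                memory_order_acquire, memory_order_relaxed))
            return 1; /* acquired */
        /* Manual reload, even though `expected` already holds the new value. */
        old = atomic_load_explicit(word, memory_order_relaxed);
    }
    return 0; /* someone else holds the lock */
}

/* C11-style pattern: weak CAS inside the loop we already have, and reuse
 * the value that a failed CAS writes back into `old`. */
int trylock_weak(atomic_uint *word) {
    unsigned old = atomic_load_explicit(word, memory_order_relaxed);
    while (!(old & LOCK_BIT)) {
        unsigned desired = old | LOCK_BIT;
        if (atomic_compare_exchange_weak_explicit(
                word, &old, desired,
                memory_order_acquire, memory_order_relaxed))
            return 1; /* acquired */
        /* On failure (real or spurious) `old` now holds the observed value,
         * so the loop condition re-checks it with no extra load. */
    }
    return 0; /* someone else holds the lock */
}
```

Both functions have the same behavior. The second one just gives the compiler less to emit in the retry path, which is the kind of saving that can show up in a contended benchmark.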