by khuey
14 days ago
The author is referring to false sharing (https://en.wikipedia.org/wiki/False_sharing). CPU caches operate at cache line granularity (typically 64 bytes), so a write to one part of a cache line can require synchronization with writes to other, non-overlapping parts of the same line. This can dramatically reduce performance when a large number of cores are writing to the same cache line. If you remove the 64-byte alignment (which forces each counter variable onto a separate cache line) from hitcounter-shard.c, you ought to be able to see the performance difference for yourself.
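To make the alignment point concrete, here is a minimal sketch of the two layouts. This is not the code from hitcounter-shard.c; the names (`padded_counter`, `packed_counter`, `run_counters`, `NTHREADS`) are made up for illustration, and the 64-byte line size is an assumption that holds on most x86-64 parts but not everywhere.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size, not queried from the CPU */
#define NTHREADS 4    /* hypothetical thread count for the sketch */

/* Each counter is forced onto its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE) volatile uint64_t value;
};

/* No alignment: neighbouring counters sit in the same cache line. */
struct packed_counter {
    volatile uint64_t value;
};

static struct padded_counter padded[NTHREADS];
static struct packed_counter packed[NTHREADS];

struct job {
    volatile uint64_t *slot; /* the one counter this thread writes */
    uint64_t iters;
};

/* Each thread increments only its own slot, so there is no data race,
 * only (possibly) contention on the cache line holding the slot. */
static void *bump(void *arg) {
    struct job *j = arg;
    for (uint64_t i = 0; i < j->iters; i++)
        (*j->slot)++;
    return NULL;
}

/* Runs NTHREADS threads over the padded or packed array and
 * returns the sum of all counters after the threads have joined. */
uint64_t run_counters(int use_padding, uint64_t iters) {
    pthread_t threads[NTHREADS];
    struct job jobs[NTHREADS];
    uint64_t total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        jobs[i].slot = use_padding ? &padded[i].value : &packed[i].value;
        jobs[i].iters = iters;
        *jobs[i].slot = 0;
        int rc = pthread_create(&threads[i], NULL, bump, &jobs[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(threads[i], NULL);
        total += *jobs[i].slot;
    }
    return total;
}
```

Both variants compute the same total; only the memory layout differs. Wrapping calls to `run_counters(1, n)` and `run_counters(0, n)` in `clock_gettime` with a large `n` is one way to compare them on your own machine. How big the gap is will depend on the core count and the hardware.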
The question is: is it true that CPU mutexes are actually slower than those implemented in userspace?