| >essentially no cost unless the lock is contested. Atomics are not free. And fetching a cache line can be very expensive, even without contention. However lock less algorithms are often worse because they have even more atomics and even more cache line transfers. And they're really hard to get right.
(iirc there was a statistic somewhere about the average number of corrections in lock less papers -- it wasn't pretty) Back to the non contended locks. Usually the way to get non contended locks is to push down the locks to smaller and smaller pieces of data. But imagine what happens when you have some contention (e.g. due to a freak hash collisions). Instead of paying the cache transfer cost once you'll pay it many times, and that can really ruin your day (speed of light is very limiting at nano second scales) I wrote more about this here [1] I am somewhat biased but I would recommend that every time you consider something lockless, consider using lock elision/HTM instead (see [2] and [3]). It often has most of the advantages with few of the down sides (but admittedly also some of its own quirks) [1] http://halobates.de/applicative-mental-models.pdf
[2] http://queue.acm.org/detail.cfm?id=2579227
[3] http://www.intel.com/software/tsx |
Intel shipped two generations of processors with broken TSX before getting it right, and then timing attacks seriously called into question whether it was a safe feature to use in practice. Then Intel dropped it from their first post-Skylake microarchitecture products, though it looks like they're still planning to support it on their next generation of server CPUs. Is it really worth the trouble at this point?