by chrchang523 1596 days ago
It is worth noting here that this "one assembly instruction" is not that cheap. The hardware on a multicore system does have to perform some locking under the hood to execute that instruction. But yes, it still has enough of an advantage over calling into the kernel to justify the additional usage complexity.

And on ccNUMA systems, you end up with memory contention and huge memory latencies, including for any other data residing in the same cache line. Often we would align and isolate these locks within their own cache line, which also wastes a lot of space if you have a lot of them. (I was consulting on an enterprise app that had millions, I think. It made a big difference in the software's memory footprint.)
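For illustration, isolating a lock in its own cache line might look like the sketch below. The 64-byte line size is an assumption (common on x86-64, not universal), and `padded_lock` and `padding_overhead` are hypothetical names, not anything from the app mentioned above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache-line size; check the target CPU before relying on it. */
#define CACHE_LINE_SIZE 64

/* Hypothetical lock type. Aligning the member to a full cache line
   rounds sizeof(padded_lock) up to one line, so in an array of these
   no two locks (and no unrelated data) share a line. */
typedef struct {
  alignas(CACHE_LINE_SIZE) atomic_int state;
} padded_lock;

static_assert(sizeof(padded_lock) == CACHE_LINE_SIZE,
              "lock should occupy exactly one cache line");

/* Bytes spent purely on padding for n locks, relative to bare atomic_int. */
static size_t padding_overhead(size_t n) {
  return n * (sizeof(padded_lock) - sizeof(atomic_int));
}
```

If `atomic_int` is 4 bytes, a million such locks carry roughly 60 MB of padding, which is the footprint cost described above.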
I'm pretty sure the same construct can be implemented without the compare:

  int lock = 0;

  void AcquireLock(int *lock){
    while (ATOMIC_SWAP(lock, 1)){
      sleep(10); //or futex or w/e
    }
  }

  void ReleaseLock(int *lock){
    ATOMIC_SWAP(lock, 0);
  }
The 'lock' variable is shared among threads. Compare is needed to avoid stomping on the lock acquired by another thread.
No it isn't.

If lock = 1, you set the lock to 1 (aka do nothing).

If the lock is 0, you know it is unlocked and know you succeeded in acquiring the lock.

If many threads try to atomic swap, only one of them gets the zero.
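A minimal C11 sketch of that argument, using `atomic_exchange` as a stand-in for the `ATOMIC_SWAP` placeholder (that mapping is an assumption, and `try_acquire` and `release` are hypothetical names):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt to take the lock with an unconditional swap.
   atomic_exchange writes 1 and returns whatever was stored before. */
static bool try_acquire(atomic_int *lock) {
  /* Old value 0: the lock was free and is now ours.
     Old value 1: someone else holds it; we replaced 1 with 1,
     which leaves the holder's state untouched. */
  return atomic_exchange(lock, 1) == 0;
}

/* Mark the lock as free again. */
static void release(atomic_int *lock) {
  atomic_store(lock, 0);
}
```

Because the swap is a single atomic read-modify-write, two threads cannot both read the old 0.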

------

The real issue here is the subtle memory barrier bug in that code.
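The comment doesn't say which bug is meant. One plausible reading, assumed here, is that the swap and the unlock need acquire and release ordering so that accesses inside the critical section can't be reordered outside it. A sketch with C11 atomics, where `acquire_lock` and `release_lock` are hypothetical names:

```c
#include <stdatomic.h>

/* Spin until the swap returns 0. memory_order_acquire keeps reads and
   writes in the critical section from moving before the lock is taken. */
static void acquire_lock(atomic_int *lock) {
  while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0) {
    /* busy-wait; real code would back off, yield, or sleep here */
  }
}

/* memory_order_release makes every write done while holding the lock
   visible to the next thread that acquires it. */
static void release_lock(atomic_int *lock) {
  atomic_store_explicit(lock, 0, memory_order_release);
}
```

The default sequentially consistent operations would also work; the explicit orders just spell out the minimum a lock needs.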

You are right. XCHG can do the job to acquire the lock. At least on x86, XCHG does lock the cache line on the address of the variable, so it should be ok.
Yes, CMPXCHG is a relatively expensive instruction. On x86, when used with the LOCK prefix, it also acts as a full memory barrier.
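For comparison, the compare-and-swap form of the same acquire step could be sketched in C11 as below. `try_acquire_cas` is a hypothetical name; on x86, compilers typically lower `atomic_compare_exchange_strong` to LOCK CMPXCHG, though that depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt to take the lock with compare-and-swap. */
static bool try_acquire_cas(atomic_int *lock) {
  int expected = 0;
  /* Stores 1 only if the current value equals expected (0).
     On failure nothing is written, and expected is overwritten
     with the value that was actually found. */
  return atomic_compare_exchange_strong(lock, &expected, 1);
}
```

For a plain 0/1 lock the observable result matches the swap version; the compare only starts to matter when the stored value carries more than one bit of state.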