Hacker News


	by chrchang523 1596 days ago
	It is worth noting here that this "one assembly instruction" is not that cheap. The hardware on a multicore system does have to perform some locking under the hood to execute that instruction. But yes, it still has enough of an advantage over calling into the kernel to justify the additional usage complexity.
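To make the trade-off concrete, here is a minimal sketch of a lock whose uncontended path is a single atomic compare-and-swap and which calls into the kernel only when threads actually contend. It is Linux-only and assumes C11 atomics and the raw `futex` system call. `fast_lock` and its functions are hypothetical names. The state machine follows the three-state scheme from Ulrich Drepper's "Futexes Are Tricky".

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type. States: 0 = unlocked,
 * 1 = locked with no waiters, 2 = locked and waiters may be sleeping. */
typedef struct { atomic_int state; } fast_lock;

/* Thin wrapper around the raw Linux futex system call. */
static long futex_call(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fast_lock_acquire(fast_lock *l) {
    int c = 0;
    /* Uncontended fast path: one atomic CAS, no kernel entry. */
    if (atomic_compare_exchange_strong_explicit(&l->state, &c, 1,
            memory_order_acquire, memory_order_relaxed))
        return;
    /* Contended path: mark the lock as having waiters, then sleep in the
     * kernel. FUTEX_WAIT returns at once if state is no longer 2. */
    if (c != 2)
        c = atomic_exchange_explicit(&l->state, 2, memory_order_acquire);
    while (c != 0) {
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange_explicit(&l->state, 2, memory_order_acquire);
    }
}

void fast_lock_release(fast_lock *l) {
    /* Enter the kernel only if someone may be sleeping (old state was 2). */
    if (atomic_exchange_explicit(&l->state, 0, memory_order_release) == 2)
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

Acquiring and releasing an uncontended lock costs one locked instruction each. The `futex` syscalls happen only on the contended path, which is the advantage over always calling into the kernel.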

rrauenza 1595 days ago

And on ccNUMA systems you end up with memory contention and huuuuuge memory latencies, and so does any data residing in the same cache line as the lock. We would often align and isolate each of these locks in its own cache-line block, which also wastes a lot of space if you have a lot of them. (I was consulting on an enterprise app that had millions, I think. It made a big difference in the software's memory footprint.)
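A minimal sketch of the alignment trick described above, assuming 64-byte cache lines. `CACHE_LINE`, `packed_lock` and `padded_lock` are illustrative names. Giving each lock its own line keeps it from sharing a line with neighbouring locks or data, at the cost of padding every lock out to a full line.

```c
#include <stdalign.h>
#include <stdatomic.h>

/* Assumption: 64-byte cache lines (typical of current x86-64 CPUs). */
#define CACHE_LINE 64

/* Packed: several of these fit in one cache line, so threads using
 * neighbouring locks contend for the same line (false sharing). */
typedef struct { atomic_int word; } packed_lock;

/* Padded: alignas puts each lock at the start of its own line, and sizeof
 * is rounded up to a multiple of the alignment, i.e. 64 bytes per lock. */
typedef struct { alignas(CACHE_LINE) atomic_int word; } padded_lock;
```

With these definitions a million padded locks take 64,000,000 bytes, versus 4,000,000 for packed ones if `atomic_int` is 4 bytes. That is the footprint trade-off mentioned in the comment.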

chacham15 1596 days ago

I'm pretty sure the same construct can be implemented without the compare:

  int lock = 0;

  void AcquireLock(int *lock){
    // ATOMIC_SWAP stores 1 and returns the previous value; 0 means we got the lock
    while (ATOMIC_SWAP(lock, 1)){
      sleep(10); //or futex or w/e
    }
  }

  void ReleaseLock(int *lock){
    ATOMIC_SWAP(lock, 0);
  }

ww520 1596 days ago

The 'lock' variable is shared among threads. Compare is needed to avoid stomping on the lock acquired by another thread.

dragontamer 1595 days ago

No, it isn't.

If the lock is already 1, the swap writes 1 over 1 (i.e. does nothing) and returns 1, so you keep waiting.

If the swap returns 0, the lock was free and you have now acquired it.

If many threads try the atomic swap at once, only one of them gets the 0.
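The argument can be checked with a small experiment, sketched here with C11 atomics and POSIX threads (`count_swap_winners` is a hypothetical helper). Several threads race to swap 1 into a word that starts at 0, and exactly one of them should see 0 come back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 8

static atomic_int lock_word;   /* 0 = unlocked */
static atomic_int winners;     /* threads whose swap returned 0 */

static void *try_swap(void *arg) {
    (void)arg;
    /* atomic_exchange writes 1 and returns the previous value in one step. */
    if (atomic_exchange(&lock_word, 1) == 0)
        atomic_fetch_add(&winners, 1);
    return NULL;
}

/* Races NTHREADS swaps against one zeroed word; returns how many got 0. */
int count_swap_winners(void) {
    pthread_t t[NTHREADS];
    atomic_store(&lock_word, 0);
    atomic_store(&winners, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, try_swap, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&winners);
}
```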

------

The real issue here is the subtle memory barrier bug in that code.
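The comment doesn't spell out the barrier bug. One plausible reading is that `ATOMIC_SWAP` carries no ordering guarantee, so the compiler or CPU could move critical-section loads and stores across the lock operations. A minimal sketch of the swap lock with explicit C11 orderings follows: acquire on lock, release on unlock. `spinlock`, `run_spin_demo`, and the `sched_yield()` backoff standing in for `sleep(10)` are assumptions, not the original poster's code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct { atomic_int held; } spinlock;

void spin_acquire(spinlock *l) {
    /* Acquire ordering: critical-section accesses cannot be hoisted
     * above the successful swap. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire)) {
        /* Test-and-test-and-set: wait on a plain load so waiters do not
         * keep claiming the cache line for writing. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            sched_yield();
    }
}

void spin_release(spinlock *l) {
    /* Release ordering: critical-section writes are visible before the
     * lock reads as free. A release store suffices; no swap is needed. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

static spinlock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg) {
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        spin_acquire(&demo_lock);
        demo_counter++;            /* plain access, protected by the lock */
        spin_release(&demo_lock);
    }
    return NULL;
}

/* Hypothetical harness: nthreads (1..16) workers each add iters. */
long run_spin_demo(int nthreads, int iters) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Spinning on a relaxed load before retrying the swap is a design choice aimed at the cache-line traffic discussed upthread. Waiting threads share the line read-only instead of repeatedly taking it exclusive.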

ww520 1595 days ago

You are right. XCHG can do the job of acquiring the lock. At least on x86, XCHG with a memory operand locks the cache line holding the variable, so it should be OK.

ww520 1596 days ago

Yes, CMPXCHG is a relatively expensive instruction. On x86, LOCK CMPXCHG also acts as a full memory barrier.
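For comparison, here is a sketch of the compare-based variant using C11 `atomic_compare_exchange_strong_explicit`. `cas_lock` and its functions are hypothetical. On x86-64, GCC and Clang typically compile this to `LOCK CMPXCHG` and a swap-based lock to `XCHG`. Both are locked instructions, so which is cheaper depends on the microarchitecture.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int held; } cas_lock;

/* Takes the lock only if the word is currently 0. On failure the word
 * keeps its value, and expected receives what was observed. */
bool cas_try_acquire(cas_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

void cas_acquire(cas_lock *l) {
    while (!cas_try_acquire(l))
        ;  /* busy-wait; a real lock would back off or block here */
}

void cas_release(cas_lock *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```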