Under the hood it's effectively a single CAS (compare-and-swap) instruction that retries in a loop on failure. Failure only occurs under contention, and under contention you would be waiting with locks too.
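To make the retry loop concrete, here is a minimal sketch in Rust, assuming the operation in question is an atomic counter increment. The names `cas_increment` and `run_demo` are made up for illustration and are not from any particular library; a real implementation may well use a dedicated fetch-and-add instruction instead.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: add 1 to `counter` with an explicit CAS retry loop.
/// Returns the value this call stored.
fn cas_increment(counter: &AtomicU64) -> u64 {
    // Read the value we expect to replace.
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        // Try to swap `current` for `current + 1` in one atomic step.
        match counter.compare_exchange(
            current,
            current + 1,
            Ordering::SeqCst,
            Ordering::SeqCst,
        ) {
            // Nobody else changed the value in between: done.
            Ok(_) => return current + 1,
            // Another thread got there first: retry with the value it left.
            Err(observed) => current = observed,
        }
    }
}

/// Hypothetical driver: `threads` threads each increment a shared counter
/// `per_thread` times; returns the final count.
fn run_demo(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    cas_increment(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}
```

With no contention the loop body runs exactly once per call. A retry only happens when another thread modified the counter between the load and the compare-exchange, so no increments are lost even without a lock.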