
> The locks should be on separate cachelines, that's what the CachePadded::new is for
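The padding keeps two locks from sharing one cache line. Otherwise, a write to one lock invalidates the line holding the other (false sharing). Below is a rough C analogue. It assumes 64-byte cache lines; crossbeam's CachePadded chooses the alignment per target, and on some architectures that value is larger than 64. `padded_lock` and `CACHE_LINE` are made-up names for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Assumption: 64-byte cache lines. Hypothetical name; crossbeam picks
 * the real value per target architecture. */
#define CACHE_LINE 64

/* Aligning the member raises the struct's alignment (and therefore its
 * size) to CACHE_LINE, so neighbouring array elements never share a line. */
typedef struct {
    alignas(CACHE_LINE) atomic_int locked;
} padded_lock;

static_assert(sizeof(padded_lock) == CACHE_LINE, "one lock per cache line");
static_assert(alignof(padded_lock) == CACHE_LINE, "lock starts a cache line");

/* Two locks that would falsely share a line if they were plain atomic_ints. */
padded_lock locks[2];
```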

I see that now. I was looking for it in the AmdSpinlock struct, but that makes sense.

> The futex syscall is already the raw interface that is designed on the assumption that the caller only uses it in the slow path of whatever higher-level synchronization primitive they're implementing, so trying to use VDSO tricks to implement futex would be redundant.
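The "slow path only" design can be made concrete with the classic three-state futex mutex from Ulrich Drepper's "Futexes Are Tricky". What follows is a minimal, Linux-only sketch of it, not what any particular library ships. Because glibc provides no futex() wrapper, it calls the raw syscall through `syscall(SYS_futex, ...)`. `futex_mutex`, `fm_lock` and `fm_unlock` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical type. 0 = unlocked, 1 = locked, 2 = locked, maybe waiters. */
typedef struct { atomic_int state; } futex_mutex;

static long futex_op(atomic_int *addr, int op, int val) {
    /* Assumes atomic_int has the same layout as int, as on Linux targets. */
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void fm_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: an uncontended 0 -> 1 transition never enters the kernel. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended (2), then sleep while it stays 2. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_op(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void fm_unlock(futex_mutex *m) {
    int prev = atomic_fetch_sub(&m->state, 1);
    assert(prev != 0 && "unlock of an unlocked mutex");
    /* Only make the syscall if someone may be sleeping. */
    if (prev != 1) {
        atomic_store(&m->state, 0);
        futex_op(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

The key detail is state 2: unlock only issues FUTEX_WAKE when a waiter may exist. An uncontended lock/unlock pair therefore costs two atomic operations and no syscalls.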

Ah, thanks. I didn't know how far you could get with MWAIT, but I guess you still need to deschedule. I also didn't realize futex was a direct syscall with no user-level API wrapped around it.

Is he running 32 threads even in the low-contention case? And not pinning them to cores? Something about his numbers seems a little too high for what I would expect. I've seen this pattern a lot, and the reason the mutex usually wins is that it basically spins one or more times and then blocks (the pthread code appears to spin 100 times before falling back to a futex).
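The spin-then-block pattern can be sketched on top of plain pthreads. The lock is tried a bounded number of times, and then the code falls back to `pthread_mutex_lock`, which sleeps in the kernel (via futex on Linux) if the mutex is still held. `spin_then_lock` and the limit of 100 are assumptions for illustration. glibc does its own spinning inside the non-portable `PTHREAD_MUTEX_ADAPTIVE_NP` mutex type, not in a wrapper like this.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical spin budget, echoing the ~100 spins mentioned above. */
#define SPIN_LIMIT 100

/* Hypothetical helper: spin briefly with trylock, then block. */
static void spin_then_lock(pthread_mutex_t *m) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return;              /* acquired while spinning */
        assert(rc == EBUSY);     /* anything else is a real error */
    }
    /* Still held after the spin budget: sleep until it is released. */
    pthread_mutex_lock(m);
}
```

Repeated trylock calls are read-modify-write operations, so a real implementation would rather spin on a plain load of the lock word. pthreads doesn't expose that word, which is why this sketch sticks to trylock.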

At work I use a spinlock on a shared memory region because it tests out lower-latency than std::mutex, and we're not under much contention. I've thought about replacing it with a lightweight futex-based library, but that doesn't seem to be any quicker.
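A spinlock of this kind is often a test-and-test-and-set lock over a single atomic word. The minimal sketch below assumes `atomic_int` is lock-free, so it can sit in a `MAP_SHARED` region and be used across processes. `spinlock`, `spin_trylock`, `spin_lock` and `spin_unlock` are hypothetical names, not the code mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type; assumes a lock-free atomic_int so the struct can
 * live in shared memory and be used by several processes. */
typedef struct { atomic_int locked; } spinlock;

static bool spin_trylock(spinlock *l) {
    /* Acquire ordering: later reads/writes can't move above the lock. */
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(spinlock *l) {
    for (;;) {
        if (spin_trylock(l))
            return;
        /* Test-and-test-and-set: wait with plain loads, and only retry
         * the write once the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ; /* a CPU pause hint would normally go here */
    }
}

static void spin_unlock(spinlock *l) {
    /* Release ordering publishes the critical section's writes. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Waiters in the inner loop only read the lock word, so they can all keep a shared copy of the cache line until the holder releases it. A plain test-and-set loop would keep bouncing the line between cores with writes.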

He still seems to be getting some contention, and I'm trying to figure out how.