by jcalvinowens 73 days ago
> I feel like using spinlocks in user space at all without kernel support like rseq is just asking for weird performance degradations.

Yeah, exactly. "Doctor, help, somebody replaced my wooden hammer with a metal one, and now I can't hit myself in the face with it as many times."

If you use spinlocks in userspace, you're gonna have a bad time.
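For reference, here is a minimal sketch of the kind of naive userspace spinlock being criticized. It assumes C11 `<stdatomic.h>`, and the type and function names are hypothetical. Nothing in it tells the kernel that the lock is held, so if the holder is preempted, every waiter burns its whole time slice spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical naive spinlock: a single test-and-set flag. */
typedef struct {
    atomic_flag held;
} naive_spinlock;

#define NAIVE_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void naive_lock(naive_spinlock *l) {
    /* Spin until this thread is the one that flips the flag from clear to set.
     * If the holder has been preempted, waiters keep spinning here until it is
     * scheduled again; the kernel has no idea why they are busy. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Returns true if the lock was acquired without spinning. */
static bool naive_trylock(naive_spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void naive_unlock(naive_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```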


Most people looking for performance will reach for the spinlock.

The expectation is that the kernel should somehow detect applications that are spinning, and avoid preempting them early.

Well, that seems like an unreasonable expectation, no? Also, isn't the point of spinlocks that they get released before the kernel does anything? Otherwise you could just use a futex... which maybe you should do anyway...

https://matklad.github.io/2020/01/04/mutexes-are-faster-than...
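Here is a minimal sketch of what "just use a futex" looks like on Linux. It uses the three-state mutex design from Ulrich Drepper's "Futexes Are Tricky": 0 means unlocked, 1 means locked, and 2 means locked with possible waiters. It calls the raw futex(2) syscall through `syscall(SYS_futex, ...)` because glibc provides no `futex()` wrapper. The names `futex_mutex`, `fm_lock` and `fm_unlock` are hypothetical. In real code, `pthread_mutex_t` already does this for you.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex-backed mutex: 0 = unlocked, 1 = locked,
 * 2 = locked and there may be sleepers. */
typedef struct {
    atomic_int state;
} futex_mutex;

/* glibc has no futex() wrapper, so issue the syscall directly. */
static long futex_call(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void fm_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: uncontended 0 -> 1, no syscall at all. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock as contended (2), then sleep in the kernel
     * while the value is still 2 instead of burning CPU. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void fm_unlock(futex_mutex *m) {
    /* Only pay for a wake syscall if someone may be sleeping. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

The uncontended path costs roughly the same as a spinlock: one atomic instruction and no syscall. Under contention, waiters sleep, so a preempted holder doesn't leave them burning CPU.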

The scheduling is based on how much the LWP made use of its previous time slices. A spinning program is using every cycle it's given without yielding, so you can clearly tell that preemption should be minimized.
If you are spinning so long that it requires preemption, you're doing something wrong, no?
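If spinning is only ever meant to be brief, one middle ground is a lock that spins a bounded number of times and then sleeps in the kernel. glibc exposes this as the non-portable `PTHREAD_MUTEX_ADAPTIVE_NP` mutex type. The sketch below assumes Linux with glibc. The helpers `init_adaptive_mutex`, `worker` and `run_workers` are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Initialize a mutex that spins briefly before sleeping.
 * PTHREAD_MUTEX_ADAPTIVE_NP is a glibc extension, not POSIX. */
static int init_adaptive_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

static pthread_mutex_t counter_lock;
static long counter;

/* Each worker increments the shared counter 100000 times under the lock. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers (at most 8) and return the final count. */
static long run_workers(int nthreads) {
    pthread_t tids[8];
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Unlike a pure spinlock, a waiter that exhausts its spin budget goes to sleep. A preempted holder then costs latency, but it doesn't cost every other thread's CPU.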
It doesn't matter, it's a long tail thing: on average user spinlocks can work, and even appear to be beneficial on benchmarks (for many reasons; Andy alludes to some above). But if you have enough users, some of them will experience the apocalyptic long tail, no matter what you do: that's why user spinlocks are unacceptable. RSEQ is the first real answer to this, but it's still not a guarantee: it is not possible to disable SCHED_OTHER preemption in userspace.

If I make something 1% faster on average, but now a random 0.000001% of its users see a ten-second stall every day, I lose.

It is tempting to think about it as a latency/throughput tradeoff. But it isn't that simple: the unbounded thrashing can be more like a crash in terms of its impact on the system.

Yeah, I'm very familiar with thrashing from OOM scenarios... it's by far the most common Linux "crash" that I experience (at least monthly, sometimes daily, depending on what I'm doing)... A few times I've waited overnight and the OOM killer still didn't activate.
Well, you can always pin to a core and move other threads out of that core.

That's what you'd do if manually scheduling. Ideally the dynamic scheduler would do that on its own.
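The "pin to a core" half can be sketched with the Linux `sched_setaffinity` call. With pid 0 it applies to the calling thread. The helper names `first_allowed_cpu` and `pin_self_to_cpu` are hypothetical. Moving every *other* thread off that core is a system-level job (for example the `isolcpus` boot parameter or cpusets), and this sketch does not attempt it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU in the calling thread's current
 * affinity mask, or -1 on error. */
static int first_allowed_cpu(void) {
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread (pid 0) to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_self_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Pinning only controls where this thread may run. It doesn't stop the kernel from preempting it for other runnable tasks on that CPU, or from delivering interrupts there.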

Sure. But if you squint, even that isn't good enough: you'll still sometimes take interrupts on that core during the critical section, right when somebody else wants the lock.

The other problem with spin-wait is that it overshoots, especially with an increasing backoff. Part of the overhead of sleeping is paid back by being woken up immediately.

When it's made to work, the backoff is often "overfit" in that very slight random differences in kernel scheduler behavior can cause huge apparent regressions.
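For a concrete picture of the overshoot, here is a minimal sketch of a test-and-test-and-set spinlock with bounded exponential backoff. `BACKOFF_MIN` and `BACKOFF_MAX` are arbitrary, hypothetical constants, which are exactly the kind of knobs that end up tuned to one machine. The pause/yield hints assume GCC or Clang on x86 or AArch64. On other targets the relax step is a no-op.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool held;
} backoff_lock;

/* Hypothetical tuning constants (iterations of the relax loop). */
#define BACKOFF_MIN 4u
#define BACKOFF_MAX 1024u

/* Spin-loop hint to the CPU; a no-op on architectures not listed. */
static void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#elif defined(__aarch64__)
    __asm__ __volatile__("yield");
#endif
}

static void backoff_lock_acquire(backoff_lock *l) {
    unsigned delay = BACKOFF_MIN;
    for (;;) {
        /* Only attempt the atomic exchange when the lock looks free. */
        if (!atomic_load_explicit(&l->held, memory_order_relaxed) &&
            !atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Back off. If the lock is released partway through this loop,
         * the waiter doesn't notice until the loop ends: that's the
         * overshoot, and it grows as the delay doubles. */
        for (unsigned i = 0; i < delay; i++)
            cpu_relax();
        if (delay < BACKOFF_MAX)
            delay *= 2;
    }
}

static void backoff_lock_release(backoff_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Because each doubling can leave a waiter idle for up to its full delay after the lock frees, small shifts in when the holder gets scheduled can move which delay bucket waiters land in. That makes performance sensitive to the chosen constants.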