Hacker News new | ask | show | jobs
by alain94040 3360 days ago
Starvation is a serious problem. Imagine two cores both trying to get a lock. The lock is held by a cache line in one of the cores. If both cores try to get the lock very often, the core further away may very well never get the lock.

The synthetic benchmark results don't show that because they behave nicely: each thread waits politely in between each attempt to get the lock. Real code doesn't always do that.

4 comments

> Imagine two cores both trying to get a lock. The lock is held by a cache line in one of the cores. If both cores try to get the lock very often, the core further away may very well never get the lock.

If that were true, then your problem is L2-cache-bound and trying to fit your solution into SMP is the Wrong Thing. In fact, the single-threaded behavior you end up with is going to be faster (by definition!) than the "fair" architecture you seem to want.

No one serious thinks this behavior of the default locking primitive is a good thing. Maybe (maybe) it's a good fit for some particular problem somewhere, but I'd want to see a benchmark. It's definitely not a consensus opinion among people who do serious thought about synchronization.

> No one serious thinks this behavior of the default locking primitive is a good thing.

Clearly some people do or OS X wouldn't do it.

Would it be the first badly implemented thing in OS X?
Mutexes are heavily used and get a lot of scrutiny, and work was done to speed them up when 10.11 (or was it 10.10? Whichever one added QoS) came out. I'm positive the people who maintain it are aware of the tradeoffs between fair and unfair locks and made the default fair intentionally.

Or more generally, just because Linux and Windows both behave in a certain manner doesn't make that de facto correct. It's all tradeoffs, and different people value different things. Correct-by-default (i.e. fair locks) is valuable, the only question is whether it's worth the performance hit (although you can always opt in to unfair locks to get better performance).

> I'm positive the people who maintain it are aware of the tradeoffs between fair and unfair locks and made the default fair intentionally.

I think you're neglecting a couple of less-technical factors here. Yes, fairness was certainly made the default for reasons at some point (but still, even then, one might argue that an opt-in solution might have been better). On the other hand, there's the likely possibility that those reasons don't really hold true anymore. Think of Scene Graphs vs. Entity Component Systems in high-performance videogame design - in this example the rise of caching made a whole architecture out-dated.

On the other hand, like removing the GIL in Python, such decisions are not to be taken lightly because of the things you will break. It's very likely that there are applications that would still have problems with starving threads, and just switching from opt-out to opt-in will make them break for no apparent reason in the strangest of circumstances. I know, Apple likes to break things more often than MS, but I'd guess that´s not a risk they're willing to take. Imagine you're updating your OS and a dozen apps that worked for a decade and don´t get updates anymore start behaving strangely.

So, it's not unreasonable to settle for a less-than-optimal solution that still keeps things working and only makes things slower in the worst-case scenario. That doesn't mean it's not open to criticism, though.

> Yes, fairness was certainly made the default for reasons at some point (but still, even then, one might argue that an opt-in solution might have been better).

I think an unfair or opt-in-fair solution was deemed worse than fair-by-default for the simple reason that the most straightforward way to implement mutexes is

  enter kernel mode
  maybe grab a spinlock if on SMP
  add yourself to the waiting list
  sleep until woken up

  do critical section

  enter kernel mode
  spinlock
  wake the first waiting thread
That's a functionality every mutex must have and it happens to give fair semantics for free. Making semantics weaker for the purpose of optimization (and not just for the sake of making application developer's life harder or future flexibility which may happen to never be needed) actually takes additional work on top of that.

Since we are talking about a uniprocessor desktop OS developed in the nineties, it's plausible they didn't care about mutex performance as much as today and giving this extra guarantee afforded by their simple implementation seemed reasonable at the time.

Apple can and does update APIs in ways that preserve old behavior for apps linked against older SDKs, specifically so old apps continue to work. They could do the same here with versioned symbols, if they wanted to switch mutexes to unfair-by-default. I believe they intentionally go with fair-by-default because it's the safe choice, the choice that guarantees program correctness, since most people don't think about this sort of thing and probably won't be able to figure out themselves when their locks should be fair or unfair. It's the same reason C11 and C++11 atomics default to sequential consistency when not otherwise specified; that's the slowest memory ordering, but it's the one that's guaranteed to be correct for all cases. If you know what you're doing you can override the default and pick something else.
In my personal experiences I've seen quite the opposite.

Specifically, this kind of starvation mostly occurs only in synthetic lock benchmarks.

In the real world, cores should be actually doing work in between spinning on cache-line locks; and the time before they get back to locking should be fairly random by quite a reasonable number of cycles - which tends to largely mitigate the starvation issue you talk about.

Basically, if you have code that is structured to by default sit on spin-locks, there is likely an architectural issue in play.

I'd go with the other contributor who said something along the lines of "the best policy is to go with the fastest lock possible", unless there is some very niche reason not to.

Why are two threads trying to get the same lock very often? This seems like a design flaw: you want threads to mostly be uncontended, with lots of time spent outside locks.
absolutely, I totally agree.
Even if they get the lock, task latencies will be all over the place with seriously high latencies in the high percentiles.