Hacker News new | ask | show | jobs
by quinnftw 3360 days ago
I'd be interested in seeing the rational behind the fairshare policy. I suppose theoretically the firstfit policy could result in starvation via one thread constantly preempting another during the lock contention, but I don't imagine that would happen very often in practise.
5 comments

Starvation is a serious problem. Imagine two cores both trying to get a lock. The lock is held by a cache line in one of the cores. If both cores try to get the lock very often, the core further away may very well never get the lock.

The synthetic benchmark results don't show that because they behave nicely: each thread waits politely in between each attempt to get the lock. Real code doesn't always do that.

> Imagine two cores both trying to get a lock. The lock is held by a cache line in one of the cores. If both cores try to get the lock very often, the core further away may very well never get the lock.

If that were true, then your problem is L2-cache-bound and trying to fit your solution into SMP is the Wrong Thing. In fact, the single-threaded behavior you end up with is going to be faster (by definition!) than the "fair" architecture you seem to want.

No one serious thinks this behavior of the default locking primitive is a good thing. Maybe (maybe) it's a good fit for some particular problem somewhere, but I'd want to see a benchmark. It's definitely not a consensus opinion among people who do serious thought about synchronization.

> No one serious thinks this behavior of the default locking primitive is a good thing.

Clearly some people do or OS X wouldn't do it.

Would it be the first badly implemented thing in OS X?
Mutexes are heavily used and get a lot of scrutiny, and work was done to speed them up when 10.11 (or was it 10.10? Whichever one added QoS) came out. I'm positive the people who maintain it are aware of the tradeoffs between fair and unfair locks and made the default fair intentionally.

Or more generally, just because Linux and Windows both behave in a certain manner doesn't make that de facto correct. It's all tradeoffs, and different people value different things. Correct-by-default (i.e. fair locks) is valuable, the only question is whether it's worth the performance hit (although you can always opt in to unfair locks to get better performance).

> I'm positive the people who maintain it are aware of the tradeoffs between fair and unfair locks and made the default fair intentionally.

I think you're neglecting a couple of less-technical factors here. Yes, fairness was certainly made the default for reasons at some point (but still, even then, one might argue that an opt-in solution might have been better). On the other hand, there's the likely possibility that those reasons don't really hold true anymore. Think of Scene Graphs vs. Entity Component Systems in high-performance videogame design - in this example the rise of caching made a whole architecture out-dated.

On the other hand, like removing the GIL in Python, such decisions are not to be taken lightly because of the things you will break. It's very likely that there are applications that would still have problems with starving threads, and just switching from opt-out to opt-in will make them break for no apparent reason in the strangest of circumstances. I know, Apple likes to break things more often than MS, but I'd guess that´s not a risk they're willing to take. Imagine you're updating your OS and a dozen apps that worked for a decade and don´t get updates anymore start behaving strangely.

So, it's not unreasonable to settle for a less-than-optimal solution that still keeps things working and only makes things slower in the worst-case scenario. That doesn't mean it's not open to criticism, though.

In my personal experiences I've seen quite the opposite.

Specifically, this kind of starvation mostly occurs only in synthetic lock benchmarks.

In the real world, cores should be actually doing work in between spinning on cache-line locks; and the time before they get back to locking should be fairly random by quite a reasonable number of cycles - which tends to largely mitigate the starvation issue you talk about.

Basically, if you have code that is structured to by default sit on spin-locks, there is likely an architectural issue in play.

I'd go with the other contributor who said something along the lines of "the best policy is to go with the fastest lock possible", unless there is some very niche reason not to.

Why are two threads trying to get the same lock very often? This seems like a design flaw: you want threads to mostly be uncontended, with lots of time spent outside locks.
absolutely, I totally agree.
Even if they get the lock, task latencies will be all over the place with seriously high latencies in the high percentiles.
He doesn't give a lot of details, but somewhere in the first 20 minutes of this talk (probably 15 minutes in), Cliff Click says that Hotspot had to create fair locks in user-space because kernel provided locks weren't adequate.

https://youtu.be/Hqw57GJSrac?t=341

Thank you for the video, very interesting. The time for what you mentioned is here: https://youtu.be/Hqw57GJSrac?t=1104 (from 18:24 to 24:00)
Its funny that he throws out the "painful" learning of "don't trust the OS." I don't disagree, but it seems that the main learning should be "know your assumptions."

Consider, this just moved up the stack. A good mantra for developers on the JVM is "don't trust the VM."

Obviously, at some level you are trusting them, but you should also have measurements in place that would let you know which key assumption is failing. Yes, this is easier said than done.

Starvation and similar problems have plagued software using locks for a long time. A prominent example which is well documented may be the various problems of the GIL implementation when certain workloads are applied (easiest example is one I/O bound thread and one CPU bound thread, which is also a typical real world scenario - ouch!)

[It should be noted that it is commonly rejected that the GIL is a scheduler, acts like a scheduler or should be a proper scheduler; all of which are untrue. Many locks are used as schedulers as well, which is especially true for a lock that controls execution of any and all code.]

Which GIL are you talking about?
Sorry, I meant the Python GIL. Should've written that.
Do remember that current CPUs can dynamically run cores/hyper-threads at different clock speeds at the same time. Not to mention significantly mismatched performance ones (eg ARM big.LITTLE).

If you didn't have fairness, and the developer hadn't carefully assigned threads to cpus based on performance / priority of the work the thread is doing (and locked the CPU clock speeds), then the higher performing ones will starve out the others.

Fairness at least ensures that the developers writing simple straightforward code would get simple straightforward behaviour.

Then again, if you are writing simple straightforward code you shouldn't be mucking around with thread priorities. In fact probably you shouldn't be dealing with raw threads and mutices in the first place but using some higher level abstraction.
It's probably a very old decision, from times when most desktop computers had one CPU and threads were mostly used to keep GUI responsive while the application was churning data in the background, so there was little interaction between them.