by smosher
4862 days ago
After reading this I am wondering if there have been any measurements taken of the time 'wasted' in the kernel. You make it sound like there's a lot less wasted than is claimed by the author. I'm not convinced that the article has it exactly wrong. There are obvious problems, like all the extra moving parts your code has. But what about runtimes that multiplex their threads (or what have you) onto one host thread + scheduler per core? It seems to fit the description, and it has been taken up by Go and Rust and probably others (what is Erlang's multicore story these days?). It also avoids the obvious problem, since the extra moving parts are in the runtime, not your application (much like they were in the kernel, not your application, in the standard multi-threaded case). I still don't know if it saves you much. IIUC, the choice to multiplex is motivated by other reasons.
In general, if you're using any other programming paradigm (i.e., 90+% of all software), you don't need tons of cheap threads; your application probably only needs at most one per core. In that case, you aren't constantly context switching (except to legitimately multitask with other programs), and you aren't using up that much kernel memory to hold thread state (because you only have a few threads, not a ton). So, you should just use the kernel as it was intended.
My assumption in reading the article is that the author was very much talking about the 90+% case (he didn't really say what he was talking about in the article).
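To make the "at most one thread per core" case concrete, here is a rough sketch in Rust (my choice of language, not anything from the article). The `parallel_sum` helper is hypothetical; it just splits a slice across as many plain OS threads as the standard library reports cores for, and lets the kernel schedule them.

```rust
use std::thread;

/// Hypothetical helper: sums `data` using roughly one OS thread per core.
fn parallel_sum(data: &[u64]) -> u64 {
    // available_parallelism() estimates how many threads can usefully run at once;
    // fall back to a single worker if the platform cannot tell us.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every element lands in exactly one chunk (never zero-sized).
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `data` because they are all joined before scope() returns.
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

With only a handful of long-lived threads like this, there is little thread state for the kernel to hold and little reason for it to context switch between them.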
There actually is another case though, which I think is what the author is really getting at. What if you have a small number of threads that need to be extremely high performance, and they have extremely short critical sections (which is not necessarily the common case across most applications)? Then, you would not want your threads to constantly suspend in the kernel every time there is a tiny bit of contention. You'd rather have them spin for a few cycles and just wait until they can actually get the lock... or just use hardware features (like the compare-and-exchange instructions) to avoid taking a lock at all.
To do either of those, yeah, you pretty much need to code it yourself or (better) get a third-party library, AFAIK. You would think pthreads would have a spinlock option that can never suspend (unlike a futex-based mutex, which can suspend), but if it does, I'm not aware of it.
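For what it's worth, POSIX does define `pthread_spin_lock` and friends, though hand-rolling one is not much code either. A minimal sketch of the spin-instead-of-suspend idea, built on a compare-and-exchange, might look like the following. `SpinLock`, `with_lock`, and `spin_count` are names I made up for illustration; this is not a production lock (it never backs off or yields, and a panic inside the closure leaves it held).

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical spinlock: waiters burn cycles in user space and never sleep in the kernel.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Sharing across threads is sound because `locked` serializes all access to `value`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Runs `f` with exclusive access to the protected value.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // The compare-and-exchange step: atomically flip `locked` from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop(); // tell the CPU we are busy-waiting
        }
        // We hold the lock, so no other thread can be touching `value` right now.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Hypothetical driver: `threads` workers each bump a shared counter `per_thread` times.
fn spin_count(threads: usize, per_thread: u64) -> u64 {
    let lock = Arc::new(SpinLock::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.with_lock(|n| *n += 1); // very short critical section
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    lock.with_lock(|n| *n)
}
```

This only pays off under the conditions described above: few threads, tiny critical sections. If the lock holder gets descheduled, everyone else spins uselessly until it runs again.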
In fact, formally speaking, there are lock-free algorithms, which don't take a lock, but where a given thread can need an unbounded number of retries before it gets to access the data (some thread always makes progress, though). And then you have wait-free algorithms, which guarantee every thread gets access to the data within a bounded number of steps.
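A small sketch of the difference, again in Rust with made-up names (`record_max`, `count_event`, `run`). The first function is a typical lock-free compare-and-exchange retry loop. The second is a single atomic add with no retry loop in the code; whether it is truly wait-free depends on the hardware (on x86 it compiles to one locked instruction, while some other architectures implement it as a retry loop under the hood), so treat that label as an assumption.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free: retries until its compare-exchange wins. A single thread has no
/// bound on retries, but every failed attempt means some other thread succeeded.
fn record_max(max: &AtomicU64, candidate: u64) {
    let mut current = max.load(Ordering::Relaxed);
    while candidate > current {
        match max.compare_exchange_weak(current, candidate, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return,
            Err(actual) => current = actual, // value changed under us; retry against it
        }
    }
}

/// One atomic add, no retry loop here; returns the value before the increment.
fn count_event(counter: &AtomicU64) -> u64 {
    counter.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical driver: returns (largest value recorded, number of events counted).
fn run(threads: u64, per_thread: u64) -> (u64, u64) {
    let max = Arc::new(AtomicU64::new(0));
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let (max, counter) = (Arc::clone(&max), Arc::clone(&counter));
            thread::spawn(move || {
                for i in 0..per_thread {
                    record_max(&max, t * per_thread + i);
                    count_event(&counter);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    (max.load(Ordering::Relaxed), counter.load(Ordering::Relaxed))
}
```

Neither function ever blocks in the kernel; the difference is only in what each individual thread is guaranteed.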