by darkhelmet 646 days ago
We took a shot at doing ultra-fast kernel threads on FreeBSD a few decades ago. For various reasons, it was reverted and removed a few major versions later.

If you look at the old KSE work, the general gist was that if you were about to block in a syscall then you'd effectively get a signal-style longjmp back to your userland thread scheduler. You'd pick another thread and continue running, all in the same process/task context.
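
To make that control flow concrete, here is a minimal Rust sketch of the idea, not the KSE interface itself. It simulates user-level threads sharing one kernel context: whenever a thread would block, control returns to the userland scheduler, which picks the next runnable thread. `UserThread`, `Step`, and `run_scheduler` are hypothetical names, and blocking points are simulated with a counter rather than real syscalls or upcalls. Requeuing a blocked thread immediately is also a simplification.

```rust
use std::collections::VecDeque;

/// What a simulated thread reports after being given the CPU.
enum Step {
    /// The thread was about to block (e.g. in a syscall), so it hands
    /// control back to the userland scheduler.
    WouldBlock,
    /// The thread finished its work.
    Done,
}

/// Hypothetical user-level thread: an id plus the number of simulated
/// blocking points it will hit before finishing.
struct UserThread {
    id: usize,
    blocks_left: u32,
}

impl UserThread {
    /// Runs until the next simulated blocking point or until completion.
    fn run(&mut self) -> Step {
        if self.blocks_left == 0 {
            Step::Done
        } else {
            self.blocks_left -= 1;
            Step::WouldBlock
        }
    }
}

/// Runs all threads to completion on a single kernel context and returns
/// the order in which thread ids were scheduled.
fn run_scheduler(threads: Vec<UserThread>) -> Vec<usize> {
    let mut run_queue: VecDeque<UserThread> = threads.into();
    let mut trace = Vec::new();
    while let Some(mut t) = run_queue.pop_front() {
        trace.push(t.id);
        match t.run() {
            // Assumption: the thread becomes runnable again right away.
            // A real system would wait for the kernel to report that the
            // blocking operation has completed.
            Step::WouldBlock => run_queue.push_back(t),
            Step::Done => {}
        }
    }
    trace
}
```

For example, three threads with ids 0, 1, 2 and 1, 0, 2 simulated blocking points get scheduled in the order 0, 1, 2, 0, 2, 2, all without leaving the one process context.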

There were many problems with what we did and how we did it, but the unavoidable fundamental problem at the time was that it inverted assumptions about costs of low level primitives. Important(TM) software was optimized for the world where threads and blocking were expensive and things like pthread mutex operations were cheap. Our changes made threads and blocking trivially cheap but added non-trivial overhead to pthread mutex etc operations. Applications that made extensive use of pthread mutexes to coordinate work dispatching on a precious small pool of expensive threads were hit with devastating performance losses. Most critically, MySQL. We'd optimized for hundreds of thousands of threads rather than the case of multiplexing work over a few threads.

It became apparent that this was going to be an eternal uphill battle and we eventually pulled the plug to do it the same way as Linux. We made a lot of mistakes with all of this.

3 comments

NetBSD also tried it and reverted to kernel threads, some links here [1].

[1] https://en.wikipedia.org/wiki/Scheduler_activations

This sort of candor about hard won lessons is what I value most about this community. Thank you!
Why were mutexes more expensive?
Since they mention "a few decades ago" I'm guessing it's a very different mutex than the one you'd use today.

Today I believe all of these operating systems (except macOS, LOL) have either futex or an equivalent technology (the Windows thing works on bytes rather than aligned 32-bit values, so that's cool). So it seems like only contention could be more expensive: the whole point of a futex is that uncontended acquisition is a single atomic CPU instruction in userspace, and the kernel is nowhere near it.
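
A rough Rust sketch of that fast path / slow path split, for illustration only. `SketchLock` is a hypothetical type, not any platform's real mutex. The uncontended path is one atomic compare-and-swap in userspace; the contended path here just yields in a loop as a stand-in for a real futex wait, and the `slow_path_entries` counter exists only to make the split observable.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::thread;

const UNLOCKED: u32 = 0;
const LOCKED: u32 = 1;

/// Hypothetical lock illustrating a futex-style fast path.
pub struct SketchLock {
    state: AtomicU32,
    /// Counts how often lock() had to leave the fast path.
    slow_path_entries: AtomicUsize,
}

impl SketchLock {
    pub const fn new() -> Self {
        SketchLock {
            state: AtomicU32::new(UNLOCKED),
            slow_path_entries: AtomicUsize::new(0),
        }
    }

    pub fn lock(&self) {
        // Fast path: a single atomic compare-and-swap, no kernel involved.
        if self
            .state
            .compare_exchange(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Slow path: the lock is contended.
        self.slow_path_entries.fetch_add(1, Ordering::Relaxed);
        while self
            .state
            .compare_exchange_weak(UNLOCKED, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Assumption: yielding stands in for sleeping in the kernel
            // (a real implementation would do a futex-style wait here).
            thread::yield_now();
        }
    }

    pub fn unlock(&self) {
        // A real futex-based lock would also wake a sleeping waiter if
        // one had been recorded; this sketch has no sleepers to wake.
        self.state.store(UNLOCKED, Ordering::Release);
    }

    pub fn slow_path_entries(&self) -> usize {
        self.slow_path_entries.load(Ordering::Relaxed)
    }
}
```

With a single thread locking and unlocking in a loop, `slow_path_entries()` stays at zero, which is the "kernel is nowhere near it" case. Only when several threads fight over the lock does the slow path run at all.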

macOS has futexes, it just doesn't talk about them... https://crates.io/crates/ulock-sys
I was not aware. I wonder if anybody tried asking Apple? That's what happened for SRWLOCK in Windows: Mara asked Microsoft whether they could formally promise that it actually does what it seems to do, they documented that promise, and Rust then used SRWLOCK. [Today your Rust does not use SRWLOCK. It was replaced this year for performance reasons, which is timely because it turns out SRWLOCK is faulty, so "don't use SRWLOCK" is the most practical fix in the medium term. Awkward for poor C++ though, as they used SRWLOCK too.]

Apple are even less talkative than Microsoft, but it is possible that the reason libc++ uses this is that it's actually guaranteed and if properly asked Apple would say so. It would be nice for (newer) macOS to get the same promises as the mainstream platforms with respect to these primitives.