Hacker News new | ask | show | jobs
by jandrewrogers 1022 days ago
The performance overhead of threads is largely unrelated to how many you have. The thing being minimized with async code is the rate at which you switch between them, because those context switches are expensive. On modern systems there are many common cases where the CPU time required to do the work between a pair of potentially blocking calls is much less than the CPU time required to yield when a blocking call occurs. Consequently, most of your CPU time is spent yielding to another thread. In good async designs, almost no CPU time is spent yielding. Channels will help batch up communication but you still have to context switch to read those channels. This is where thread-per-core software architectures came from; they use channels but they never context switch.

Any software that does a lot of fine-grained concurrent I/O has this issue. Database engines have been fighting this for many years, since they can pervasively block both on I/O and locking for data model concurrency control.

1 comments

The cost of context switching in "async" code is very rarely smaller than the cost of switching OS threads. (Exception is when you'ree using a GC language with some sort of global lock.)

"Async" in native code is cargo cult, unless you're trying to run on bare metal without OS support.

The cost of switching goroutines, rust Futures, Zig async Frames, or fibers/userspace-tasks in general is on the other of a few nano-seconds whereas it's in the micro-second range for OS threads. This allows you to spawn tons of tasks and have them all communicate with each other very quickly (write to queue; push receiver to scheduler runqueue; switch out sender; switch to receiver) whereas doing so with OS threads would never scale (write to queue; syscall to wake receiver; syscall to switch out sender). Any highly concurrent application (think games, simulations, net services) uses userspace/custom task scheduling for similar reasons.
Nodejs is inherently asynchronous and the JavaScript developers bragged during its peak years how it was faster than Java for webservers despite only using one core because a classic JEE servlet container launches a new thread per request. Even if you don't count this as "context switch" and go for a thread pool you are deluding yourself because a thread pool is applying the principles of async with the caveat that tasks you send to the thread pool are not allowed to create tasks of their own.

There is a reason why so many developers have chosen to do application level scheduling: No operating system has exposed viable async primitives to build this on the OS level. OS threads suck so everyone reinvents the wheel. See Java's "virtual threads", Go's goroutines, Erlang's processes, NodeJS async.

You don't seem to be aware what a context switch on an application level is. It is often as simple as a function call. There is no way that returning to the OS, running a generic scheduler that is supposed to deal with any possible application workload that needs to store all the registers and possibly flush the TLB if the OS makes the mistake of executing a different process first and then restore all the registers can be faster than simply calling the next function in the same address space.

Developers of these systems brag about how you can have millions of tasks active at the same time without breaking any sweat.