| HN Mirror

> Stackless coroutines do not have this requirement, it is enough to be able to know the required frame size locally, without considering recursion/indirection.

If stackless coroutines only consider local frame size requirements, then you're back to heap allocating every call or we're only talking about using coroutines as leaves in the call graph. But if coroutines are only at the leaves, then they're definitely not fit for the purpose of concurrent programming--I've never written an asynchronous I/O program in C where my logical call depth of resuming functions is 1. If you graph all the callbacks and what they'd look like as normal function calls, it's definitely not 1.

Rust has "stackless" coroutines, but naturally they're not restricted to a call depth of 1. The compiler analyzes the call graph from the point of entry to the first resumable function and attempts to allocate space for all the persistent state necessary.

> Segmented stacks have their own tradeoffs too, checking if you have enough stack space at every function entry is not cheap.

In a world where the compiler can and is statically analyzing calls as deep as possible to optimize memory allocation patterns, that exact same facility could be used to optimize the points at which a segmented stack would check for and jump stacks. Which is my point--fundamentally these are the same things. A stack is a stack is a stack. We end up with the same data structures and they can be optimized the same. But in debates surrounding concurrency, people keep stating points layered on unstated assumptions. For example, that stackless coroutines will have a super amazing optimizing compiler for packing the call state, whereas for the segmented stack alternative universe somehow this technique wouldn't be used, and because it wouldn't be used then segmented stacks are therefore worse. That's circular logic.

The whole debate around concurrency techniques is muddled by these implicit assumptions. Some assumptions are defensible, but others not so much. It's difficult to identify and discuss which assumptions are defensible or not because these assumptions sit in the background. They come out as cudgels to defend a particular choice and are then quickly glossed over. Take, for example, the paper's mention of the incompatibility of fibers and thread-local storage facilities. The assumption here is that if C++ adopted fibers, the TLS issue wouldn't be remedied. That's ridiculous. If C++ adopted fibers, of course they'd work just fine with TLS. The whole reason the incompatibility exists, such as it does, is precisely because the compiler and language runtime aren't part of the equation--once they become part of the equation, then TLS semantics would naturally be addressed as a matter of course. If this was a real stumbling block, TLS could never have even been added to C++ in the first place.

I'm not trying to say that C++ or Rust should have adopted fibers or segmented stacks. Just that you can't draw any meaningful conclusions from these debates and these language design choices without understanding the self-imposed limitations and paths not explored. For example, the paper says that Rust tried and abandoned segmented stacks. Yes, but it did that long before it added async/await, and some of the language and compiler changes required for async/await could have (in principle) benefited segmented stacks.

So the fact that Rust abandoned segmented stacks tells us very little about segmented stacks. Rather, it speaks far more to the process by which languages and compilers are developed these days, where the path of least resistance is to put features at the only layer you completely control, even if it means severely compromising functionality. Contrast that with earlier languages like Java, where the evolution of platform threading facilities, especially on Linux and *BSD, where shaped by Java, rather than Java being shaped by the limitations of those facilities.