> C++ solved this by shipping something that are not coroutines and branding it coroutines anyway. … They aren't coroutines, and by squatting on the name, make it borderline impossible that C++ will ever have coroutines.

I am not a C++ person, but I protest this characterization. You obviously know the categories and I suppose you know the history, but I will recount them so that everyone else understands my objection.

O.G. coroutines emerged in a world where subroutines were, by and large, not reëntrant. Function parameters, local variables, and return addresses were static, and there was no call stack in the modern sense. This is why the old timers said that coroutines were a generalization of subroutines; only the jump instruction of the call/return really needed to change. Once call stacks became common, coroutines fit awkwardly, and people tried to adapt them in various ways.¹

The world has settled on two designs for reconciling coroutines to the call stack: thick and thin coroutines. Thin coroutines allow suspension within the body of the coroutine but not within subroutine calls; this way the size of the coroutine’s state can be known at compile time and its interaction with the stack is relatively clear. Thick coroutines (i.e., green threads) can be suspended within a subroutine call, and thus require their own slices of stack, either separate from the main stack or copied from and to it as the coroutine is suspended and resumed.

Thin coroutines are absolutely coroutines. They are truer to the original definition than thick coroutines are! They are more limited than thick coroutines, true; whether that makes them better or worse is a matter of design trade-offs. But they certainly deserve the name.

[1] Simula 67, to my understanding, treated objects as a kind of coroutine instance where function definitions in the coroutine body became methods that closed over its local variables.
The version that shipped can be done in the compiler front end. On the happy path it compiles to zero cost relative to writing the branches by hand, and it is independent of the machine architecture.
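To make "writing the branches by hand" concrete, here is a rough sketch of what a thin coroutine amounts to once it is lowered to a state machine. It is not what any particular compiler emits; `SquaresFrame`, `resume`, and `collect_squares` are names made up for this illustration, and the coroutine being imitated is the loop `for (i = lo; i < hi; ++i) yield i * i;`.

```cpp
#include <vector>

// Hand-lowered version of a thin (stackless) coroutine.
// Every variable that is live across a suspension point becomes a member,
// so the size of the coroutine's state is known at compile time.
// All names here are hypothetical, chosen for the illustration.
struct SquaresFrame {
    enum class State { Start, Suspended, Done };

    State state = State::Start;
    int i;            // loop variable, live across the yield
    int hi;           // loop bound, live across the yield
    int current = 0;  // most recently yielded value

    SquaresFrame(int lo, int hi_) : i(lo), hi(hi_) {}

    // Resume the body. Returns true if a value was yielded into `current`,
    // false once the body has run to completion.
    bool resume() {
        switch (state) {
        case State::Start:
            break;               // first entry: start at the loop test
        case State::Suspended:
            ++i;                 // the code that follows the yield
            break;
        case State::Done:
            return false;        // resuming a finished coroutine is a no-op
        }
        if (i < hi) {
            current = i * i;
            state = State::Suspended;  // remember where to pick up again
            return true;
        }
        state = State::Done;
        return false;
    }
};

// Drive the state machine to completion and collect what it yields.
std::vector<int> collect_squares(int lo, int hi) {
    std::vector<int> out;
    SquaresFrame frame(lo, hi);
    while (frame.resume()) out.push_back(frame.current);
    return out;
}
```

Nothing here touches the stack pointer or needs a runtime: it is an ordinary struct and a switch, which is why this style can live entirely in the front end. Calling `collect_squares(0, 4)` walks the frame through four suspensions and then the `Done` state.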
The version that didn't ship requires language runtime support. It involves allocating memory for the new stack and storing the live registers to it on yield. It's per-platform machine code, with varying overhead depending on how much control the compiler gives over calling conventions. Yield then looks a lot like a function call (and sometimes upsets branch predictors).
The full/stackful/green/thick/etc version works very much like a POSIX thread without the pre-emptive scheduler, and needs language runtime support for exactly the same reasons that pthread_create does. Stackful coroutines are zero cost if not used, since they don't change the calling convention of other functions, but the yield usually can't be optimised out at compile time if they are used.
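For comparison, a minimal sketch of the stackful kind, assuming a platform that still provides the `<ucontext.h>` calls (`getcontext`, `makecontext`, `swapcontext`), such as Linux with glibc. Those calls stand in for the runtime support described above; they are not what a language implementation would necessarily use. `Fiber`, `fiber_yield`, `fiber_resume`, `countdown`, and the 64 KiB stack size are all made up for the illustration.

```cpp
#include <ucontext.h>
#include <cassert>
#include <vector>

// Hypothetical single-threaded sketch of a thick (stackful) coroutine.
// Only the <ucontext.h> functions are real APIs; every other name is invented.
struct Fiber {
    ucontext_t caller;        // registers of whoever resumed us
    ucontext_t self;          // registers of the fiber while suspended
    std::vector<char> stack;  // the fiber's own, separately allocated stack
    int value = 0;            // last yielded value
    bool done = false;

    Fiber() = default;
    Fiber(const Fiber&) = delete;             // contexts hold pointers into
    Fiber& operator=(const Fiber&) = delete;  // this object, so never copy it
};

static Fiber* g_current = nullptr;  // the fiber currently running

// Yield from any call depth: save live registers, switch back to the caller.
void fiber_yield(int v) {
    Fiber* f = g_current;
    f->value = v;
    swapcontext(&f->self, &f->caller);
}

// An ordinary recursive subroutine that suspends from inside nested calls,
// which a thin coroutine cannot do.
void countdown(int n) {
    if (n == 0) return;
    fiber_yield(n);
    countdown(n - 1);
}

void fiber_entry() {
    countdown(3);
    g_current->done = true;
    // Returning from here switches to uc_link, i.e. back to the caller.
}

void fiber_init(Fiber& f, void (*entry)()) {
    f.stack.resize(64 * 1024);  // assumed to be plenty for this example
    int rc = getcontext(&f.self);
    assert(rc == 0);
    (void)rc;
    f.self.uc_stack.ss_sp = f.stack.data();
    f.self.uc_stack.ss_size = f.stack.size();
    f.self.uc_link = &f.caller;
    makecontext(&f.self, entry, 0);
}

// Run the fiber until its next yield. Returns false once it has finished.
bool fiber_resume(Fiber& f) {
    if (f.done) return false;
    g_current = &f;
    swapcontext(&f.caller, &f.self);
    return !f.done;
}

std::vector<int> run_countdown() {
    Fiber f;
    fiber_init(f, fiber_entry);
    std::vector<int> out;
    while (fiber_resume(f)) out.push_back(f.value);
    return out;
}
```

Note that `fiber_yield` is an opaque call that swaps register state at run time, so the compiler has little chance of folding it away, whereas `countdown` and everything else keeps its normal calling convention.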
Naming things is indeed difficult and definitions do tend to shift over time. However, the "C++ has coroutines now" checkbox doesn't bear up under scrutiny if one expects said coroutines to support the same operations that coroutines support in other languages.