> every async "function call" heap allocates.

> require the STL

That a coroutine has to heap-allocate when it is not inlined is a misconception; this is only the default behavior. One can define:

void *operator new(size_t sz, Foo &foo)

in the coro's promise type, and this:

- hides the global operator new that would otherwise be used
- forces the coro's signature to be CoroType f(Foo &foo), and forwards the coro's arguments to the operator new so defined

Therefore, it's pretty trivial to support coroutines even when the heap cannot be used, especially in the non-recursive case.

Yes, green threads ("stackful coroutines") are more straightforward to use, however:

- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)
- they are very ABI dependent. Among the "few registers" one has to save are the FPU registers, which is a problem on older Arm architectures and with codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame

Really, stackless coros are just FSM generators (which is obvious if one looks at the disassembly).
A pure library implementation that relies on normal function call semantics obviously needs to conservatively save at least all callee-saved registers, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.
Ideally the compiler would provide a built-in, but even, for example, an implementation using GCC inline ASM with proper clobbers can do significantly better.