by JonChesterfield 333 days ago
That's pretty damning too though.

A stackful coroutine is "write the live registers to your stack, swap the stack pointer to a suspended coroutine, load the old live registers from your new stack". It's a short and boring sequence of assembly.
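For illustration, here is roughly what that switch looks like when you borrow POSIX `swapcontext` instead of writing the assembly yourself. This is only a sketch assuming Linux/glibc; the function and variable names are made up for the example, and `swapcontext` also saves and restores the signal mask, so it is slower than the hand-written sequence described above. Note that the suspend happens inside a nested call, which is the capability being discussed.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

static ucontext_t g_main_ctx;  // context of the code that resumes the coroutine
static ucontext_t g_coro_ctx;  // context of the suspended coroutine
static std::string g_trace;    // records the order in which code ran

// Suspend: save the coroutine's registers and stack pointer, resume main.
static void yield_to_main() { swapcontext(&g_coro_ctx, &g_main_ctx); }

// A nested helper: a stackful coroutine can suspend from any call depth.
static void nested_step(char tag) {
    g_trace += tag;
    yield_to_main();
}

static void coroutine_body() {
    nested_step('a');
    nested_step('b');
    g_trace += 'c';  // returning transfers control to uc_link (g_main_ctx)
}

// Hypothetical driver: resumes the coroutine three times and returns the trace.
std::string run_stackful_demo() {
    g_trace.clear();
    std::vector<char> stack(64 * 1024);  // the coroutine's own stack

    getcontext(&g_coro_ctx);
    g_coro_ctx.uc_stack.ss_sp = stack.data();
    g_coro_ctx.uc_stack.ss_size = stack.size();
    g_coro_ctx.uc_link = &g_main_ctx;
    makecontext(&g_coro_ctx, coroutine_body, 0);

    for (int i = 0; i < 3; ++i) {
        g_trace += '|';
        swapcontext(&g_main_ctx, &g_coro_ctx);  // resume the coroutine
    }
    assert(g_trace.back() == 'c');  // the coroutine ran to completion
    return g_trace;
}
```

Calling `run_stackful_demo()` interleaves the two sides: each `|` is main resuming, each letter is the coroutine running until its next suspend.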

A C++ coroutine is a CFG transform with a bunch of logic around heap allocation elision to construct something less capable than the above, with a bunch of keywords and semantics that you can kind of derive from the work the compiler needs to do to wire things together.
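To make the "CFG transform" concrete, here is a hand-written state machine of the kind such a transform produces for a generator yielding 0, 1, 2. This is a sketch written by hand, not what any particular compiler emits; `CounterFrame` and `sum_counter` are names invented for the example. The locals that live across a suspension point move into a frame object, and `resume()` dispatches on a saved state.

```cpp
#include <cassert>

// Hand-written stand-in for a coroutine frame (illustrative only).
struct CounterFrame {
    int state = 0;  // which suspension point to resume from
    int i = 0;      // a local that must survive across suspensions
    int value = 0;  // the most recently yielded value

    // Returns true if a value was produced, false once the generator is done.
    bool resume() {
        if (state == 0) {  // entry block: initialise the loop variable
            i = 0;
            state = 1;
        }
        if (state == 1) {  // loop header, re-entered on every resume
            if (i < 3) {
                value = i++;
                return true;  // "suspend": return to the caller
            }
            state = 2;
        }
        return false;  // final state
    }
};

// Drains the generator and sums what it yields.
int sum_counter() {
    CounterFrame frame;
    int sum = 0;
    while (frame.resume()) sum += frame.value;
    assert(frame.state == 2);  // the generator reached its final state
    return sum;
}
```

The limitation is visible in the shape of the code: `resume()` can only suspend by returning from its own body. A helper it calls cannot suspend the frame, because the helper's locals live on the caller's stack rather than in the frame.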


If you want fibers there are ample mechanisms already available to implement them; they don't really benefit from specialized language machinery.
Stackful coroutines would definitely benefit from being built into the language, as you can get a significantly better ABI than you can with a pure library-based solution. You can sorta-kinda make it work with GCC extended inline assembly[1], but it is quite fragile, as you need to handle exceptions, unwind info, red zones, etc.

Also you need compiler support to correctly handle thread_local.

[1] https://github.com/gpderetta/delimited/blob/master/delimited...

You can do somewhat better than that with clang.

__attribute__((naked)) on a function which has a single asm block as the implementation gives you control over argument passing and changing the stack pointer.

__attribute__((preserve_none)) on the same function spills most live registers to the stack in the caller. The coroutine switch doesn't need to do as many push/pop, which makes it a bit more readable, but mainly this means you don't spill dead registers. That's the big thing you need compiler support for.

I believe the x64 red zone is a non-issue here as you've called the switch function, as opposed to trying to switch from within inline asm (which does need to be careful about that). The magic globals are a problem though (floating point control thing, maybe signal mask, errno et al), so I guess don't use the magic globals from within fibres.

"thread_local" doesn't map very sensibly onto fibres. There have been compiler bugs in that area too. Storing some information at the start of the fibre stack works fine though, you just don't get syntactic support for allocating / dereferencing from it.
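One way to sketch the "store it at the start of the fibre stack" idea: align the stack allocation to its own (power-of-two) size, put a header at the lowest address, and recover the header by masking any address on the running stack. Everything here is an assumption made for illustration (`FibreLocal`, `kStackSize`, the 64-byte header slot, Linux/glibc `ucontext`, a downward-growing stack); it is not taken from any particular fibre library.

```cpp
#include <ucontext.h>
#include <stdlib.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

constexpr std::size_t kStackSize = 64 * 1024;  // power of two, also the alignment
constexpr std::size_t kHeaderSlot = 64;        // bytes reserved for the header

struct FibreLocal {  // hypothetical per-fibre data
    int fibre_id;
    int counter;
};
static_assert(sizeof(FibreLocal) <= kHeaderSlot, "header must fit in its slot");

// Recover the header from any address on the currently running fibre stack.
static FibreLocal* current_fibre_local() {
    char marker;  // lives somewhere on the running stack
    auto addr = reinterpret_cast<std::uintptr_t>(&marker);
    return reinterpret_cast<FibreLocal*>(addr & ~(std::uintptr_t(kStackSize) - 1));
}

static ucontext_t g_main, g_fibre;

static void fibre_body() {
    FibreLocal* fl = current_fibre_local();  // no thread_local involved
    fl->counter += fl->fibre_id;
}

// Hypothetical driver: runs one fibre and returns its final counter.
int run_fibre_local_demo(int id) {
    void* base = nullptr;
    if (posix_memalign(&base, kStackSize, kStackSize) != 0) return -1;
    assert(reinterpret_cast<std::uintptr_t>(base) % kStackSize == 0);
    FibreLocal* header = new (base) FibreLocal{id, 100};

    getcontext(&g_fibre);
    // The usable stack starts just above the header slot.
    g_fibre.uc_stack.ss_sp = static_cast<char*>(base) + kHeaderSlot;
    g_fibre.uc_stack.ss_size = kStackSize - kHeaderSlot;
    g_fibre.uc_link = &g_main;
    makecontext(&g_fibre, fibre_body, 0);
    swapcontext(&g_main, &g_fibre);  // run the fibre until it returns

    int result = header->counter;
    free(base);
    return result;
}
```

As the comment says, there is no syntactic support: every access goes through an explicit lookup like `current_fibre_local()`, and calling it from a stack that was not allocated this way would yield a bogus pointer.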

Yes, preserve_none would be exactly what I want, except that I also want to avoid the call instruction in the final asm stream. As the call would not be paired with a ret, the call stack predictor will mispredict it on every context switch, while an indirect jmp has a much better chance of being predicted when two coroutines call each other in a tight loop (consider generators, for example).

Ideally I think that a ctx_t* __builtin_context_switch(ctx_t* to) would need to be provided by the compiler.

Re thread_local, I believe at least MSVC has (had?) a fiber-safe flag that would handle thread_locals correctly by not caching addresses across function calls.