by ioquatix
2938 days ago
I like `libco`. The API is a bit more heavyweight than what I submitted to Ruby. That's because `libco` tries to avoid an assembler and instead dynamically loads binary data into an executable section. In my opinion, that's an unnecessary performance cost; the only benefit is avoiding an assembler at compile time. I think it's better, and a simpler design, to use code compiled natively by the normal toolchain.
Visual C++, GCC/Clang, and Intel C++ have quite different syntax for declaring inline assembler. I did not wish to burden libco users with having to use a specific compiler. Further, with GCC/Clang it can prove ... challenging to produce "naked" functions (without stack BP/SP adjustments), which is critical for libco's co_switch routine. I recall having trouble with a less popular OS. Using an external assembler like yasm or GNU as adds further dependencies on specific tools.
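
To make the syntax difference concrete, here is a rough sketch of what a single no-op instruction looks like in each dialect. The function name `demo_asm_flavor` is made up for illustration and is not part of libco. It assumes a target whose assembler accepts the `nop` mnemonic.

```c
#include <assert.h>

/* Hypothetical helper (not from libco): executes one no-op through inline
   assembler and reports which syntax the compiler accepted. */
static const char *demo_asm_flavor(void) {
#if defined(_MSC_VER) && defined(_M_IX86) && !defined(__clang__)
    /* Visual C++ block syntax. MSVC only accepts inline assembler when
       targeting 32-bit x86. */
    __asm { nop }
    return "msvc";
#elif defined(__GNUC__)
    /* GNU syntax, also accepted by Clang: the instruction is a string. */
    __asm__ volatile ("nop");
    return "gnu";
#else
    /* No inline assembler dialect recognised; do nothing. */
    return "none";
#endif
}
```

Even this trivial case needs a preprocessor ladder, and a real context switch would need the whole routine written once per dialect.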
It is a very minor performance penalty to have the co_swap thunk, but it is non-zero, so I respect your decision. But do note at least that libco supports many architectures (without going pointlessly obscure). It'd be a shame to miss out on cooperating on architecture support by duplicating our work over something like this.
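
For anyone wondering where the thunk cost comes from, the shape is roughly the following. This is only a sketch under my own assumptions: every `demo_` name is hypothetical, and the swap routine is a plain C stand-in that records its arguments instead of prebuilt machine code that actually swaps stacks.

```c
#include <assert.h>

typedef void *demo_thread_t;   /* hypothetical stand-in for a coroutine handle */

static demo_thread_t demo_active = 0;      /* currently running "thread" */
static demo_thread_t demo_last_to = 0;     /* recorded by the stand-in below */
static demo_thread_t demo_last_from = 0;

/* Stand-in for the swap routine. In the design discussed here this would be
   a blob of machine code placed in executable memory; this version only
   records who switched to whom. */
static void demo_swap_stub(demo_thread_t to, demo_thread_t from) {
    demo_last_to = to;
    demo_last_from = from;
}

/* The public entry point reaches the swap routine through this pointer. */
static void (*demo_swap)(demo_thread_t, demo_thread_t) = demo_swap_stub;

static void demo_switch(demo_thread_t handle) {
    demo_thread_t previous = demo_active;
    demo_active = handle;
    demo_swap(handle, previous);   /* indirect call: the extra hop */
}
```

Every switch pays for one ordinary call into the wrapper plus one indirect call through the pointer, which is the small but non-zero overhead mentioned above.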
With or without this, coroutines will always be slaughtered compared to a simple stackless state machine function, but they really shine and (in my opinion) are worth their overhead when you need to switch tasks in the middle of nested function calls.
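
By "stackless state machine function" I mean something along these lines. It is a minimal sketch with made-up names (`counter_task`, `counter_resume`), not code from libco or Ruby.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task that yields 0, 1, ..., limit-1, one value per call.
   Everything that must survive a "yield" lives in this struct, because the
   task has no private stack to keep it on. */
typedef struct {
    int state;   /* resume point: 0 = start, 1 = inside loop, 2 = finished */
    int i;       /* loop counter, hoisted out of the function body */
    int limit;
} counter_task;

static void counter_init(counter_task *t, int limit) {
    t->state = 0;
    t->i = 0;
    t->limit = limit;
}

/* Runs the task up to its next yield point. Returns true and stores the
   yielded value in *out, or returns false once the task has finished. */
static bool counter_resume(counter_task *t, int *out) {
    switch (t->state) {
    case 0:
        t->i = 0;
        t->state = 1;
        /* fall through */
    case 1:
        if (t->i < t->limit) {
            *out = t->i++;
            return true;   /* "yield" is a plain return, no stack switch */
        }
        t->state = 2;
        /* fall through */
    default:
        return false;
    }
}

/* Drives the task to completion and adds up everything it yields. */
static int counter_sum(int limit) {
    counter_task t;
    int value = 0, sum = 0;
    counter_init(&t, limit);
    while (counter_resume(&t, &value)) {
        sum += value;
    }
    return sum;
}
```

A resume here costs about as much as a function call and a switch statement, which is why it is so fast. The catch is that the yield has to happen in `counter_resume` itself. If the yield point were buried three helper calls deep, each of those helpers would need its own state struct and resume logic, and that is the case where a real coroutine with its own stack earns its overhead.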
...
So if you look at libco: https://gitlab.com/higan/higan/blob/master/libco/amd64.c#L26 You'll notice that I get the return address into RAX as soon as possible. Believe it or not, this makes a real difference in performance. It allows the CPU to start fetching instructions after the JMP/RET even sooner than if you have it at the bottom of the function as you do.
The choice between push/pop and mov [rsp] for handling the non-volatile registers isn't really important. I found the latter slightly more performant on Athlon 64 CPUs, and a wash on Intel CPUs.
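Preserving signals isn't so critical, but preserving the SSE registers is probably quite important, even though it is only necessary on Windows (the Windows x64 calling convention treats XMM6-XMM15 as non-volatile). Again, your decision, but I would preserve them in your case. It can lead to really nasty surprises if you don't.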