by byuu 4305 days ago
Just a forewarning: that library, like most C cooperative threading libraries, is using ucontext.

ucontext is unbearably slow, to the point where it's just as fast to use real threads and mutexes, even on a single-core system. There is nothing lightweight about it. (The technical reason is that the implementations call into the kernel to perform their magic; glibc's swapcontext, for example, makes a system call on every switch to save and restore the signal mask.)

The actual logic of saving/restoring registers and the stack frame requires about 5-15 instructions per platform, and is very easy to write in assembly. And for the platforms you don't do this for, there's a non-standard trick of modifying the jmp_buf, which works on x86/amd64, ppc32/64, arm, mips, sparc, etc. These techniques are literally hundreds of times faster.

Try libco instead. It's the bare minimum four functions needed for cooperative threading, which lets you easily build up all the other stuff these libraries provide, if and only if you want them. And if you benchmark libco against all of the other stack-backed coroutine libraries, I'm sure you will be stunned at just how bad ucontext really is. That people keep using ucontext in their libraries tells me that they have never actually used cooperative threading for any serious workloads.
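
To show the shape of a four-function interface like the one described above, here is a minimal sketch. Every `mini_*` name is hypothetical and this is not libco's code: it uses ucontext as the backend only so that it compiles without any assembly, which is exactly the slow path being criticized here. It assumes a POSIX system that still ships `<ucontext.h>`, a single thread, and an entry function that never returns.

```c
#define _XOPEN_SOURCE 700   /* some platforms hide ucontext without this */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical handle type: one saved context plus the stack it owns. */
typedef struct mini_co {
    ucontext_t uc;
    void *stack;            /* NULL for the thread's original context */
} mini_co;

static mini_co mini_main;                    /* the context we started in */
static mini_co *mini_current = &mini_main;   /* not thread-safe in this sketch */

/* 1. Which coroutine is running right now? */
mini_co *mini_active(void) { return mini_current; }

/* 2. Create a coroutine that starts in entry(); entry must never return. */
mini_co *mini_create(size_t stack_size, void (*entry)(void)) {
    mini_co *co = malloc(sizeof *co);
    if (!co) return NULL;
    co->stack = malloc(stack_size);
    if (!co->stack || getcontext(&co->uc) < 0) {
        free(co->stack);
        free(co);
        return NULL;
    }
    co->uc.uc_stack.ss_sp = co->stack;
    co->uc.uc_stack.ss_size = stack_size;
    co->uc.uc_link = NULL;
    makecontext(&co->uc, entry, 0);
    return co;
}

/* 3. Destroy a coroutine that is not currently running. */
void mini_delete(mini_co *co) {
    assert(co != mini_current);
    free(co->stack);
    free(co);
}

/* 4. Save the current context and resume another one. */
void mini_switch(mini_co *to) {
    mini_co *from = mini_current;
    assert(to != NULL);
    mini_current = to;
    swapcontext(&from->uc, &to->uc);
}

/* Demo: a generator of squares built from nothing but the four calls. */
static mini_co *gen_caller;
static int gen_value;

static void squares(void) {
    for (int i = 1;; i++) {
        gen_value = i * i;
        mini_switch(gen_caller);   /* yield back to whoever resumed us */
    }
}

int sum_first_squares(int n) {
    gen_caller = mini_active();
    mini_co *gen = mini_create(64 * 1024, squares);
    if (!gen) return -1;
    int sum = 0;
    for (int k = 0; k < n; k++) {
        mini_switch(gen);          /* resume the generator for one value */
        sum += gen_value;
    }
    mini_delete(gen);
    return sum;
}
```

Calling `sum_first_squares(3)` switches into the generator three times and adds 1 + 4 + 9. Schedulers, channels and the rest can be layered on the same four calls; swapping the backend for hand-written assembly would leave the callers unchanged.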

http://byuu.org/programming/libco/

1 comments

libtask isn't using the ucontext syscall (normally), just the ucontext_t struct. If you look at the source code, it provides local assembly versions of the context swapping code for the most common architectures.

That said, your library is quite nice in that it just provides the co-routine wrappers and nothing else, which I appreciate. Also, your switch code has fewer instructions (are there cases it doesn't handle?), but if you look at libtask's code, it's probably not going to be 100x faster. :-)

Oh ... I looked at libtask/task.c and found:

    static void
    contextswitch(Context *from, Context *to)
    {
    	if(swapcontext(&from->uc, &to->uc) < 0){
    		fprint(2, "swapcontext failed: %r\n");
    		assert(0);
    	}
    }

But it looks like you're right: context.c actually defines its own version of swapcontext to replace the system function, so that it can use other backends. I'm sorry for the mistake, and thanks for correcting me.

> are there cases it doesn't handle?

No, it conforms to the platform ABIs. I even back up the xmm6-15 registers that Microsoft made non-volatile on their amd64 ABI (compare to how much lighter the SystemV amd64 ABI's switch routine is; the pre-assembled versions are in the doc/ folder.)

Microsoft's own fibers library even has a switch to choose whether to back those up or not, which I think is quite dangerous.
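
To make the ABI difference concrete, here is a rough sketch of the register state a switch routine has to preserve under each amd64 convention. The struct and function names are hypothetical, the layout is for illustration only (it is not how libco stores anything), and it ignores the stack pointer itself and the floating-point control words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 128-bit SSE register, stored as raw bytes. */
typedef struct { uint8_t bytes[16]; } xmm_reg;
static_assert(sizeof(xmm_reg) == 16, "xmm_reg must be 16 bytes");

/* System V amd64: only six general-purpose registers are callee-saved. */
typedef struct {
    uint64_t rbx, rbp, r12, r13, r14, r15;
} sysv_amd64_saved;

/* Microsoft x64: rdi and rsi are callee-saved too, plus xmm6 through xmm15. */
typedef struct {
    uint64_t rbx, rbp, rdi, rsi, r12, r13, r14, r15;
    xmm_reg xmm[10];
} win64_saved;

/* Bytes of register state to save and restore on each switch. */
size_t sysv_save_bytes(void)  { return sizeof(sysv_amd64_saved); }
size_t win64_save_bytes(void) { return sizeof(win64_saved); }
```

That is 48 bytes of state per switch on System V against 224 on Windows, which is why the System V routine comes out so much shorter.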

Still, nothing can top SPARC's register windows for being outright hostile to the idea of context switching.

...

Oh, and it also supports thread-local storage for safe use with multiple real threads.

> it's probably not going to be 100x faster.

No, definitely not compared to his ASM versions. Even with a less efficient swap, once you get past the syscall, most of the overhead is simply in the cache impact of swapping out the stack pointer, so his will likely be very close. In fact, even I had to sacrifice a tiny bit of speed to wrap the fastcall calling convention and execute non-inline assembly code (to avoid dependencies on specific compilers/assemblers.)

I do still strongly favor jmp_buf over ucontext for the final fallback, but with x86, ppc, arm, mips and sparc on Windows, OS X, Linux and BSD, you've pretty much covered 99.999% of hardware you'd ever use this on. That, and libtask lacks Windows and OS X support.