by crawshaw
3154 days ago
It's not nearly as simple as you claim.
First: if you have an epoll loop, there is also the cost of the thread context switch, which has definitely hurt us in RPC systems using kernel threads. By contrast, the goroutine gets scheduled onto the kernel thread that answered the poll, saving the switch.
Second: as I alluded to earlier, linux and solaris can scale their kernel thread implementations, not all OSs can. My experiences with large numbers of threads on the BSDs and Windows (in years past, admittedly) suggest other kernels don't have thread implementations designed to scale to such high numbers. Solving the problem in userspace means Go programs written in this style are portable across operating systems.
Third: you can only adjust stack sizes down if you know your program always keeps its stacks small. If you depend on libraries you don't own in C/C++, that's a difficult assumption to make. Go grows the stacks, so if you hit some corner case where a small number of goroutines need a significant amount of stack, your program uses more memory but typically keeps working. No need for careful (manual!) stack accounting.
If all this were as easy as you say, we would still write nearly all our C/C++ servers using threads. We don't, because it's not.
|
I'm not comparing M:N to a 1:1 system where all I/O is proxied out to another thread sitting in an epoll loop. I'm comparing M:N to 1:1 with blocking I/O. In this scenario, the kernel switches directly onto the appropriate thread.
> Second: as I alluded to earlier, linux and solaris can scale their kernel thread implementations, not all OSs can.
The vast majority of Go users are running Linux. And on Windows, UMS is 1:1 and is the preferred way to do high-performance servers; it avoids a lot of the problems that Go has (for instance, playing nicely with third-party code).
> Third: you can only adjust stack sizes down if you know your program always keeps its stacks small.
You could do 1:1 with stack growth just as Go does. As I've said before, small stacks are a property of the relocatable GC, not a property of the thread implementation.
> If all this were as easy as you say, we would still write nearly all our C/C++ servers using threads.
We don't write C/C++ servers using threads because (1) stackless use of epoll is faster than both 1:1 threading and M:N threading, as this project shows; (2) C/C++ can't do relocatable stacks, as the language is hostile to precise moving GC.