|
Stack switching isn't that much of the overhead. "Ordinary" cgo overhead is under 100ns now, has been for a few years, and is much closer to 30ns than 80ns on recent processors. Most of that overhead is a set of 4 CAS operations (incidentally, this means AMD has measurably lower cgo overhead, because of something in its caching model that I don't understand). If the "ordinary" overhead were cgo's only overhead, most people wouldn't have an issue with it. It's downright zippy, in fact... as long as your syscall/C call takes less than 1us. If you stay under the 1us threshold, Go puts the OS thread used for the call back where it found it and everything moves on. The issue is that the OS thread was previously serving N goroutines that other parts of the program may be waiting on, and while it sits in the call Go can't preempt it to let those goroutines run, and has no idea how long it will be until the call returns. As a result, if a syscall/C call takes longer than 1us, Go currently has no choice but to resume another thread, context switch all the old work onto that thread, and then suspend the syscall thread when it comes back. If you do this a lot, your performance will crater. Separately, there are a few issues with how Go chooses to resume/suspend OS threads (for instance, if an OS-locked goroutine cooperatively parks for any reason to wait on another thread, Go will suspend the thread it was on and context switch to a different thread; then, when the goroutine wakes up, it will realize its mistake, resume the original thread, and context switch again). This is all fixable, but all the use cases that Google cares about work fine, so it doesn't really get any attention. |