by masklinn 1478 days ago
> And indeed, Go's scheduler is cooperative.

It hasn't been cooperative for a few versions now: the scheduler became preemptive in 1.14. Even before that, there were yield points at every function prologue (as well as at all IO primitives), so there were relatively few situations where explicit cooperation was necessary.

> Without knowing too much assembly, I would assume any modern processor would make a context switch a one instruction affair.

Any context switch (to the kernel) is expensive, and takes far more than a single instruction. The kernel also has a ton of stuff to do; it's not just "pick the thread to run". It has to restore the ip and sp, but it may also have to restore FPU/SSE/AVX state (AVX512 is over 2KB of state) and trap state.

Kernel-level context switching costs on the order of 10x what userland context switching does: https://eli.thegreenplace.net/2018/measuring-context-switchi...
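
To get a rough feel for the userland side of that comparison, here is a minimal sketch that bounces a token between two goroutines over unbuffered channels and divides the elapsed time by the number of handoffs. The function name `perHandoff` and the round count are made up for illustration, and the number it prints depends heavily on the machine, the Go version, and whether the two goroutines end up on the same OS thread, so treat it as an order-of-magnitude probe rather than a benchmark.

```go
package main

import (
	"fmt"
	"time"
)

// perHandoff (hypothetical helper) plays ping-pong between two goroutines
// and returns the average wall-clock time per handoff.
func perHandoff(rounds int) time.Duration {
	ping := make(chan struct{})
	pong := make(chan struct{})

	// Echo goroutine: answers every ping with a pong until ping is closed.
	go func() {
		for range ping {
			pong <- struct{}{}
		}
		close(pong)
	}()

	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- struct{}{} // hand control to the echo goroutine
		<-pong             // and wait for it to hand control back
	}
	elapsed := time.Since(start)
	close(ping)

	// Each round contains two handoffs (there and back).
	return elapsed / time.Duration(2*rounds)
}

func main() {
	fmt.Println("approx. time per goroutine handoff:", perHandoff(200000))
}
```

Running the same ping-pong with OS threads and pipes (as the linked article does in C) is what you would compare this against.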

> LOAD THREAD

There is no "load thread" instruction.


> It hasn't been cooperative for a few versions now, the scheduler became preemptive in 1.14. And before that there were yield points at every function prolog (as well as all IO primitives) so there were relatively few situations where cooperation was necessary.

Since co-op was mostly unnecessary, do you know why it was changed to preemptive, or what the specific cases were that are resolved with preemptive scheduling?

Tight loops without function calls.
IIRC in earlier versions, an infinite loop without function calls could freeze the entire runtime: the GC triggers a stop-the-world event => goroutines are suspended cooperatively => but this one goroutine never enters a function prologue and thus never suspends => all goroutines except one are suspended and there's no progress. Preemptive scheduling is much more robust.

It's solvable in cooperative scheduling with an additional suspension check at the end of each loop iteration, but that adds overhead to all loops.

If I remember correctly, .NET or the JVM implement safepoints for GC (which can be used to switch contexts cooperatively as well) by simply reading a pointer from a special preallocated virtual memory page which is remapped to nothing when a stop-the-world event is triggered, so such a read traps into an invalid-memory handler where you can park your thread. But I'm not sure how costly it is for thousands/millions of coroutines.
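
A minimal sketch of that scenario, assuming Go 1.14 or later: one goroutine spins in a loop with no ordinary function calls while the main goroutine forces a GC cycle, which has to stop the world. The names `spin` and `gcDuringSpin` are made up for the example. It also assumes `atomic.LoadInt32` is compiled as an intrinsic, so the loop body contains no function prologue that could serve as a cooperative yield point. With asynchronous preemption `runtime.GC()` returns promptly; on the older runtimes described above I would expect it to hang, though I haven't verified that here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spin busy-loops until *stop becomes non-zero, then reports how many
// iterations it completed. The loop body makes no ordinary function calls.
func spin(stop *int32, done chan<- uint64) {
	var n uint64
	for atomic.LoadInt32(stop) == 0 {
		n++
	}
	done <- n
}

// gcDuringSpin (hypothetical helper) forces a GC cycle while spin is running
// and returns how long the GC call took plus the spinner's iteration count.
func gcDuringSpin() (time.Duration, uint64) {
	var stop int32
	done := make(chan uint64, 1)
	go spin(&stop, done)

	// Give the spinner time to actually start running.
	time.Sleep(10 * time.Millisecond)

	start := time.Now()
	runtime.GC() // needs to stop every goroutine, including the spinner
	elapsed := time.Since(start)

	atomic.StoreInt32(&stop, 1) // let the spinner exit
	return elapsed, <-done
}

func main() {
	elapsed, n := gcDuringSpin()
	fmt.Printf("GC returned after %v while the spinner ran %d iterations\n", elapsed, n)
}
```
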
> But I'm not sure how costly it is for thousands/millions of coroutines.

Still cheap: you only need to preempt the threads which are actively running user code. If a coroutine is ready to run, but not actually running, you don't have to do anything with it (as long as you check for safepoints before entering user code). That means your safepoint cost is `O(os threads currently running user code)`, which in most runtimes is `O(num cores)`.
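
Here is a toy model of that argument, not how any real runtime is implemented. A fixed set of worker goroutines stands in for OS threads, a buffered channel of closures stands in for ready-but-not-running coroutines, and each worker polls a safepoint flag before entering the next task. All names (`stopTheWorld`, `safepoint`, `resume`) are invented for the sketch, and the safepoint is requested before the workers start so that the result is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stopTheWorld simulates one safepoint and returns how many workers had to
// park, how many tasks were still queued at that moment, and how many tasks
// ran in total once the world was resumed.
func stopTheWorld(workers, tasks int) (parked, queuedAtStop, ran int) {
	queue := make(chan func(), tasks)
	var ranCount int64
	for i := 0; i < tasks; i++ {
		queue <- func() { atomic.AddInt64(&ranCount, 1) }
	}
	close(queue)

	var safepoint int32 = 1 // safepoint already requested
	resume := make(chan struct{})
	var parkedCount int64
	var parkedWG, doneWG sync.WaitGroup
	parkedWG.Add(workers)
	doneWG.Add(workers)

	for w := 0; w < workers; w++ {
		go func() {
			defer doneWG.Done()
			for {
				// Poll the flag before entering user code.
				if atomic.LoadInt32(&safepoint) != 0 {
					atomic.AddInt64(&parkedCount, 1)
					parkedWG.Done()
					<-resume // parked until the coordinator resumes the world
				}
				task, ok := <-queue
				if !ok {
					return
				}
				task()
			}
		}()
	}

	// The coordinator waits for the workers only, never for queued tasks.
	parkedWG.Wait()
	parked = int(atomic.LoadInt64(&parkedCount))
	queuedAtStop = len(queue)

	atomic.StoreInt32(&safepoint, 0)
	close(resume)
	doneWG.Wait()
	return parked, queuedAtStop, int(atomic.LoadInt64(&ranCount))
}

func main() {
	fmt.Println(stopTheWorld(4, 100000))
}
```

With 4 workers and 100000 queued tasks, the coordinator collects 4 acknowledgements while all 100000 tasks sit untouched in the queue, which is the `O(running threads)` rather than `O(coroutines)` cost described above.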

Replace “tight” with “long-running/infinite” but yeah, otherwise this is correct.