Make an ordered_bag ETS table and try to insert several thousand records in one ETS call. It will block the whole VM (spinning one core at 100%, no matter how many you have) for several seconds. It does this because an ordered bag needs to find each key in the table as it inserts it, resulting in n^2 complexity, and this has to be done atomically. So for 1 000 keys you have 1 000 000 comparisons during a VM-wide lock. Solution: don't insert so many records into an ETS table in one call.
I’m more curious to understand how userspace threads can be preemptive. :)
I can think of a few ways:
- The VM just doesn’t JIT and can decide to stop executing a thread by just not interpreting the next piece of bytecode and switching to another green thread instead (this would be pretty slow due to the lack of JIT)
- The VM JITs, but inserts a preamble before every function call saying “Before executing this function, should I switch to another green thread first?”, and thus, so long as you call functions frequently enough, you “preempt” yourself. This is how Go does it, and it’s a well known thing in Go that if you never call a function for a while (like just doing a really huge for loop), the current goroutine doesn’t yield execution and hogs the whole OS thread.
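To make the first option concrete, here is a minimal Python sketch of an interpreter that switches green threads between instructions. It is a toy under stated assumptions, not how any real VM is written: the function name `interpret`, the `slice_len` parameter, and the idea of an "instruction" being a plain string are all made up for illustration.

```python
from collections import deque


def interpret(threads, slice_len=2):
    """Run every green thread to completion, switching between instructions.

    `threads` maps a thread name to its list of "instructions" (hypothetical:
    plain strings here). The interpreter owns the program counters, so it can
    stop a thread simply by not executing its next instruction.
    """
    pcs = {name: 0 for name in threads}   # program counter per green thread
    ready = deque(threads)                # round-robin run queue of names
    trace = []                            # (thread, instruction) in run order
    while ready:
        name = ready.popleft()
        code = threads[name]
        # Execute at most `slice_len` instructions, then consider switching.
        for _ in range(slice_len):
            if pcs[name] >= len(code):
                break
            trace.append((name, code[pcs[name]]))
            pcs[name] += 1
        if pcs[name] < len(code):
            ready.append(name)            # not finished: back of the queue
    return trace
```

With `{"a": ["a1", "a2", "a3"], "b": ["b1"]}` and `slice_len=2`, thread `a` runs two instructions, `b` runs its only one, and then `a` finishes. The thread being switched out never has to cooperate, which is the whole point of this option.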
> - The VM JITs, but inserts a preamble before every function call saying “Before executing this function, should I switch to another green thread first?”, and thus, so long as you call functions frequently enough, you “preempt” yourself. This is how Go does it, and it’s a well known thing in Go that if you never call a function for a while (like just doing a really huge for loop), the current goroutine doesn’t yield execution and hogs the whole OS thread.
I'm not sure of the implementation details, but this hasn't been true for a while in Go. As of Go 1.14, goroutines are asynchronously preemptible, so loops without function calls can no longer deadlock the scheduler or significantly delay GC: https://go.dev/doc/go1.14#runtime
The docs there hint at how it’s done in Go and how it could be done in Erlang: the runtime monitors how long a given goroutine has been running without yielding to the scheduler, and uses a signal to interrupt code that has exceeded a 10ms quota of continuous execution.
In principle you can also set a timer that raises a signal and then switch to a different coroutine when it fires. But it is a big can of worms and won't perform great.
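For the curious, the timer-plus-signal idea can be sketched in a few lines of Python. This is only an analogy, not Go's or Erlang's mechanism: it assumes a POSIX system and the main thread (Python only delivers signals there), the names `run_with_quota`, `Preempted`, and `busy_loop` are made up, and it shows only the interruption half, not the switch to another coroutine.

```python
import signal


class Preempted(Exception):
    """Raised inside the running code when its time quota expires."""


def _on_timer(signum, frame):
    # Runs in the main thread when SIGALRM is delivered.
    raise Preempted


def run_with_quota(fn, quota_seconds=0.01):
    """Run fn(), interrupting it if it is still running after quota_seconds."""
    previous = signal.signal(signal.SIGALRM, _on_timer)
    signal.setitimer(signal.ITIMER_REAL, quota_seconds)  # one-shot timer
    try:
        fn()
        return "finished"
    except Preempted:
        return "preempted"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)          # cancel the timer
        signal.signal(signal.SIGALRM, previous)          # restore old handler


def busy_loop():
    # Never calls a function and never yields on its own.
    while True:
        pass
```

`run_with_quota(busy_loop)` comes back with `"preempted"` even though the loop never cooperates. The can of worms shows up as soon as the interrupted code holds a lock or is halfway through updating a data structure, since the exception can land at any point in it.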
IIRC the Erlang VM does the second (schedule on function call). Since Erlang is a functional language and loops are done via recursion, this works out fine.
Yes, on every function call; some internal functions implemented in native code are also instrumented with those checks. On average a thread is switched out after about 2000 "reductions", as they are called. There's also an optimization where, if you send something to another thread and are waiting for a reply, the VM switches instantly to the other thread, which makes some message sending equivalent to a simple function call.
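A rough Python model of reduction counting, in case it helps. This is a sketch under stated assumptions rather than the BEAM's actual code: generators stand in for processes, each `yield 1` stands in for the check done at a function call, and `schedule`, `count_to`, and `REDUCTION_BUDGET` are hypothetical names (the budget just reuses the 2000 figure from the comment above).

```python
from collections import deque

REDUCTION_BUDGET = 2000  # figure quoted above, used here as an assumption


def count_to(n):
    """Toy process: each iteration stands in for one (recursive) function call."""
    for _ in range(n):
        yield 1  # one reduction charged at the call-site check


def schedule(procs, budget=REDUCTION_BUDGET):
    """Round-robin over procs, switching when a process exhausts its budget.

    Returns the order in which processes were given a turn.
    """
    run_queue = deque(procs.items())
    turns = []
    while run_queue:
        name, proc = run_queue.popleft()
        turns.append(name)
        remaining = budget
        finished = False
        while remaining > 0:
            try:
                remaining -= next(proc)   # run until the next call-site check
            except StopIteration:
                finished = True
                break
        if not finished:
            run_queue.append((name, proc))  # out of reductions: requeue
    return turns
```

`schedule({"a": count_to(5000), "b": count_to(1000)})` gives the turn order `['a', 'b', 'a', 'a']`: the long-running process gets cut off after 2000 reductions so the short one can run, then picks up where it left off.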