Good stuff. Erlang VM FTW!

> mochiglobal, a module that exploits a feature of the VM: if Erlang sees a function that always returns the same constant data, it puts that data into a read-only shared heap that processes can access without copying the data

There is a nice new OTP 20.0 optimization: now the value doesn't get copied even on message sends on the local node. Jesper L. Andersen (jlouis) talked about it in his blog: https://medium.com/@jlouis666/an-erlang-otp-20-0-optimizatio...

> After some research we stumbled upon :ets.update_counter/4

Might not help in this case, but 20.0 adds select_replace, so you can do a full-on CAS (compare-and-swap) pattern: http://erlang.org/doc/man/ets.html#select_replace-2 So something like acquiring a lock would be much easier to do.

> We found that the wall clock time of a single send/2 call could range from 30μs to 70μs due to Erlang de-scheduling the calling process.

There are a few tricks the VM uses there, and it's pretty configurable. For example, sending to a process with a long message queue will add a bit of backpressure to the sender and un-schedule it. There are tons of configuration settings for the scheduler. There is an option to bind schedulers to physical cores to reduce the chance of scheduler threads jumping around between cores: http://erlang.org/doc/man/erl.html#+sbt Sometimes it helps, sometimes it doesn't.

Another general trick is to build the VM with the lcnt feature. This will add performance counters for locks / semaphores in the VM. You can then check for the hotspots and know where to optimize: http://erlang.org/doc/man/lcnt.html
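
To make the select_replace point concrete, here is a rough sketch of a lock built on that CAS pattern. It is only an illustration, assuming OTP 20 or later: the module name, the table layout `{Name, free | Owner}`, and the helper names `new/0`, `acquire/3`, and `release/3` are all made up for this example and come from neither the article nor any library. The match spec shape (a fully bound old object swapped for a `{const, New}` object with the same key) follows the compare-and-swap example in the ets documentation.

```erlang
-module(ets_lock_sketch).
-export([demo/0]).

%% Hypothetical layout: one row per lock, {Name, free} or {Name, Owner}.
new() ->
    Tab = ets:new(locks, [set, public]),
    true = ets:insert(Tab, {my_lock, free}),
    Tab.

%% Atomically replace {Name, free} with {Name, Owner}.
%% select_replace/2 returns the number of replaced objects, so 1 means
%% the lock was free and is now ours; 0 means someone else holds it.
acquire(Tab, Name, Owner) ->
    Old = {Name, free},
    New = {Name, Owner},
    1 =:= ets:select_replace(Tab, [{Old, [], [{const, New}]}]).

%% Only the current owner can swap the row back to free.
release(Tab, Name, Owner) ->
    Old = {Name, Owner},
    New = {Name, free},
    1 =:= ets:select_replace(Tab, [{Old, [], [{const, New}]}]).

demo() ->
    Tab = new(),
    true  = acquire(Tab, my_lock, owner_a),
    false = acquire(Tab, my_lock, owner_b),  % already held by owner_a
    false = release(Tab, my_lock, owner_b),  % not the holder
    true  = release(Tab, my_lock, owner_a),
    true  = acquire(Tab, my_lock, owner_b),
    ok.
```

Calling `ets_lock_sketch:demo()` should run through the assertions and return `ok`. One caveat with this sketch: lock names and owners go straight into the match head, so atoms like `'_'` or `'$1'` would be read as patterns rather than literals and should be avoided as names.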
If send/2 takes 30μs to 70μs, I'm guessing it's blocking as well, either on distributed communication or something else along those lines. For local message passes to take that long, my something-is-amiss sixth sense is tingling.