by coffeemug 4881 days ago
This is a great question. We start a thread per core and multiplex thousands of coroutines/events on each thread. When coroutines on different threads need to communicate, we send a message via a highly optimized message bus, so cross-thread communication code is localized. This means each thread is lock-free (i.e., when a coroutine needs to communicate with another coroutine, it sends a message and yields, so the CPU core can process other pending tasks). The code isn't wait-free: a coroutine might have to wait, but it never ever blocks the CPU core itself. So, as long as there is more work to do, the CPU will always be able to do it.

If instead we used threads + locking like traditional systems, we'd have to deal with "hot locks" that block out entire cores. Effectively, we solved this problem once and for all, while systems that use threads + locks (like the Linux kernel) have to continuously solve it by making sure locks are extremely granular.
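To make the thread-per-core design concrete, here is a minimal Python sketch that uses asyncio as a stand-in for the coroutine scheduler. It is not RethinkDB's code, and `CoreThread`, `call_on`, `incr`, and the other names are hypothetical. Each thread runs its own event loop and owns its own data; a coroutine that needs another thread's data posts a message with `loop.call_soon_threadsafe` and awaits the reply, so its own thread keeps running other coroutines in the meantime. Two caveats: `call_soon_threadsafe` is not a lock-free queue internally, so this mirrors the structure rather than the optimized message bus described above, and CPython's GIL keeps these threads from executing Python code in parallel.

```python
import asyncio
import threading


class CoreThread:
    """Hypothetical stand-in for one per-core scheduler: an OS thread running
    its own asyncio event loop, plus state that only this thread may touch."""

    def __init__(self, name):
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self._run, name=name, daemon=True)
        self.data = {}  # owned by this thread; other threads reach it only via messages

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self):
        self.thread.start()

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def call_on(target, fn, *args):
    """Send a message asking `target` to run fn(target, *args), then suspend
    the calling coroutine until the reply message comes back."""
    here = asyncio.get_running_loop()
    reply = here.create_future()

    def deliver(result, exc):
        # Runs on the caller's own loop, so touching `reply` is safe here.
        if reply.cancelled():
            return
        if exc is not None:
            reply.set_exception(exc)
        else:
            reply.set_result(result)

    def handle():
        # Runs on the target's loop: only that thread ever touches target.data.
        try:
            result = fn(target, *args)
        except Exception as exc:
            here.call_soon_threadsafe(deliver, None, exc)
        else:
            here.call_soon_threadsafe(deliver, result, None)

    target.loop.call_soon_threadsafe(handle)
    # Yield: while we wait, this thread's loop keeps running other coroutines.
    return await reply


def incr(core, key):
    # Message handler executed on the owning thread, so no lock is needed.
    assert threading.current_thread() is core.thread
    core.data[key] = core.data.get(key, 0) + 1
    return core.data[key]


async def client(owner, n):
    last = 0
    for _ in range(n):
        last = await call_on(owner, incr, "hits")
    return last


async def many_clients(owner):
    # Ten coroutines multiplexed on one thread, all messaging another thread.
    results = await asyncio.gather(*(client(owner, 100) for _ in range(10)))
    return max(results)


def run_demo():
    a, b = CoreThread("core-a"), CoreThread("core-b")
    a.start()
    b.start()
    try:
        fut = asyncio.run_coroutine_threadsafe(many_clients(b), a.loop)
        return fut.result(timeout=10)
    finally:
        a.stop()
        b.stop()


if __name__ == "__main__":
    print(run_demo())
```

Because `incr` only ever runs on `core-b`, the counter needs no lock even though ten coroutines on `core-a` keep sending it increments; all 1000 increments are serialized on the owning thread, so `run_demo()` returns 1000.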


Sounds very Erlang-ish. Did you copy that deliberately?
We do effectively have an ad-hoc mini Erlang runtime that we wrote at the core of the system. I'm not sure how deliberate that was -- we sort of borrowed performance ideas from many places, tried a lot of different approaches, and settled on this one. Lots of this was definitely inspired by ideas from Erlang.
There definitely seems to be a version of Greenspun's Tenth Rule for Erlang. But I think Greenspunning has gotten too bad a name – sometimes implementing a subset of a classical system is exactly what you ought to do, for example when your problem allows you to exploit certain invariants that don't hold in the general case, or for some reason using the classical system itself (Erlang in this case) is not an option.
Right! Rethink has an ad-hoc Erlang runtime for message processing, and an ad-hoc Lisp for the query language. I'm both ashamed and proud of this at the same time :)