by mistercow 3726 days ago
I've been confused for some time as to why people get excited about green threads. From what I've read, the main advantage seems to be that you can have threads on hardware that doesn't support threads natively, which is cool if you're on that kind of hardware. There are also some spin-up advantages, I guess? But they don't get load-balanced across cores, right?

I feel like I'm missing something important.

6 comments

Green threads aren't just a substitute for when you don't have system level threads. Instead, they're a way of structuring code that allows you to express highly concurrent programs without requiring the heavy overhead of launching and switching between operating system threads.

Linux preempts running threads on a timer tick; I think the tick used to be 100 Hz. Each switch involves swapping out the CPU registers, doing some kernel bookkeeping, etc. This is called a "context switch" and it's quite costly. Also, every Linux thread gets its own stack, which takes at least one memory page (4 KB) and usually has far more address space reserved for it. [If I'm wrong about these details, please correct me!]

Basically, the cost associated with an operating system thread comes from the fact that it has to be isolated from other system threads on a low level... whereas language runtimes that offer green threads impose their own safety via language construction, e.g., Erlang processes can't reach in and mess with other processes' memory (without C hacks).

So green threads can be much more efficient, but they require some care in the implementation, especially to support I/O, and to have fair and efficient load balancing, etc. The runtime then runs N operating system threads to get balancing across cores, and distributes the green threads' work among them.

The advantage is you can have many more green threads than you could have OS threads (hundreds of thousands vs thousands), due to green threads being more lightweight. This allows a programming model based on message passing between green threads, which many people consider nicer to reason about. E.g. for a web server you could have a goroutine/Erlang process for each client with 50k clients and still have excellent performance, whereas if you had 50k OS threads you'd likely suffer performance issues and use a heap more RAM.
And this is a huge benefit in the ease of coding.

If you only have two or four or sixteen threads you write overtly threaded code. But when you have millions you don't. Your program looks single-threaded and yet operates better and more safely.
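As a rough illustration of the "one green thread per client" idea, here's a minimal Go sketch. The names `handleClient` and `serveAll` are invented for the example, and the "work" is a placeholder multiplication, not a real request handler.

```go
package main

import (
	"fmt"
	"sync"
)

// handleClient is a hypothetical per-client handler. It reads like plain
// sequential code and reports its result by message passing on a channel.
func handleClient(id int, results chan<- int) {
	results <- id * 2 // stand-in for real per-client work
}

// serveAll starts one goroutine per client and sums what they send back.
func serveAll(clients int) int {
	results := make(chan int, clients) // buffered so senders never block
	var wg sync.WaitGroup
	for id := 0; id < clients; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			handleClient(id, results)
		}(id)
	}
	wg.Wait()
	close(results)

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	// 50k concurrent handlers, as in the web server example above.
	fmt.Println(serveAll(50000))
}
```

Note that `handleClient` contains no locks or thread management; the only concurrency-aware code is the small loop that starts the goroutines.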

One advantage is that spinning up new green threads can be very quick. Starting a new kernel thread requires at least 1 syscall.

For example: on a network service, you have 1 thread listening for new connections. When a new connection is made, it starts a new thread, which calls the handler. The listener thread then goes back to listening for new connections.

Now the advantages can depend on your green threading implementation. If a handler thread blocks on reading from disk or a DB, the listener thread can still wait for new connections and other connection handlers can still operate. This keeps your network application responsive, without paying the latency of a clone syscall for every new connection.
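The listener/handler pattern described above might look like this in Go. This is a sketch under assumptions: the one-line echo protocol and the names `handle`, `serve` and `roundTrip` are made up for illustration; the `net` and `bufio` calls are standard library.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// handle echoes one line back to the client. If it blocks on I/O,
// only this goroutine waits; the listener and other handlers keep going.
func handle(conn net.Conn) {
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	fmt.Fprintf(conn, "echo: %s", line)
}

// serve is the listener loop: accept, hand off to a new goroutine,
// go straight back to accepting. It returns when the listener is closed.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go handle(conn)
	}
}

// roundTrip (hypothetical helper) starts the server on an ephemeral
// localhost port, sends one line, and returns the reply.
func roundTrip(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := roundTrip("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(reply)
}
```

Starting the handler is a plain `go` statement rather than a thread-creation syscall, which is the spin-up advantage mentioned earlier.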

Of course you can achieve this in other ways.

First, you can run TONS of them, which is an enabler for program designs that native threads don't work well with.

Second, they are much lighter on memory (well, comes with the first point, but still).

Third, the supervising VM/environment has more fine-grained control over them than with native threads.

And in any decent implementation, they are absolutely load balanced across cores. Why wouldn't they be?

Specific to Erlang processes -

As others have indicated, they're extremely lightweight, cheap to create and throw away, and are load balanced across cores.

But also, each has isolated memory, with share-nothing semantics (with a couple of caveats), which means that an exception in one won't affect others -unless you want it to-. That's huge.

But as has also been mentioned, you can create many of them. Someone else threw out 50k; never mind that, try a million of them on a single box. That kind of concurrency opens up an entirely new paradigm of coding, one that is actually very useful, because it turns out that a lot of problems are naturally concurrent problems that we've been trained to think about in sequential patterns because of how hard concurrency is.

An example I like to give is from the real world - task scheduling. We had to write some simple task scheduling for an application. Each task was multiple steps, many of which were time based (i.e., "execute this command, wait X amount of time, execute another command, wait Y amount of time, execute a third command, once that is successful execute a fourth command"). The traditional way of doing this would be some sort of priority queue, with tasks weighted by how long from now until they were to be done. You check how long until the next event, sleep until then, fire it, then repeat. Simple, right?

Except...each event leads to more events. And event timings can change. And events can happen simultaneously, so you need a pool of threads to execute the events on. Locks everywhere. Task logic is very hard to isolate from the execution logic (i.e., the bit that says "do X, and create an event to execute at time Y" is hard to keep entirely separate from the "pull event from priority queue, throw to a new thread to execute on, sleep until next event", since there are so many interactions between the two that can affect one another, changing when events happen, and when the queue puller needs to wake up).

In Erlang though? Trivial. Write your entire task as a single job. I.e., do x, wait, do y, wait, do z, wait for a message that z has completed, do a, etc. Then spin up one of those for each task that you need and let the VM handle the concurrency aspects of it. Even additional complexity, like "in the event of a message, change the amount of time until the next event to be half of what it was" is trivial; it's all contained in the same module, it all describes the same lifecycle of a single task. All the concurrency, the running of many of those tasks, and their interactions, and ensuring none is blocked, etc, is -free-.
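The original was Erlang, but the shape is easy to sketch in Go too, treating goroutines as the green threads. Everything named here (`runTask`, `runAll`, `demoWait`, the step names) is invented for illustration, and a goroutine that closes a channel stands in for whatever system reports that z has completed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoWait is an arbitrary short delay so the example finishes quickly.
const demoWait = 2 * time.Millisecond

// runTask is one task's whole lifecycle as straight-line code:
// do x, wait, do y, wait, do z, wait for the "z completed" message, do a.
func runTask(wait time.Duration, zDone <-chan struct{}, record func(step string)) {
	record("x")
	time.Sleep(wait)
	record("y")
	time.Sleep(wait)
	record("z")
	<-zDone // block until told that z has completed
	record("a")
}

// runAll starts one goroutine per task and returns the steps each one ran.
func runAll(n int, wait time.Duration) map[string][]string {
	var mu sync.Mutex
	steps := make(map[string][]string)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("task-%d", i)
		zDone := make(chan struct{})
		record := func(step string) {
			mu.Lock()
			steps[name] = append(steps[name], step)
			mu.Unlock()
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			runTask(wait, zDone, record)
		}()
		// Stand-in for the external system that confirms z finished.
		go func() {
			time.Sleep(3 * wait)
			close(zDone)
		}()
	}
	wg.Wait()
	return steps
}

func main() {
	steps := runAll(1000, demoWait)
	fmt.Println(len(steps), "tasks finished; task-0 ran", steps["task-0"])
}
```

There is no priority queue and no event loop: each task simply sleeps and blocks on its own, and the runtime interleaves all of them.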

This sounds like an obvious, ideal example once explained, and yet every person where I worked who was unfamiliar with Erlang (and even some of those who had coded a little in it, but hadn't come to grasp the paradigm as well), when we explained what we were trying to do, described it as "easy, we just need to use a priority queue and pull from it!"

I think it's not so much green threads as very many threads. And then they're not so much threads as independently threadable parts of the code.

You don't have to intentionally write a work queue and balance the number of readers vs writers, etc. You just let the runtime make things go as fast as possible.

They do get balanced across cores in almost all cases. Erlang is a functional language which really lends itself to this.