by pron 697 days ago
Virtual threads do one thing: they allow creating lots of threads. This helps throughput due to Little's law [1]. But this server saturates the CPU with only a few threads (it doesn't do the fanout modern servers tend to do), so virtual threads (or asynchronous programming, which operates on the same principle) cannot provide significant improvements while everything else in the system stays the same. That is especially true because everything else in that server was optimised for over two decades under the constraints of expensive threads (such as the deployment strategy of many small instances with little CPU).

So it looks like their goal was: try adopting a new technology without changing any of the aspects designed for an old technology and optimised around it.

[1]: https://youtu.be/07V08SB1l8c
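For readers who have not seen Little's law applied to servers, here is a minimal sketch of the arithmetic. Little's law says that the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system (L = λW). The class name, method name, and numbers below are illustrative assumptions, not figures from the benchmark being discussed.

```java
// Illustrative only: names and numbers are assumptions, not taken from the thread.
final class LittlesLaw {
    /**
     * Little's law, L = lambda * W: the average number of requests in flight
     * needed to sustain a given request rate at a given average latency.
     */
    static double requiredConcurrency(double requestsPerSecond, double latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        // Same request rate, two different latencies.
        System.out.println(requiredConcurrency(2_000, 50));   // short requests
        System.out.println(requiredConcurrency(2_000, 500));  // requests that wait on slow downstream calls
    }
}
```

With a thread-per-request model, the result is roughly the number of threads the server must be able to keep alive at once, which is why cheap threads matter most when latency is dominated by waiting.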

3 comments

It goes deeper than Little's Law. Every decent textbook on introductory queuing theory has the result that, on a normalized basis, fast server > multi-server > multi-queue. That result holds up under almost arbitrarily deep analysis.

Your observation that computing architectures have chased fast server for decades is apt. There's a truism in computing that those who build systems are doomed to relearn the lessons of the early ages of networks, whether they studied them in school or not. But kudos to whoever went through the exercise again.

I guess at least their work has confirmed what we probably already knew intuitively: if you have CPU-intensive tasks, without waiting on anything, and you want to execute these concurrently, use traditional threads.

The advice "don't use virtual threads for that, it will be inefficient" really does need some evidence.

Mildly infuriating though that people may read this and think that somehow the JVM has problems in its virtual thread implementation. I admit their 'Unexpected findings' section is very useful work, but the moral of this story is: don't use virtual threads for things they were not intended for. Use them when you want a very large number of processes executing concurrently, those processes have idle stages, and you want a simpler model to program with than other kinds of async.

I'll put it this way: to benefit from virtual threads (or, indeed, from any kind of change to scheduling, such as with asynchronous code) you clearly need 1. some free computational resources and 2. lots of concurrent tasks. The server here could perhaps have both with some changes to its deployment and coding style, but as it was tested -- it had neither. I'm not sure what they were hoping to achieve.
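The two conditions above can be made concrete with a simplified back-of-the-envelope model. Throughput is capped both by how many tasks can be in flight (concurrency divided by latency) and by how much CPU is available (cores divided by CPU time per request), and more threads only raise the first cap. This is a rough sketch under those assumptions, ignoring queuing and contention; every name and number in it is hypothetical.

```java
// Simplified upper-bound model; all names and figures are hypothetical.
final class ThroughputBound {
    /**
     * Upper bound on requests per second, taken as the smaller of
     * the concurrency cap and the CPU cap.
     */
    static double upperBound(int concurrentTasks, double latencyMillis,
                             int cores, double cpuMillisPerRequest) {
        double concurrencyCap = concurrentTasks * 1000.0 / latencyMillis;
        double cpuCap = cores * 1000.0 / cpuMillisPerRequest;
        return Math.min(concurrencyCap, cpuCap);
    }

    public static void main(String[] args) {
        // CPU-bound request (10 ms latency, all of it CPU) on 2 cores:
        // going from 8 to 10,000 concurrent tasks does not raise the bound.
        System.out.println(upperBound(8, 10, 2, 10));
        System.out.println(upperBound(10_000, 10, 2, 10));

        // Mostly-waiting request (100 ms latency, 1 ms CPU) on 2 cores:
        // here more concurrent tasks raise the bound until the CPU cap is hit.
        System.out.println(upperBound(8, 100, 2, 1));
        System.out.println(upperBound(1_000, 100, 2, 1));
    }
}
```

In the first pair the CPU cap is already the limiting term, which matches the situation described for the tested server; in the second pair there is spare CPU, so extra concurrency pays off.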
This take sounds reasonable to me. But I'm not an expert, and I'd be curious to hear an opposing view if there's one.
Standard/OS threads in Java use about a megabyte of memory per thread, so running 256 threads uses about 256 MB of memory before you've even started allocating things on the heap.

Virtual threads are therefore useful if you're writing something like a proxy server, where you want to allow lots of concurrent connections, and you want to use the familiar thread-per-connection programming model.
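As a rough illustration of that thread-per-task style, here is a minimal sketch assuming Java 21 or later. It starts one virtual thread per task and lets each one block, with `Thread.sleep` standing in for waiting on a socket. The class and method names are made up for the example; it is not a proxy server, only the concurrency pattern one might use inside one.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; requires Java 21+ for virtual threads.
final class VirtualThreadSketch {
    /** Runs taskCount blocking tasks, one virtual thread each, and returns how many completed. */
    static int runBlockingTasks(int taskCount, Duration simulatedIo) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    Thread.sleep(simulatedIo); // stands in for blocking I/O on a connection
                    return completed.incrementAndGet();
                });
            }
        } // close() waits for the submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        System.out.println("completed tasks: " + done);
    }
}
```

The code reads like ordinary blocking code, which is the point of the familiar programming model mentioned above. Whether it helps in practice still depends on the tasks spending most of their time waiting.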

Only 1 MB of address space is reserved (which can still be a problem); actual memory usage is limited to the memory pages that the program actually accesses within that address space.
He is as much of an expert as it gets, as he is the leader of the Loom project.
Greenlets ultimately have to be scheduled onto system threads at the end of the day unless you have a lightweight thread model of some sort supported by the OS, so it’s a little bit misleading depending on how far down the stack you want to think about optimizing for greenlets. You could potentially have a poor implementation of task scheduling for some legacy compatibility reason, however. I guess I’d be curious about the specifics of what pron is discussing.
Even though yes, in the end you have to map onto system threads, there are still quite a few things you can do. But this is infeasible for Java, unfortunately.

For example, in Erlang the entire VM is built around green threads with a huge amount of guarantees and mechanisms: https://news.ycombinator.com/item?id=40989995

When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

I really don’t think it’s useful to be this nonspecific. You could give an example of what a Java greenlet cannot do or how it cannot be optimized, for example. If your whole point is actually just “I prefer the semantics of BEAM threads”, then just say that.
Those semantics are exactly what cannot be done in Java for many reasons (including legacy code etc.).

And yes, those semantics are important, but sadly most people stop at "yay we have green threads now" and then a null pointer exception kills their entire app, or the thread that handles requests, or...

So let’s be clear, your point is that you find the API of non-BEAM greenlets less useful, not that they’re somehow necessarily less efficient. Right?
> When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

How's that? What about parallelism?