by kevingadd 697 days ago
One of the main reasons to do virtual threads is that it allows you to write naive "thread per request" code and still scale up significantly without hitting the kind of scaling limits you would with OS threads.
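
For anyone who hasn't tried it, here is a minimal sketch of what "thread per request" looks like with virtual threads, assuming JDK 21 or later. The `handle` method and its 10 ms sleep are hypothetical stand-ins for a real request handler doing blocking I/O; only `Executors.newVirtualThreadPerTaskExecutor()` is the actual JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPerRequest {
    // Hypothetical request handler: the sleep stands in for blocking I/O.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(10);
        return "response-" + requestId;
    }

    // Runs every request on its own virtual thread and returns the responses in submission order.
    static List<String> serve(int requests) throws Exception {
        // close() at the end of the try block waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                futures.add(executor.submit(() -> handle(id)));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> future : futures) {
                responses.add(future.get());
            }
            return responses;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serve(10_000).size());
    }
}
```

The handler is plain blocking code; nothing in it knows or cares that it runs on a virtual thread.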

The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block. Each task makes a bit of progress, allocates some objects, and then lets another one do the same thing.

With virtual threads, you can limit the damage by using a semaphore, but you still need to tune the size. This isn't much different than sizing a traditional thread pool, and so I'm not sure what benefit virtual threads will really have in practice. You're swapping one config for another.
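
To make the semaphore approach concrete, here is a rough sketch, assuming JDK 21 or later. The class name, the `runBounded` helper, and the 5 ms sleep are made up for illustration; the permit count is exactly the knob that still has to be tuned.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {
    // Starts `tasks` virtual threads but lets at most `permits` of them run the
    // guarded section at once. Returns the peak number observed inside that section.
    static int runBounded(int tasks, int permits) {
        Semaphore limiter = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    limiter.acquire(); // blocks only this virtual thread
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for the memory-hungry, blocking work
                    } finally {
                        inFlight.decrementAndGet();
                        limiter.release();
                    }
                    return null; // makes the lambda a Callable, so checked exceptions are allowed
                });
            }
        } // close() waits for all tasks to finish
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + runBounded(10_000, 64));
    }
}
```

All the threads still get created, but only `permits` of them hold live working state inside the guarded section at any moment.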

> The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block.

The key with virtual threads is that they are so lightweight that you can have thousands of them running concurrently: even when they block for I/O, it doesn't matter. It's similar to lightweight coroutines in other languages such as Go or Kotlin.

What you are complaining about has nothing to do with thread pools or virtual threads. You're pointing out the fact that more parallelism will also need more hardware and that a finite hardware budget will need a back pressure strategy to keep resource consumption within a limit. While you might be correct that "sizing a traditional thread pool" is a back pressure strategy that can be applied to virtual threads, the problem with it is that IO bound threads will prevent CPU bound threads from making progress. You don't want to apply back pressure based on the number of tasks. You want back pressure to be in response to resource utilization, so that enough tasks get scheduled to max out the hardware.

This is a common problem for people using Java parallel streams, because by default they share a single global thread pool. The way to use your own thread pool is also extremely counterintuitive: instead of passing the pool as a parameter, you rely on some implicit thread local magic that runs the stream in whichever thread pool the parallel stream was launched from.
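
For reference, the usual workaround looks roughly like the sketch below. As far as I know this relies on implementation behavior rather than a documented guarantee, so treat it as an assumption; `sumOfSquares` and its parameters are just illustrative names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamPool {
    // Computes 1^2 + 2^2 + ... + n^2 with a parallel stream that runs in a private pool.
    static int sumOfSquares(int n, int parallelism) throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The pool is never passed to the stream. The stream ends up in `pool`
            // only because the terminal operation is invoked from one of its workers.
            return pool.submit(
                    () -> IntStream.rangeClosed(1, n).parallel().map(x -> x * x).sum()
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(100, 4)); // prints 338350
    }
}
```

Call the same stream pipeline from an ordinary thread and it runs in the shared common pool instead, which is the surprise being described.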

It would be best if people came up with more dynamic back pressure strategies, because this is a more general problem that goes way beyond thread pools. In fact, one of the key problems of automatic parallelization is deciding at what point there is too much parallelization.

The benefits of virtual threads come from the simple API that they present to the programmer. It's not a performance optimization.
But that same benefit was always available with platform threads -- a simple API. What is the real gain by using virtual threads? It's either going to be performance or memory utilization.
It combines the benefits of async models (state machines decoupled from OS threads, and thus better suited to I/O-bound workloads) with the benefits of proper threading models (namely the simpler human interface).

Memory utilization & performance is going to be similar to the async callback mess.

Why is an async model better than using OS threads for an I/O bound workload? The OS is doing async stuff internally and shielding the complexity with threads. With virtual threads this work has shifted to the JVM. Can the JVM do threads better than the OS?
"Why is an async model better than using OS threads for an I/O bound workload?"

Because evented/callback-driven code is a nightmare to reason about and breaks lots of very basic tools, like the humble stack trace.

Another big thing for me is resource management - try/finally don't work across callback boundaries, but do work within a virtual thread. I recently ported a netty-based evented system to virtual threads and a very long-standing issue - resource leakage - turned into one very nice try/finally block.
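
Not the actual ported code, just a minimal sketch of the shape, assuming JDK 21 or later. `Resource` is a hypothetical stand-in for a connection or buffer, and the counter exists only to show that nothing leaks even when half the tasks fail after blocking.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadCleanup {
    static final AtomicInteger openResources = new AtomicInteger();

    // Hypothetical resource; stands in for a connection, buffer, file handle, etc.
    static final class Resource {
        Resource() { openResources.incrementAndGet(); }
        void close() { openResources.decrementAndGet(); }
    }

    static void handle(boolean fail) throws InterruptedException {
        Resource resource = new Resource();
        try {
            Thread.sleep(5); // blocking call in the middle of the resource's lifetime
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } finally {
            resource.close(); // runs on success, failure, and interruption alike
        }
    }

    // Runs `tasks` handlers on virtual threads, half of them failing,
    // and returns the number of resources still open afterwards.
    static int runAll(int tasks) throws InterruptedException {
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            final boolean fail = i % 2 == 0;
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    handle(fail);
                } catch (Exception ignored) {
                    // swallowed for the demo; real code would log or propagate
                }
            });
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return openResources.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaked resources: " + runAll(1_000));
    }
}
```

In a callback-driven version the acquire and the release live in different callbacks, so there is no single block that can own the cleanup.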

> Can the JVM do threads better than the OS?

Yes. The JVM has far more opportunities for optimizing threads because it doesn't need to uphold 50 years of accumulated invariants and compatibility that current OSes do, and the JVM has more visibility into the application internals.

It can do a much better job because there isn't a security boundary. OS thread scheduling requires syscalls and invalidates a bunch of cache to prevent timing leaks.
Create 100k platform threads and you'll find out.
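
If you want to try that experiment yourself, something like the sketch below works, assuming JDK 21 or later. The class and method names are made up. Every thread parks on the same latch, so all of them are alive at once. What happens with `virtual = false` at 100k depends on your OS limits and memory, so I'm not claiming a specific result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyBlockedThreads {
    // Starts `count` threads that all block on one latch, then releases them.
    // Returns how many threads ran to completion.
    static int run(int count, boolean virtual) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger finished = new AtomicInteger();
        Thread.Builder builder = virtual ? Thread.ofVirtual() : Thread.ofPlatform();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(builder.start(() -> {
                started.countDown();
                try {
                    release.await(); // every thread blocks here until released
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        started.await();     // all threads have been started and are alive
        release.countDown(); // let them all finish
        for (Thread thread : threads) {
            thread.join();
        }
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("virtual threads completed: " + run(100_000, true));
    }
}
```

Flip the flag to `false` and watch memory use and startup time to compare with platform threads on your own machine.
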
Throughput. The code can be "suspended" on a blocking call (I/O, where the platform thread usually is wasted, as the CPU has nothing to do during this time). So, the platform thread can do other work in the meantime.
Yeah, and it's generally good to be RAM limited instead of CPU, no? The alternative is blowing a bunch of time on syscalls and OS scheduler overhead.

Also the virtual threads run on a "traditional" thread pool to my understanding, so you can just tweak the number of worker threads to cap the total concurrency.

The benefit is that it's more efficient overall (in the general case) and lets you write linear blocking code (as opposed to function coloring). You don't have to use it, but it's nice that it's there. Now hopefully Valhalla actually makes it in eventually.

The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead. Although virtual threads don't have the syscall overhead when they block, there's a new cost caused by allocating the internal continuation object, and copying state into it. This puts more pressure on the garbage collector. Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

I've not yet seen a study that shows that virtual threads offer a huge benefit. The Open Liberty study suggests that they're worse than the existing platform threads.

> The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead.

Ideally carrier threads would be pinned to isolated cpu cores, which removes most aspects of OS scheduler from the picture

> I've not yet seen a study that shows that virtual threads offer a huge benefit.

Not exactly Java virtual threads, but a study on how userland threads beat kernel threads.

https://cs.uwaterloo.ca/~mkarsten/papers/sigmetrics2020.html

For quick results, check figures 11 and 15 from the (preprint) paper. Userland threads ("fred") have ~50% higher throughput while having orders of magnitude better latency at high load levels, in a real-world application (memcached).

The study says there are surprising performance problems with Java's virtual thread implementation. Their throughput test was also hilarious: they pitted 2000 OS threads against 2000 virtual threads, when most of the time OS threads don't start falling apart until 100k+ threads. You can architect an application such that you can handle 200k simultaneous connections using platform-thread-per-core, but it's harder to reason about than the linear, blocking code that virtual threads and async allow for.

> Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

> there's a new cost caused by allocating the internal continuation object, and copying state into it.

This is more of a problem with the implementation (not every virtual thread language does it this way), but yeah, this is more overhead on the application. I assume there are improvements that can be made to ease GC pressure, like using object pools.

Virtual threads are usually a memory vs CPU tradeoff that you make in massively concurrent IO-bound applications. Total throughput should overtake platform threads at hundreds of thousands of connections, but below that they probably perform worse. I'm not that surprised by that.

> Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

Java virtual threads are stackful; they have to save and restore the stack every time they mount a different virtual thread to the platform thread. They do this by naive[0] copying of the stack out to a heap allocation and then back again, every time. That's clearly a context switch that you're paying for; it's just not in the kernel. I believe this is what the person you're replying to is talking about.

[0] Not totally naive. They do take some effort to copy only subsets of the stack if they can get away with it. But it's still all done by copies. I don't know enough to understand why they need to copy and can't just swap stack pointers. I think it's related to the need to dynamically grow the stack when the thread is active vs. having a fixed size heap allocation to store the stack copy.

Async does exactly the same by the way.