

by packetlost 742 days ago

The study says there are surprising performance problems with Java's virtual thread implementation. Their throughput test was also hilarious: they pitted 2000 OS threads against 2000 virtual threads, when most of the time OS threads don't start falling apart until 100k+ threads. You can architect an application to handle 200k simultaneous connections using a platform-thread-per-core design, but it's harder to reason about than the linear, blocking code that virtual threads and async allow for.
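For anyone who wants to try this at home, here is a minimal sketch of that kind of comparison. It assumes JDK 21+ and uses `Thread.sleep` as a stand-in for blocking I/O. The `ThreadComparison` class, the 50 ms delay, and the default count of 2000 tasks are illustrative assumptions, not the study's actual setup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadComparison {
    // Simulated blocking I/O (assumption): each task just sleeps, as if waiting on a socket.
    static int blockingTask(int id) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50));
        return id;
    }

    // Submits `tasks` blocking tasks to the executor and returns the wall-clock time until all finish.
    static Duration run(ExecutorService executor, int tasks) throws Exception {
        Instant start = Instant.now();
        try (executor) { // ExecutorService is AutoCloseable since JDK 19; close() waits for submitted tasks
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                Callable<Integer> task = () -> blockingTask(id);
                futures.add(executor.submit(task));
            }
            for (Future<Integer> f : futures) {
                f.get(); // surface any task failure
            }
        }
        return Duration.between(start, Instant.now());
    }

    public static void main(String[] args) throws Exception {
        int tasks = args.length > 0 ? Integer.parseInt(args[0]) : 2_000;
        // One platform (OS) thread per task, mirroring a thread-per-connection design.
        System.out.println("platform: " + run(Executors.newFixedThreadPool(tasks), tasks));
        // One virtual thread per task, multiplexed over a small pool of carrier threads.
        System.out.println("virtual:  " + run(Executors.newVirtualThreadPerTaskExecutor(), tasks));
    }
}
```

Passing a much larger count (e.g. `java ThreadComparison 100000`) is where the two approaches would be expected to diverge, although a fixed pool of that many platform threads may run into OS thread or stack-size limits first, depending on the machine.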

> Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

> there's a new cost caused by allocating the internal continuation object, and copying state into it.

This is more of a problem with the implementation (not every language with virtual threads does it this way), but yeah, it is more overhead for the application. I assume there are improvements that could be made to ease GC pressure, like using object pools.
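As a rough illustration of the object-pool idea: a pool hands back recycled instances instead of allocating new ones. This is a generic sketch with a hypothetical `SimplePool` class; user code can't pool the JDK's internal continuation objects this way, so it only shows the technique in general. It is single-threaded and not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical minimal object pool (single-threaded, not thread-safe). */
public class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>(); // idle instances ready for reuse
    private final Supplier<T> factory;                     // creates a new instance when the pool is empty
    private final Consumer<T> reset;                       // clears state before an instance is reused
    private final int maxIdle;                             // cap on retained idle instances

    public SimplePool(Supplier<T> factory, Consumer<T> reset, int maxIdle) {
        this.factory = factory;
        this.reset = reset;
        this.maxIdle = maxIdle;
    }

    public T acquire() {
        T obj = free.pollFirst();
        return obj != null ? obj : factory.get(); // reuse if possible, otherwise allocate
    }

    public void release(T obj) {
        reset.accept(obj);
        if (free.size() < maxIdle) {
            free.addFirst(obj); // keep it for the next acquire(); extras are left to the GC
        }
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, sb -> sb.setLength(0), 16);
        StringBuilder a = pool.acquire();
        a.append("hello");
        pool.release(a);
        StringBuilder b = pool.acquire();
        System.out.println(a == b);     // true: the same instance was reused
        System.out.println(b.length()); // 0: it was reset on release
    }
}
```

Whether pooling actually helps is workload-dependent: with modern generational collectors, short-lived allocations are often cheap enough that a pool buys little.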

Virtual threads are a memory-vs-CPU tradeoff that you typically reach for in massively concurrent, IO-bound applications. Total throughput should overtake platform threads at hundreds of thousands of connections but will probably be worse below that, so I'm not that surprised by the result.


electroly 742 days ago

> Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

Java virtual threads are stackful; they have to save and restore the stack every time a different virtual thread is mounted onto the platform thread. They do this by naively[0] copying the stack out to a heap allocation and then back in again, every time. That's clearly a context switch you're paying for; it just doesn't happen in the kernel. I believe this is what the person you're replying to is talking about.
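One way to watch mounting happen, assuming JDK 21+: a mounted virtual thread's `toString()` typically includes the carrier thread it is running on (after the `@`), so recording it before and after a blocking call may show the thread resuming on a different carrier. The `CarrierDemo` class and `observe` helper are hypothetical, and the exact string format is an implementation detail that may change.

```java
import java.util.ArrayList;
import java.util.List;

public class CarrierDemo {
    // Starts one virtual thread that blocks once and records how it describes itself before and after.
    static List<String> observe() throws InterruptedException {
        List<String> seen = new ArrayList<>();
        Thread vt = Thread.ofVirtual().name("demo").start(() -> {
            seen.add(Thread.currentThread().toString()); // mounted: carrier name usually follows '@'
            try {
                Thread.sleep(10); // blocking call: the virtual thread unmounts and its frames are saved
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seen.add(Thread.currentThread().toString()); // remounted, possibly on a different carrier
        });
        vt.join(); // join() makes the list writes visible to this thread
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        observe().forEach(System.out::println);
    }
}
```

Whether the carrier actually changes depends on scheduling; with only one virtual thread it may well resume on the same one.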

[0] Not totally naive: they do make some effort to copy only a subset of the stack when they can get away with it, but it's still all done by copying. I don't know enough to understand why they need to copy and can't just swap stack pointers. I think it's related to the need to grow the stack dynamically while the thread is active, versus having a fixed-size heap allocation to store the stack copy.
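The copy-out/copy-in cost described above can be pictured with a toy model. This is hypothetical and is not the JDK's actual implementation; it also ignores the partial-copy optimization mentioned in the footnote. A mounted thread runs on a growable "carrier stack", unmounting copies the live frames into an exactly sized heap array, and mounting copies them back, so every switch pays for the full stack depth.

```java
import java.util.Arrays;

/** Toy model (hypothetical) of a stackful continuation that saves its stack by copying. */
public class StackCopyModel {
    // Stand-in for the carrier's stack: grows on demand while a thread is mounted.
    static final class CarrierStack {
        long[] frames = new long[4];
        int depth = 0;

        void push(long frame) {
            if (depth == frames.length) {
                frames = Arrays.copyOf(frames, frames.length * 2); // grow while active
            }
            frames[depth++] = frame;
        }
    }

    // Stand-in for a continuation's heap-allocated copy of its stack.
    static final class ToyContinuation {
        long[] chunk = new long[0];
        long wordsCopied = 0; // total copying work done across switches

        void unmount(CarrierStack stack) {
            chunk = Arrays.copyOf(stack.frames, stack.depth); // fresh allocation + copy out
            wordsCopied += stack.depth;
            stack.depth = 0; // carrier is now free for another virtual thread
        }

        void mount(CarrierStack stack) {
            for (long frame : chunk) {
                stack.push(frame); // copy back in
            }
            wordsCopied += chunk.length;
        }
    }

    public static void main(String[] args) {
        CarrierStack carrier = new CarrierStack();
        ToyContinuation cont = new ToyContinuation();
        for (long f = 1; f <= 10; f++) {
            carrier.push(f); // build a 10-frame stack
        }
        cont.unmount(carrier);
        cont.mount(carrier);
        System.out.println(carrier.depth);    // 10
        System.out.println(cont.wordsCopied); // 20: one unmount plus one mount, each copying 10 frames
    }
}
```

In this model the saved `chunk` is fixed-size once allocated, while the carrier stack can keep growing after the thread resumes, which is roughly the distinction the footnote speculates about.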