Submitted title was "Java virtual threads caused a deadlock in TPC-C for PostgreSQL". We've reverted it now to the article's own title (truncated to fit HN's 80 char limit).
I would like to reply with a quote of sir Tony Hoare's 1980 ACM Turing Award Lecture: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies".
This one was a very large caveat pointed out loudly every time the feature was mentioned to the community. So, yes the limitation/flaw was very well known and is ideally going to be addressed in a future JDK release.
> This one was a very large caveat pointed out loudly every time the feature was mentioned to the community.
It really wasn't. There were people on here, including Oracle employees, claiming that the virtual thread implementation was a drop-in replacement that would work (not necessarily perform better, but work) in all cases.
And it indeed does in common usage scenarios. And also in this case once the issue with `synchronized` is resolved. After all, this is a benchmark and it's not surprising that one of the limitations of the design was hit.
I don't know if those Oracle employees actually did outright say -- or even imply -- "in all cases" as the GP asserted, but if they did, then "only" working in "common usage scenarios" would definitely be overselling the feature.
I find it unlikely as well that they said it would just work in all cases. But since it's going to work out eventually and a workaround exists, they would actually not be that wrong with that statement.
If your program requires 5 OS threads to be able to make progress (say, there are operations for which 4 threads are blocked while the 5th does work), and the runtime only spawns 2 OS threads and tries to schedule your 5 now virtual threads on those 2 OS threads, then your old program was perfectly correct and the virtual threads runtime has broken it.
This was a known and advertised failure case with the new threads runtime, exacerbated by limitations in the implementation that cause certain blocking operations to block the current OS thread instead of blocking the virtual thread and allowing another virtual thread to be re-use the underlying OS thread.
Having X resources and Y dedicated threads that operate on them, where Y > X, and allowing a thread to block in a way that requires the assistance of another thread to make progress but only when it's holding a resource, is a perfectly reasonable, standard, and safe design. When a change to the runtime silently reduces the number of synchronized sections a program can enter concurrently, it's not at all surprising that this breaks working code.
Article won’t load for me. My understanding was that the switch to virtual was supposed to be relatively simple, allowing the programmer to be naive. But a user deadlock is a user deadlock, no matter the threading impl.
The issue here is currently virtual threads don't work well with the 'sychronized' keyword. Right now synchronized will pin the carrier thread. The fix was to switch to a higher-level abstraction that works with virtual threads.
My understanding is there is work to make synchronized not pin the carrier thread, but that's some pretty complex and important code to change.
From a relatively brief skim and past Go and Java experience: synchronized blocks the current normal thread, so that doesn't really seem any different to me. If you starve your threads, you starve your threads.
It definitely leaves room to optimize by not pinning that thread, which would be great, but that shouldn't change semantics at all. Or is there something actually screwed up in the implementation of virtual threads that makes this a much bigger issue?
It’s a thread that supports n virtual threads. You want synchronized in virtual thread a and not the carrier thread which will block all the virtual threads.
Been away from Java land for a while. How did something like that even get into release? That’s like a pretty big loaded shotgun to leave lying around with lots of kids playing, no?
It's an explicitly documented shortcoming of the existing implementation that will be fixed soon. I knew immediately from the title of TA what probably happened. The other similar limitations (CPU-bound tasks, native calls) seem much more severe, but are ultimately unsolvable. Meanwhile, the issue with synchronized is regarded as a scalability bottleneck since the JDK is supposed to temporarily spawn additional platform threads. This behavior can be controlled via the system property `jdk.virtualThreadScheduler.maxPoolSize`.
Also, this is a benchmark. It's not surprising that they managed to produce a situation where more than n_cores virtual threads would actually start waiting.
Appreciate the reply. Hope you get it out soon since many /g do not read documentation and synchronized semantics changing is a ‘surprise’, specially since this one of those bugs that is a nursery for heisenbugs.
It spawns a new carrier thread in place, up to a certain, configurable limit. But starving carrier threads will also result in effectively live locks, so that’s not a solution.
So I don’t see the big fuss about it - don’t spawn a million virtual thread that all just spams synchronized?
Java's virtual threads are supposed to be a drop-in replacement for real threads. But using virtual threads means you get a far smaller number of real threads, and things that were safe back when you had an unlimited number of real threads available (or at least, a larger number than your database connection pool) are no longer safe.
Shouldn't you be able to use the same number of real threads though, plus some additional effectively-threads for the virtual threads that are not pinned? Doesn't seem like this should change semantics there, so the risk would be code that changes because of perceived advantages which are not true in edge cases - that's new behavior that wasn't possible before, there aren't really any existing semantics to break.
If they're, like, limiting to CPU cores * 2 threads: yeah that would be Bad™. Unambiguously. I haven't been able to find anything conclusive about this though.
In this case the synchronized blocks are released by calls to Object.wait, so that code would not deadlock with normal threads.
The issue is that Object.wait doesn't suspend virtual threads, so you get deadlocks. The answer is to reimplement usages of the wait/notify pattern to use locks or concurrent collections (for example, using a concurrent message queue for the producer/consumer pattern, which is a common use case for synchronized/wait/notify).
This has nothing to do with postgres though. It’s part of a generic JDBC connection pool:
> The problem is that this synchronized code might be deeply embedded within the libraries you use. In our case, it was within the c3p0 library. So, the fix is straightforward: we simply wrapped the connection with a java.util.concurrent.Semaphore.
I bet if you just checked out connections and slept a random amount of time you’d have the same problem.
c3p0 appears to have been working under the assumption that their threads can always run when whatever they’re waiting on is done. Their only dependency is the OS giving them cycles.
Virtual threads changed the contract a little bit. Now one virtual thread running certain code can prevent a different virtual thread from ever getting any cycles even though they are not dependent on each other in the Java code. It’s a side effect of the current Java implementation.
The rules changed, and it tripped up c3p0. Unless they explicitly said somewhere that they were completely ready for virtual threads I’m not sure anyone is at fault here.
"Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html