Hacker News
by ye-olde-sysrq 998 days ago
Excellent! With virtual threads, all the blocking code I wrote is efficient now :)

Less jokingly: I'm so excited for this to start getting adoption. For a pre-release/preview feature, Loom already has SO much traction: most major frameworks already support running app/user code on virtual threads, and one notable project (Helidon Nima) is replacing netty with its own solution based on blocking IO and virtual threads. Now I want to see the community run with it.

I've always thought async IO was just plain gross.

Python's implementation is so yucky that after using it for one project I decided I'd rather DIY it with multiprocessing than use async again. (I don't have any more constructive feedback than that, my apologies. It was a while ago, so I don't remember the specifics, but what has lasted is the sour taste of the sum of all the problems we had with it. Perhaps notably, only about 2 people on my dev team of 5 actually understood the async paradigm.)

netty did it fine. I've built multiple projects on top of netty and it's fine. I like event-based-async more than async-await-based-async. But it's still a headache and notably I really rather missed the kinds of guarantees you can get (in blocking code) by wrapping the block in try-catch-finally (to e.g. guarantee resources get freed or that two counters, say a requests-in and a requests-out, are guaranteed to match up).
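A minimal Java sketch of the try-catch-finally guarantee described above: a hypothetical blocking handler where the `finally` block keeps a requests-in counter and a requests-out counter in step, even when the handler throws. The class and counter names are illustrative, not from any framework.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters and handler, for illustration only.
class RequestCounters {
    static final AtomicLong requestsIn = new AtomicLong();
    static final AtomicLong requestsOut = new AtomicLong();

    // Blocking-style handler: the finally block runs whether the work
    // returns normally or throws, so "in" and "out" always match up.
    static String handle(String request) {
        requestsIn.incrementAndGet();
        try {
            if (request.isEmpty()) {
                throw new IllegalArgumentException("empty request");
            }
            return "ok:" + request; // stand-in for real (blocking) work
        } finally {
            requestsOut.incrementAndGet();
        }
    }
}
```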

But dang, am I excited to not do that anymore. I have one specific project that I'm planning to port from async to blocking + virtual threads, and I expect it to greatly simplify the code. It makes a lot of requests back and forth (it has to manually resolve DNS queries, among other things), so there are good chunks of 50-200 ms where I either have to yield (which means gross async code that yields and resumes all the heck over the place) or block the thread for human-noticeable chunks of time (also very gross, of course!).
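Roughly what the blocking + virtual threads version could look like, as a sketch assuming JDK 21+. `resolve()` is a hypothetical stand-in for the manual DNS round trip; it just sleeps 50-200 ms instead of doing real network IO. Each lookup is straight-line blocking code on its own virtual thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Requires JDK 21+. resolve() is hypothetical: it only simulates latency.
class BlockingResolver {
    static String resolve(String name) throws InterruptedException {
        long delayMs = ThreadLocalRandom.current().nextLong(50, 201);
        Thread.sleep(Duration.ofMillis(delayMs)); // parks the virtual thread, freeing its carrier
        return name + " -> resolved";
    }

    // One virtual thread per lookup; no yield/resume plumbing, no callbacks.
    static List<String> resolveAll(List<String> names) {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String n : names) {
                futures.add(exec.submit(() -> resolve(n)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocking here is fine
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```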


Funny, I think async in Python is a lot of fun for side projects, but my experience is that if I hand my Python systems off to other people, they usually have trouble deploying them and invariably can’t maintain them.

Whereas my Java projects live on long after I am gone from the project.

Personally I love aiohttp web servers, particularly when using web sockets and brokering events from message queues and stuff like that. Not to mention doing cool stuff with coroutines and hacking the event queue (like, what do you do if your GUI framework also has an event queue?). If YOShInOn (my smart RSS reader + intelligent agent) were going to become open source, though, I might just need to switch to Flask, which would be less fun.

Async in Python, for a long time, has been a horrible hack relying on monkey patching the socket module. The newer asyncio stuff is quite nice by comparison, but the problem is that Python, due to its popularity, has libraries that haven't been upgraded.

Python has always had deployment issues, IMO. In Java, 99% of all library dependencies are pure JARs, and you rarely need to depend on native libraries. You can also assemble an executable fat JAR that will work everywhere, and the fact that the build tools are better (e.g., Maven, Gradle) helps.

Compare that with Python, where even accessing an RDBMS was an exercise in frustration, requiring installing the right blobs and library headers via the OS's package manager, with Postgres being particularly painful. NOTE: I haven't deployed anything serious built with Python in a while; maybe things are better now, but it couldn't get much better, IMO.

I worked at a place where we had machine learning systems with a big pile of dependencies that pip could not consistently resolve. I figured out what most of the technical problems were, but I was still struggling with wetware problems, and they eventually put me on a Scala/TypeScript project instead.

One big problem is that pip just starts downloading and installing things optimistically: it does not get a global view of the dependencies, and if it finds a conflict it can't reliably back out from where it is and find a good configuration. The answer is to do what Maven or conda does: download the dependency graph of all the matching versions and get a solve before you start downloading. Towards the end of my time on that project, I had built something that assembled a "wheelhouse" of the wheels necessary to run my system and installed them directly.
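To make the "solve before you download" idea concrete, here is a toy backtracking resolver in Java. It's a sketch, not how pip, Maven, or conda actually work: the index format (package → version → allowed versions of each dependency) is a hypothetical in-memory structure, versions are compared as plain strings, and nothing would be fetched until `solve` returns a complete assignment.

```java
import java.util.*;

// Toy resolver: find a complete, consistent set of versions from metadata
// alone, before anything is downloaded. The index format is hypothetical.
class ToyResolver {
    // package -> version -> (dependency -> allowed versions of that dependency)
    private final Map<String, Map<String, Map<String, Set<String>>>> index;

    ToyResolver(Map<String, Map<String, Map<String, Set<String>>>> index) {
        this.index = index;
    }

    // Returns package -> chosen version, or null if no consistent choice exists.
    Map<String, String> solve(Map<String, Set<String>> requirements) {
        return search(new LinkedHashMap<>(), requirements);
    }

    private Map<String, String> search(Map<String, String> chosen,
                                       Map<String, Set<String>> constraints) {
        String next = null; // first constrained package without a pinned version
        for (String pkg : constraints.keySet()) {
            if (!chosen.containsKey(pkg)) { next = pkg; break; }
        }
        if (next == null) return chosen; // every package pinned consistently

        List<String> candidates = new ArrayList<>(index.getOrDefault(next, Map.of()).keySet());
        candidates.sort(Comparator.reverseOrder()); // newest first (plain string order)
        for (String version : candidates) {
            if (!constraints.get(next).contains(version)) continue;
            Map<String, String> attempt = new LinkedHashMap<>(chosen);
            attempt.put(next, version);
            Map<String, Set<String>> merged = merge(constraints, index.get(next).get(version), attempt);
            if (merged == null) continue; // conflict: try the next candidate (backtrack)
            Map<String, String> result = search(attempt, merged);
            if (result != null) return result;
        }
        return null; // no candidate worked; the caller backtracks further
    }

    // Intersects a version's dependency ranges into the current constraints;
    // returns null if a range becomes empty or excludes an already pinned version.
    private static Map<String, Set<String>> merge(Map<String, Set<String>> constraints,
                                                  Map<String, Set<String>> deps,
                                                  Map<String, String> chosen) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        constraints.forEach((k, v) -> out.put(k, new HashSet<>(v)));
        for (Map.Entry<String, Set<String>> dep : deps.entrySet()) {
            Set<String> allowed = out.computeIfAbsent(dep.getKey(), k -> new HashSet<>(dep.getValue()));
            allowed.retainAll(dep.getValue());
            String pinned = chosen.get(dep.getKey());
            if (allowed.isEmpty() || (pinned != null && !allowed.contains(pinned))) return null;
        }
        return out;
    }
}
```

For example, if `app 1.0` accepts `lib` 1.0 or 2.0 plus `util 1.0`, and `lib 2.0` needs `util 2.0`, the search tries `lib 2.0`, hits the conflict on `util`, backs out, and settles on `lib 1.0`, all before a single wheel is fetched.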

What I figured out was that you could read just the dependency list from a wheel with 2 or 3 range requests: a wheel is just a ZIP file, so you can download the end-of-central-directory record and the central directory from the end of the file, which tells you where the metadata is, and then download just that. Recently PyPI got some sense and now lets you download just the metadata.
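A sketch of the range-request trick, assuming you have already fetched the last N bytes of the wheel (`tail`) and know the total file length. It scans for the end-of-central-directory record, walks the central directory entries, and returns the file offset of the `*.dist-info/METADATA` local header, which is where the next range request would start. ZIP64, comments that happen to contain the signature bytes, and the HTTP side are out of scope; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch: find the local-header offset of a wheel's *.dist-info/METADATA entry
// using only the tail of the ZIP file. No ZIP64 support; names are illustrative.
class WheelTail {
    static final int EOCD_SIG = 0x06054b50;           // end of central directory record
    static final int CENTRAL_HEADER_SIG = 0x02014b50; // central directory file header

    // tail: the last tail.length bytes of a file that is fileLength bytes long.
    // Returns the file offset where METADATA's local header starts, or -1.
    static long findMetadataOffset(byte[] tail, long fileLength) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        long tailStart = fileLength - tail.length; // file offset of tail[0]
        // The EOCD record is 22 bytes plus an optional comment: scan backwards for it.
        for (int eocd = tail.length - 22; eocd >= 0; eocd--) {
            if (buf.getInt(eocd) != EOCD_SIG) continue;
            int entries = Short.toUnsignedInt(buf.getShort(eocd + 10));
            long cdOffset = Integer.toUnsignedLong(buf.getInt(eocd + 16));
            int pos = (int) (cdOffset - tailStart);
            for (int i = 0; i < entries; i++) {
                // Central directory not (fully) inside the tail: a bigger range is needed.
                if (pos < 0 || pos + 46 > tail.length) return -1;
                if (buf.getInt(pos) != CENTRAL_HEADER_SIG) return -1;
                int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
                int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
                int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
                long localHeader = Integer.toUnsignedLong(buf.getInt(pos + 42));
                if (pos + 46 + nameLen > tail.length) return -1;
                String name = new String(tail, pos + 46, nameLen, StandardCharsets.UTF_8);
                if (name.endsWith(".dist-info/METADATA")) return localHeader;
                pos += 46 + nameLen + extraLen + commentLen;
            }
            return -1; // no METADATA entry
        }
        return -1; // no EOCD record in the tail
    }
}
```

From there, one more range request starting at that offset covers the local header and the (usually deflated) METADATA bytes; the compressed size is also available in the central directory entry, at offset 20.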

And that's the story of Python packaging. Things are really going in the right direction, but progress has been slow because the community has mistaken "98% correct" (i.e. wrong) for "has 98% of the features somebody might want". It might have been a lot better if somebody with some vision and no tolerance for ambiguity had gotten in charge a long time ago.

A new dependency resolver was introduced [0] with pip 20.3 (in 2020) which sounds like it's meant to address the problem you're describing. Were you using an earlier version of pip or is this still a problem?

[0] https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip...

> In Java, 99% of all library dependencies are pure JARs

Yes, this is the difference: in practice, the Python community has chosen more native dependencies, and Java has not. But Java JNI code (if you ever do have it) is just as painful.

> with Postgres being particularly painful

You want to use a pure Python package, and those have gotten much better.

asyncpg is really, really good if you want async.

Otherwise, pg8000.

It got better, with containers.
Where I worked containers just gave the data scientists superpowers at finding corrupted Python runtimes. I don't know where they got one that had a Hungarian default charset, but they did.
And it’s LTS! When is the book and the certification exam coming out, does anyone know?
Fully agree on Helidon Nima and blocking IO. I have zero hope that the Spring framework crapola won't smother this already massively simplified thing with convenient abstractions on top of it.
Debugging issues with Netty as someone not intimately familiar with its internals was an exercise in pain.
Can you explain why you're excited about virtual threads? I get that they improve throughput in extremely high-pressure apps, but the JVM's current threading isn't exactly a slouch. Java's used in HFT shops, and more generally in fintech where performance matters.
The main problem is that it's not a matter of "speed" but just of congestion.

If you write a program using blocking IO and Platform (OS) threads, you're essentially limited to a couple hundred concurrent tasks, or however many threads your particular linux kernel + hardware setup can context switch between before latency starts suffering. So it's slow not because Java is slow, but because kernel threads are heavyweight and you can't just make a trillion of them just for them to be blocking waiting on IO.

If you use async approaches, your programming model suffers, but now you're multiplexing millions of tasks over a small number of platform threads of execution still without even straining the kernel's switching. You've essentially moved from kernel scheduling to user-mode scheduling by writing async code.
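For contrast, a sketch of the async style in Java using `CompletableFuture`: 10,000 tasks each "wait" 100 ms, but the waiting is handled by `CompletableFuture.delayedExecutor` rather than by parked threads, and all the continuations run on a pool of just two platform threads. The fake IO and the numbers are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Async style: many in-flight tasks multiplexed over two platform threads.
class AsyncMultiplex {
    static long run(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // the only worker threads
        try {
            // Runs submitted work on 'pool' after 100 ms; nothing sleeps while waiting.
            Executor after100ms = CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS, pool);
            List<CompletableFuture<Integer>> all = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                all.add(CompletableFuture
                        .supplyAsync(() -> id, after100ms)  // pretend IO completes after 100 ms
                        .thenApplyAsync(x -> x * 2, pool)); // continuation, also on the pool
            }
            // Only the caller blocks, to collect the results at the end.
            return all.stream().mapToLong(f -> f.join()).sum();
        } finally {
            pool.shutdown();
        }
    }
}
```

The price is visible even in this tiny example: the "logic" is split across lambdas chained onto futures instead of reading top to bottom.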

Virtual threads are a response to this, saying "what if you can eat your cake and have it, too?" by "simply" providing a user-mode-scheduled thread implementation. They took the general strategy that async programming was employing and "hoisted" it up a couple levels of abstraction, to be "behind" the threading model. Now you have all the benefits of just blocking the thread without all the problems that come from trying to have a ton of platform threads that will choke the linux kernel out.
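The virtual-thread version of the same idea, as a sketch assuming JDK 21+: each task is plain blocking code (`Thread.sleep` standing in for IO), and the JVM parks the virtual thread while it waits so the carrier thread can run others. Doing the same with `Thread.ofPlatform()` would mean one kernel thread per task, which is the cost described above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Requires JDK 21+. Each task simply blocks; the JVM parks the virtual thread
// and reuses its carrier (platform) thread for other virtual threads.
class ManyBlockingTasks {
    static int run(int tasks) {
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(100)); // plain blocking call
                    done.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return done.get();
    }
}
```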

> couple hundred concurrent tasks, however many threads your particular linux kernel + hardware setup can context switch between before latency starts suffering.

Are there any benchmarks saying it's a couple hundred threads, and not 100k threads?

A couple hundred is about one thread per core on modern CPUs.

Also, my belief is that JVM itself adds lots of overhead.

>Also, my belief is that JVM itself adds lots of overhead.

The JVM has ~nothing to do with the scheduling of platform threads

>and there any benchmarks saying it is couple hundred threads, and not 100k threads?..

It depends greatly on your hardware

>couple hundred is about thread per core on modern CPUs..

It depends on the hardware, of course

Overall, yes, you could probably run a lot of concurrent processes on a c7i.48xlarge in 2023. But why would you want to do that when, if you used virtual threads (or async), you could do the same work on a c7i.large? That's the whole point of this. There's no reason to waste CPU and memory on kernel context-switching overhead.

> The JVM has ~nothing to do with the scheduling of platform threads

I didn't say it schedules platform threads, but the JVM has many other performance issues with its GC/object identity/lock monitoring model, which make it a harder choice for ultra-high-performance apps. So virtual thread scheduling may improve your app's perf by 5%, while the other 95% is stuck in other bottlenecks.

> But why would you want to do that when, if you used virtual threads

Because virtual threads are not well supported in the ecosystem yet, and it will take another 10 years for it to catch up. Until then, most APIs/libs will use old concepts, and you take large risks of fragmentation by introducing virtual threads to your app.

Also, you can avoid all switching with the 20-year-old ExecutorService, where you have exactly the same m:n mapping between tasks and machine threads.

>I didn't say it schedules platform threads, but the JVM has many other performance issues with its GC/object identity/lock monitoring model, which make it a harder choice for ultra-high-performance apps. So virtual thread scheduling may improve your app's perf by 5%, while the other 95% is stuck in other bottlenecks.

Switching from async to virtual threads typically doesn't improve performance in well-factored async code. The primary benefit is a clearer programming model that's significantly easier to get right and needs less code to implement, while keeping the same performance.

>Also, you can avoid all switching with the 20-year-old ExecutorService, where you have exactly the same m:n mapping between tasks and machine threads.

You misunderstand the point of virtual threads. There is no need to pool them, as (most) executor service implementations do. And the whole point is that you no longer need to avoid blocking them, which pretty much every currently-existing solution will be doing. There's 0 benefit to switching without also updating your code to take advantage of the new paradigm.

>Because virtual threads are not well supported in the ecosystem yet, and it will take another 10 years for it to catch up. Until then, most APIs/libs will use old concepts, and you take large risks of fragmentation by introducing virtual threads to your app.

What risks? The whole point of virtual threads vs async/await is that they don't color functions in the same way that async does.

Also, pretty much every notable framework already has support for virtual threads (Spring, jetty, helidon, and many others). The API has been the same for a couple years at this point.

Then you can just swap a single line in your ExecutorService setup to get all the benefits.
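A sketch of that one-line swap, assuming JDK 21+ (where `ExecutorService` is also `AutoCloseable`). `blockingCall` is a hypothetical stand-in for a JDBC query or HTTP request. The virtual-thread executor isn't wrapped in a bounded pool, since virtual threads don't need pooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSwap {
    // Hypothetical blocking task: stands in for a JDBC call, HTTP request, etc.
    static String blockingCall(int id) throws InterruptedException {
        Thread.sleep(50);
        return "result-" + id;
    }

    static List<String> runAll(int n, boolean useVirtual) {
        // Before: a bounded pool of platform threads.
        // After: the one-line swap to a virtual-thread-per-task executor.
        try (ExecutorService exec = useVirtual
                ? Executors.newVirtualThreadPerTaskExecutor()
                : Executors.newFixedThreadPool(200)) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                futures.add(exec.submit(() -> blockingCall(id)));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) out.add(f.get());
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The task code doesn't change at all between the two; only the executor factory does.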
I can’t speak for the OP, but this makes it much easier to write code that uses threads that wait on IO, and just let the underlying system (VM + JDBC connectors, for example) handle the dirty work.

A few years ago, I wrote a load generation application using Kotlin’s coroutines - in this case, each coroutine would be a “device”. And I could add interesting modeling on each device; I easily ran 250k simulated devices within a single process, and it took me a couple of days. But coroutines are not totally simple; any method that might call IO needs to be made “coroutine aware”. So the abstraction kinda leaks all over the place.

Now, you can do the same thing in Java. Just simply model each device as its own Runnable and poof, you can spin up a million of them. And there isn’t much existing code that has to be rewritten. Pretty slick.

So this isn’t really a “high performance computing” feature, but a “blue collar coder” thing.
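A sketch of the "each device is its own Runnable" approach, assuming JDK 21+. The `Device` record, the 10 ms sleep standing in for IO against the system under test, and the `LongAdder` tally are all made up for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical load generator: each simulated device is an ordinary Runnable
// with its own loop and state, running on its own virtual thread (JDK 21+).
class DeviceSimulator {
    record Device(int id, int readings, LongAdder sink) implements Runnable {
        @Override
        public void run() {
            for (int i = 0; i < readings; i++) {
                try {
                    Thread.sleep(Duration.ofMillis(10)); // stand-in for blocking IO
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                sink.increment(); // record one "reading" sent
            }
        }
    }

    static long simulate(int devices, int readingsPerDevice) {
        LongAdder sent = new LongAdder();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int id = 0; id < devices; id++) {
                exec.submit(new Device(id, readingsPerDevice, sent));
            }
        } // close() waits for every device to finish
        return sent.sum();
    }
}
```

Unlike the coroutine version, nothing in `Device.run()` has to be marked or rewritten to be "coroutine aware"; it's just a blocking loop.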

It's worth mentioning that there are some aspects of the virtual thread implementation that are important to take into consideration before switching out the OS-thread-per-task executor for the virtual-thread-per-task executor:

1. Use of synchronized keyword can pin the virtual thread to the carrier thread, resulting in the carrier thread being blocked and unable to drive any other virtual threads. Recommendation is to refactor to use ReentrantLock. This may be solved in the future though.

2. Use of ThreadLocals should be reduced or avoided, since they're local to the virtual thread rather than the carrier thread, which could result in ballooning memory usage when many virtual threads use them extensively.
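For point 1, a minimal sketch of the suggested refactor: the critical section that used to be `synchronized` now uses a `ReentrantLock`, so on JDK 21 a virtual thread that blocks while holding the lock (here in a hypothetical `slowIo()`) can unmount instead of pinning its carrier thread.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    // Before: synchronized (this) { value++; slowIo(); }
    void incrementWithIo() {
        lock.lock();
        try {
            value++;
            slowIo();
        } finally {
            lock.unlock(); // always released, even if slowIo throws
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Hypothetical stand-in for blocking IO done while holding the lock.
    private static void slowIo() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```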

Regarding the second point: there are now alternatives to ThreadLocals available that are intended to be used by virtual threads: Scoped Values. Unfortunately, they are preview-only in JDK 21.
Before v21, Java's threads mapped 1:1 onto OS threads. HFT apps were normally single-threaded, with exotic work-dispatching frameworks, native-memory buffers, and/or built-in network stacks, so while undoubtedly fast, they were not particularly representative.

V21 virtual threads are more like Go's goroutines. They map m:n onto OS threads, and the JVM is responsible for scheduling them, making them much less of a burden on the underlying OS, with fewer context switches, etc. And the best thing is, there has been minimal change in the Java standard library API, making them very accessible to existing devs and their codebases.

Once upon a time, Java had "green threads". They were kind of like virtual threads, but the way I understood it, they all mapped to one OS thread, whereas these new virtual threads map m:n to OS threads.
An interesting part of HFT is you normally do everything on a single thread. Your unit of parallelism would be a different JVM and you want to avoid context switching at all costs, going as far as to pin specific OS threads, making sure your thread for execution is never used by GC.
To understand the benefit of virtual threads, I think it's helpful to think of it as a "best of both worlds" situation between blocking IO and async IO. In summary, virtual threads give you the scalability (not simply "performance") benefits of async IO code while keeping the simplified developer experience of normal threads and blocking IO.

First, it's best to understand the benefit of virtual threads from a webserver's perspective. Usually, a webserver maps 1 request to 1 thread. However, most of the time the webserver actually doesn't run much code itself: it calls out to make DB requests, pulls files from disk, makes remote API requests, etc. With blocking IO, when a thread makes one of these remote calls, it just sits there and waits for the remote call to return. In the meantime, it holds on to a bunch of resources (e.g. memory) while it's sitting doing nothing. For something like HFT, that's normally not much of a problem because the goal isn't to serve tons of independent incoming requests (obviously, the usage pattern can sometimes differ), but for a webserver it can have a huge limiting effect on the number of concurrent requests that can be processed, hurting scalability.

Compare that to how NodeJS processes incoming web requests. With Node (and JS in general), there is just a single thread that processes incoming requests. However, with async IO in Node (which is really just syntactic sugar around promises and generators), when a request calls out to something like a DB, it doesn't block. Instead, the thread is then free to handle another incoming web request. When the original DB request returns, the underlying engine in Node essentially resumes that request from where it left off (if you want more info, just search for "Node event loop"). Folks found that in real-world scenarios Node can actually scale extremely well with the number of incoming requests, because lots of webserver code is essentially waiting around for remote IO requests to complete.

However, there are a couple of downsides to the async IO approach:

1. In Node, the main event loop is single threaded. So if you want to do some work that is heavily CPU intensive, until you make an IO call, the Node server isn't free to handle another incoming request. You can test this out with a busy wait loop in a Node request handler. If you have that loop run for, say, 10 seconds, then no other incoming requests can be dispatched for 10 seconds. In other words, Node doesn't allow for preemptive interruption.

2. While I generally like the async IO style of programming and I find it easy to reason about, some folks don't like it. In particular, it creates a "function coloring" problem: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-... . Async functions can basically only be called from other async functions if you want to do something with the return value.
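To illustrate the coloring problem in Java terms, here's a sketch where `CompletableFuture` plays the role of the async color. The `fetchPrice*` functions are hypothetical stand-ins for a remote call; the point is that `totalAsync` has to return a future too, while the blocking `total` keeps a plain `int` signature.

```java
import java.util.concurrent.CompletableFuture;

class Coloring {
    // Async ("colored") function: its result only exists inside a future.
    static CompletableFuture<Integer> fetchPriceAsync(String sku) {
        return CompletableFuture.supplyAsync(() -> sku.length() * 100); // fake remote call
    }

    // A caller that wants the value without blocking must also return a
    // future, so the color spreads up the call chain.
    static CompletableFuture<Integer> totalAsync(String a, String b) {
        return fetchPriceAsync(a).thenCombine(fetchPriceAsync(b), Integer::sum);
    }

    // Blocking ("uncolored") version: same logic, plain return types.
    static int fetchPrice(String sku) {
        return sku.length() * 100; // imagine a blocking remote call here
    }

    static int total(String a, String b) {
        return fetchPrice(a) + fetchPrice(b);
    }
}
```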

Virtual threads then basically can provide the best features from both of these approaches:

1. From a programming perspective, it "feels" pretty much like you're just writing normal, blocking IO code. However, under the covers, when you make a remote call, the Java scheduler will reuse the underlying OS thread to do other useful work while the remote call is executing. Thus, you get greatly increased scalability for this type of code.

2. You don't have to worry about the function coloring problem. A "synchronous" function can call out to a remote function, and it doesn't need to change anything about its own function signature.

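3. A misbehaving piece of CPU-heavy code is less able to starve everything else, though I'm actually less sure of the details on this piece for Java. (As far as I can tell, the JDK 21 scheduler does not time-slice virtual threads: a virtual thread keeps its carrier thread until it blocks or finishes. But the scheduler runs on several carrier threads, by default about one per core, so one busy task doesn't freeze the whole server the way a busy loop freezes Node's single event loop.)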

Hope that helps!

> 1. In Node, the main event loop is single threaded. So if you want to do some work that is heavily CPU intensive, until you make an IO call, the Node server isn't free to handle another incoming request. You can test this out with a busy wait loop in a Node request handler. If you have that loop run for, say, 10 seconds, then no other incoming requests can be dispatched for 10 seconds. In other words, Node doesn't allow for preemptive interruption.

Nice. I will add that JS runtimes now have worker threads. While they are still terribly inefficient, even compared to OS threads, they can alleviate this problem if you don't need to spawn more worker threads than you have cores. If you are using Node.js and that is not enough, welcome to the microservices world.

I remember they were talking about Green Threads 20 years ago. Is it something new?
Yes, while green and red threads were an implementation detail not exposed to Java programmers, now both thread models are exposed (Threads and Virtual Threads).

Additionally, since virtual threads are exposed across the whole runtime and standard library, not only are they built on top of native threads (red), but developers also have control over their scheduling.

Yes - green threads at the time were basically a solution to a hardware limitation.

Virtual threads make blocking IO calls automagically non-blocking, allowing for better utilization of the CPU.

Dotnet has had Tasks for years, seems like the same thing.
It's not the same thing.

The TL;DR is that it needs “function coloring”, which isn't necessarily bad (types themselves are “colors”); the problem is what you're trying to accomplish. In an FP language, it's good to have functions that are marked with an IO context, because there the issue is the management of side effects. OTOH, the difference between blocking and non-blocking functions is: (1) irrelevant if you're going to `await` on those non-blocking functions, or (2) error-prone if you use those non-blocking functions without `await`. Kotlin's syntax for coroutines, for example, doesn't require `await`, as all calls are (semantically) blocking by default. It should take extra effort to execute things asynchronously.

One issue with “function coloring” is that when a function changes its color, all downstream consumers have to change color too. This is actually useful when you're tracking side effects, but rather a burden when you're just tracking non-blocking code. To make matters worse, for side-effectful (void) functions, the compiler won't even warn you that the calls are now “fire and forget” instead of blocking, so refactorings are error-prone.

In other words, .NET does function coloring for the wrong reasons and the `await` syntax is bad.

Furthermore, .NET doesn't have a usable interruption model. Java's interruption model is error-prone, but it's more usable than that. This means that the “structured concurrency” paradigm can be implemented in Java (currently in preview), much like how it was implemented in Kotlin.
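A small sketch of Java's interruption model applied to a virtual thread, assuming JDK 21+: `interrupt()` makes a blocking call like `Thread.sleep` throw `InterruptedException`, which is the hook that cancellation builds on. It's cooperative: code that never blocks or checks the interrupt flag won't notice. (The preview structured-concurrency API isn't used here, since it needs `--enable-preview` on JDK 21.)

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    static boolean interruptSleepingVirtualThread() {
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread t = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(Duration.ofMinutes(1)); // would block for a long time
            } catch (InterruptedException e) {
                sawInterrupt.set(true); // cancellation arrives as an exception
            }
        });
        // If this lands before sleep() starts, the pending interrupt status
        // makes sleep() throw immediately, so the outcome is the same.
        t.interrupt();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sawInterrupt.get();
    }
}
```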

PS: the .NET devs actually did an experiment with virtual threads. Here are their conclusions (TLDR virtual threads are nice, but they won't add virtual threads due to async/await being too established):

https://github.com/dotnet/runtimelab/issues/2398

You're supposed to either await a Task, or to block on it (thus blocking the underlying OS thread which probably eats a couple of megabytes of RAM). It's a completely different system more akin to what Go has been using.
This is not necessarily correct. Tasks can be run in a "fire-and-forget" way. Also, only the synchronous prelude of the task is executed inline in .NET.

The continuation will then be run on a threadpool worker thread (unless you override the task scheduler and continuation context).

Also, you can create multiple tasks in a method and then await their results later down the execution path when you actually need it, easily achieving concurrency.
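For comparison, a rough Java analog of that pattern (start several tasks, await their results later), as a sketch assuming JDK 21+; `loadUser` and `loadOrders` are hypothetical blocking calls. With virtual threads, the tasks start when they're submitted and are awaited with a plain blocking `Future.get()`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StartThenAwait {
    // Hypothetical blocking calls.
    static String loadUser() throws InterruptedException { Thread.sleep(100); return "alice"; }
    static int loadOrders() throws InterruptedException { Thread.sleep(100); return 3; }

    static String summary() {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> user = exec.submit(StartThenAwait::loadUser);     // both calls start here...
            Future<Integer> orders = exec.submit(StartThenAwait::loadOrders);
            // ...other work could happen here while both run concurrently...
            return user.get() + " has " + orders.get() + " orders";          // ...and are awaited here
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```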

Green threads is a more limited solution focused first and foremost on solving blocking.

But there's no point to it in this comparison if it can block a whole thread to itself; that has been possible in Java since forever.
Blocking the thread with an unawaited task.Result is an explicit choice, and the IDE and the build will flag it with a warning that this may not be what you intended.
Yes, but this is supposedly transparent. At least until you interface directly with native libraries, or with leaky abstractions that don’t account for that.
Nah, cooperative vs pre-emptive models.