Hacker News
by PaulHoule 1004 days ago
(1) It's a bit of a bad smell (which he points out) that records are barely used at all in the Java stdlib. I wrote something that generated stubs for the Java 17 and 18 stdlibs, and that stood out like a sore thumb. I do like using records, though.
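For readers who haven't used them, here is a minimal sketch of what a record (Java 16+) gives you: the compiler generates the canonical constructor, accessors, and equals/hashCode/toString. `Point` and `RecordDemo` are hypothetical names for illustration, not stdlib types.

```java
class RecordDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(p);                          // Point[x=1, y=2]
        System.out.println(p.equals(new Point(1, 2)));  // true
        System.out.println(p.translate(2, 3));          // Point[x=3, y=5]
    }
}

// Hypothetical example type. The compiler generates the canonical constructor,
// the accessors x() and y(), and equals/hashCode/toString from the components.
record Point(int x, int y) {
    // Compact canonical constructor: validation runs before the fields are assigned.
    Point {
        if (x < 0 || y < 0) {
            throw new IllegalArgumentException("coordinates must be non-negative");
        }
    }

    // Records can still declare ordinary methods; this returns a new instance.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```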

(2) I've looked at other ways to extend the collections API and related things, see

https://github.com/paulhoule/pidove

and I think the sequenced collections could have been done better.
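For reference, the sequenced collections being discussed are the JEP 431 interfaces that shipped in Java 21. This minimal sketch shows the stdlib methods themselves (`addFirst`, `addLast`, `getFirst`, `getLast`, `reversed`), not the pidove alternative; `SequencedDemo` and `ends` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.SequencedCollection;

class SequencedDemo {
    // Accepts any SequencedCollection (List, Deque, LinkedHashSet, ...).
    static <E> String ends(SequencedCollection<E> c) {
        return c.getFirst() + ".." + c.getLast();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("b", "c"));
        list.addFirst("a");
        list.addLast("d");
        System.out.println(list);            // [a, b, c, d]
        System.out.println(list.reversed()); // [d, c, b, a], a reverse-ordered view

        LinkedHashSet<Integer> set = new LinkedHashSet<>(List.of(3, 1, 2));
        System.out.println(ends(set));       // 3..2 (insertion order)
    }
}
```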

(3) Virtual Threads are kinda cool but overrated. Real Threads in Java are already one of the wonders of the web and perform really well for most applications. The cases where Virtual Threads are a real win will be unusual, but probably important for somebody. It's a good thing they stick to the threads API as closely as they do: I know that in the next five years I'm going to find some case where somebody used Virtual Threads because they thought it was cool, and I'll have to switch to Real Threads, but I won't have a hard time doing so.
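A minimal sketch of why switching back is cheap, assuming Java 21's `Thread.Builder` API (JEP 444): the same `Runnable` runs on either kind of thread, and only the builder changes. `ThreadSwitchDemo` and its `start` helper are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThreadSwitchDemo {
    // Hypothetical helper: the boolean is the only thing that decides whether
    // the task runs on a virtual thread or a platform ("real") thread.
    static Thread start(boolean virtual, Runnable task) {
        Thread.Builder builder = virtual ? Thread.ofVirtual() : Thread.ofPlatform();
        return builder.name(virtual ? "vt-worker" : "pt-worker").start(task);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Runnable task = counter::incrementAndGet;

        Thread vt = start(true, task);
        Thread pt = start(false, task);
        vt.join();
        pt.join();

        System.out.println(vt.isVirtual() + " " + pt.isVirtual()); // true false
        System.out.println(counter.get());                         // 2
    }
}
```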

4 comments

I think the biggest impact of virtual threads is that the ecosystem will abandon asynchronous APIs. No more futures, callbacks, servers where you have to make sure not to block the thread, reactive frameworks, etc. Just nice, simple, imperative blocking code. Nima is the first example I've seen:

https://helidon.io/nima
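To make the contrast concrete, here is a minimal sketch assuming Java 21: the same two-step lookup written as a `CompletableFuture` chain and as plain blocking code submitted to a virtual-thread-per-task executor. `fetchUser` and `fetchOrders` are hypothetical stand-ins for remote calls; none of this is Nima's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingVsAsync {
    // Hypothetical stand-ins for remote calls; the sleep simulates network latency.
    static String fetchUser(int id) { pause(50); return "user" + id; }
    static String fetchOrders(String user) { pause(50); return user + ":orders"; }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }

    // Future/callback style: the second step is a continuation handed to the library.
    static CompletableFuture<String> ordersAsync(int id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> fetchUser(id), pool)
                .thenApplyAsync(BlockingVsAsync::fetchOrders, pool);
    }

    // Blocking style: reads top to bottom. On a virtual thread, blocking calls such
    // as sleep or socket I/O generally unmount it instead of tying up an OS thread.
    static String ordersBlocking(int id) {
        String user = fetchUser(id);
        return fetchOrders(user);
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        try (ExecutorService vts = Executors.newVirtualThreadPerTaskExecutor()) {
            System.out.println(ordersAsync(1, vts).get());   // user1:orders
            Future<String> f = vts.submit(() -> ordersBlocking(2));
            System.out.println(f.get());                     // user2:orders
        }
    }
}
```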

We've had two production bugs in the last two weeks caused by handlers blocking the server thread in apps using an async web framework, which would simply not have happened with a synchronous server.

You'll still have the structured concurrency calls but that's much better than pure callback hell.
They won't abandon async callback-based code.

VT have too much memory overhead to be equivalent.

* for high performance stuff

You can still wrap the medium-speed/slower stuff in virtual threads.

Do you have a citation for that? Genuinely curious.
The stack for a VT requires a heap allocation [0], which, OK, is not a huge deal for most scenarios, but it is something to consider. Reactive programming avoids that. For example, for a service that doesn't do much I/O (like an in-memory pub/sub service or a CDN), you would still want to use reactive programming if you care about performance, since the code will likely be simple anyway.

[0] https://openjdk.org/jeps/444

But what's more expensive: some more RAM, or the hours upon hours upon hours of dev salaries wasted trying to develop and debug reactive code?

Also, is that VT allocation more than all the extra allocations reactive frameworks make internally? Or all the heap-capturing lambdas you pass to reactive libraries? Do you have a source comparing any of this?

Bullshit - reactive frameworks allocate a shit ton of helper classes.
I'd definitely be interested to see some benchmarks of real-world code once virtual threads and their attendant web frameworks have had a year or two to mature.
> The stack for the VT requires a heap allocation

One object per stack frame of a virtual thread is cheaper than one callback object per suspension point.

I suspect if we had records from the start they'd be all over the stdlib, but because of backwards compatibility they'll likely only be considered for new APIs.
I think virtual threads are huge.

The problems with regular threads are (a) a multi-kb memory stack per thread and (b) each one consuming a file handle.

Either of those severely limits the scalability of the most "natural" parallelism constructs in Java (and perhaps in general). Whole classes of applications can now just be built "naturally" where previously there were whole libraries to support them (actors, RxJava, etc.).

It may take a while for people to change their habits, but this could be quite pervasive in how it changes programming in all JVM languages.
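A minimal sketch of that "natural" style, adapted from the example in JEP 444 and assuming Java 21: one virtual thread per task, each task simply blocking. `ManyThreads` and `run` are hypothetical names; the same loop over platform threads would need a real OS thread per task.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManyThreads {
    // Starts n virtual threads that each block for `pause`, then sums their ids.
    static long run(int n, Duration pause) {
        List<Future<Integer>> futures = new ArrayList<>(n);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                int id = i;
                futures.add(executor.submit(() -> {
                    Thread.sleep(pause); // plain blocking call; the virtual thread unmounts
                    return id;
                }));
            }
        } // close() waits until every submitted task has finished
        long sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.resultNow(); // safe: all tasks are complete at this point
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = run(10_000, Duration.ofSeconds(1));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(sum); // 49995000
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```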

You could easily have a million threads if you use multi-kb stacks. A million times multi-kb means multi-GB, which is still 3-4 orders of magnitude less than big-memory servers/VMs (and 1 order of magnitude less than a normal laptop).

What do you mean by using a file handle? Is this a Windows platform thing? On *ix, threads don't use up file descriptors (but you can still have a million fds, at least on Linux, for other stuff if you want).

> On *ix, threads don't use up file descriptors

Thanks, this prompted me to dig into the specific scenario where, in my experience, creating threads was exhausting file handles, and you are correct: consuming a file handle is not intrinsic to creating a new thread on Linux. It's insanely easy for literally anything you do with the thread to consume a file handle, but of course that applies to virtual threads as well. Thanks!

But "multi-kb" in this context probably actually means about 1MB.
What do you base this on? The stacks and kernel bookkeeping shouldn't use nearly this much, at least on Linux. Keep in mind that thread stacks are lazily allocated virtual memory, so they won't use as much physical memory as the thread stack size setting suggests.

If these threads are handling TCP connections with L7 protocol processing on top, you're going to have nontrivial kernel and userspace memory usage per connection too, which may dwarf the thread overhead.

Here's a Linux kernel dev (Ingo Molnar) benchmarking Linux in 2002 and starting just shy of 400k threads in 4 GB: https://lkml.iu.edu/hypermail/linux/kernel/0209.2/1153.html. That was on a 32-bit system, where lots of objects are about half the size they are on current 64-bit systems, but it still gives you a ballpark.

> Either of those severely limits the scalability

You can avoid both issues by using the 20-year-old ExecutorService.

If the code is simple, blocking code, then the number of threads required in the pool is the average total duration of a request times the fanout times the request rate. That number can easily reach many thousands and more.
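The estimate above is essentially Little's law (L = λW) applied to blocking calls, where each in-flight blocking call holds one thread. A minimal sketch of the arithmetic; the load figures are hypothetical, and the formula is a rough estimate that treats every fanned-out call as holding its own thread for the whole request.

```java
class PoolSizing {
    // threads ~= request rate * average request duration * fan-out
    static long requiredThreads(double requestsPerSecond, double avgRequestSeconds, int fanout) {
        return (long) Math.ceil(requestsPerSecond * avgRequestSeconds * fanout);
    }

    public static void main(String[] args) {
        // Hypothetical load: 5,000 req/s, 0.5 s average duration, fan-out of 4.
        System.out.println(requiredThreads(5_000, 0.5, 4)); // 10000
    }
}
```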
Yes, you shouldn't put blocking code into an ExecutorService.
Wtf, where on Earth do you put blocking code then? Firing off some long-running task in a background thread through executors is a bog-standard use case.
The discussion was about a specific context: avoiding the overhead of spawning millions of threads. In that case you shouldn't have any blocking code at all; every API should use epoll underneath, or something similar.
Then you either don't get the same scalability that virtual threads give you or you get it but with asynchronous code that requires not just more work but can't enjoy the same observability/debuggability on the Java platform.
Could you give an example of what exactly requires more work, and where virtual threads give more "observability"?
Virtual threads are strictly better than normal threads, no? I'm trying to think of any reason to still use traditional threads. Is there any downside?
Currently, virtual threads aren't a good match if you have a CPU-heavy workload. The scheduler isn't fair, and if your code never calls into anything blocking, it won't be unmounted from its carrier thread.
Ahh, that makes sense. But they're a much better fit for file I/O, sockets, and DB access.
Why would you use virtual threads for CPU heavy loads?
You wouldn't. The GP was illustrating a situation where virtual threads are not a good substitute for native threads.
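One common way to act on this split, sketched under the assumption of Java 21 rather than taken from the thread: keep CPU-heavy work on a fixed pool of platform threads sized to the core count, and reserve virtual threads for blocking I/O. `CpuVsIo`, `sumOfSquares`, and `parallelCpuWork` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuVsIo {
    // CPU-bound work with no blocking calls: a virtual thread running this would
    // stay mounted on its carrier thread until it finishes.
    static long sumOfSquares(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) {
            s += (long) i * i;
        }
        return s;
    }

    // Runs `tasks` copies of the CPU-bound job on a fixed pool of platform threads
    // sized to the number of available processors, then sums the results.
    static long parallelCpuWork(int tasks, int n) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<Future<Long>> results = new ArrayList<>();
        try (ExecutorService cpuPool = Executors.newFixedThreadPool(cores)) {
            for (int i = 0; i < tasks; i++) {
                results.add(cpuPool.submit(() -> sumOfSquares(n)));
            }
        } // close() waits for the submitted tasks to complete
        long total = 0;
        for (Future<Long> f : results) {
            total += f.resultNow();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parallelCpuWork(4, 1_000)); // 1335334000
    }
}
```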