by Spivak 1800 days ago
You are ignoring the downside of green threads, which is that they’re cooperative. If a thread doesn’t yield control back to the event loop, then the real OS thread backing the loop is stuck.

Which leads to dirty things like inserting sleep 0 at the top of loops and dealing with really unbalanced scheduling when threads don’t hit yields often enough. Plus, with Loom it might not be obvious that some function is a yield, since it’s meant to be transparent, so if you grab a lock and yield, you make everyone wait until you’re scheduled again.

Green threads are great! I love them and they’re the only real solution to highly concurrent, IO-heavy workloads, but they’re not a panacea and they trade one kind of discipline for another.

7 comments

Which is why the advice would be "Don't use virtual threads for CPU work".

It just so happens that a large number of JVM users are working with IO bound problems. Once you start talking about CPU bound problems the JVM tends not to be the thing most people reach for.

Loom doesn't remove the CPU bound solution by adding the IO solution. Instead, it adds a good IO solution and keeps the old CPU solution when needed.

In fact, there's already a really good pool in the JVM for common CPU bound tasks: `ForkJoinPool.commonPool()`.
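To make the common-pool suggestion concrete, here is a minimal sketch of pushing CPU-bound work onto `ForkJoinPool.commonPool()` rather than onto threads meant for IO. The `SumOfSquares` task, its `THRESHOLD` cutoff, and the `run` helper are made up for illustration; only `ForkJoinPool` and `RecursiveTask` are real JDK classes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical CPU-bound task: sums i*i for lo <= i < hi by splitting the range.
final class SumOfSquares extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // illustrative cutoff, not tuned

    private final int lo;
    private final int hi;

    SumOfSquares(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: do the work directly on this pool thread.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += (long) i * i;
            }
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumOfSquares left = new SumOfSquares(lo, mid);
        left.fork();                                   // schedule the left half in the pool
        long right = new SumOfSquares(mid, hi).compute(); // compute the right half here
        return left.join() + right;
    }

    // Runs the whole computation on the JVM-wide common pool.
    static long run(int n) {
        return ForkJoinPool.commonPool().invoke(new SumOfSquares(0, n));
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

The common pool is sized from the number of available processors by default, so the CPU work stays bounded by the cores you actually have.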

Sleep 0 sounds like quite a hack; Go has the neater https://pkg.go.dev/runtime#Gosched instead, and I assume there will be a Java equivalent as well. And if most stdlib methods and all blocking methods call it, it's going to be pretty difficult to hang a green thread.
FWIW, Java has had `Thread#yield()`[0] since inception.

[0]: https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.h...()

Since there is a runtime that knows everything about the state of the thread, my understanding is that there is no need for explicit yields. Everything will turn automagically into non-blocking (except for FFI)
FWIW, while you are probably correct in the context of Loom (a specific implementation that I honestly haven't looked at much), you shouldn't generalize to "green threads" of all forms: not only can preemption be implemented well, Erlang actually does it. Since you are working with bytecode and a JIT anyway, you instrument the code to check occasionally whether it has been preempted (I believe Erlang does this for every potentially-backward jump, which is sufficient to guarantee even a broken loop can be preempted).
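As a rough picture of the "check on every backward jump" idea, here is a hand-written Java sketch. It is not how Erlang or the JVM actually does it: a real runtime would have the compiler or JIT insert the check, and the `preemptRequested` flag and `spin` method are hypothetical names for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class BackEdgeCheck {
    // Set by a (hypothetical) scheduler when it wants the running task to stop.
    static final AtomicBoolean preemptRequested = new AtomicBoolean(false);

    // Spins up to `limit` iterations, checking the flag once per trip around the loop.
    // Returns the number of iterations completed before stopping.
    static long spin(long limit) {
        long i = 0;
        while (i < limit) {
            i++;
            // The check a runtime would inject at the backward jump of the loop.
            if (preemptRequested.get()) {
                break;
            }
        }
        return i;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread scheduler = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            preemptRequested.set(true); // ask the spinning loop to give up the CPU
        });
        scheduler.start();
        long iterations = spin(Long.MAX_VALUE); // a "broken" loop that still stops
        scheduler.join();
        System.out.println("stopped after " + iterations + " iterations");
    }
}
```

Because the check sits on the loop's back edge, even a loop with no IO and no explicit yield still notices the request within one iteration.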
Agreed, but you have other single-threaded server languages like NodeJS which have the same problem (a new request can only be handled if the current request gives up control, usually waiting for IO) and people have figured out how to handle it.

I see Project Loom as really providing all the benefits of single threaded languages like Node (i.e. tons of scalability), but with an easier programming model that threads provide as opposed to using async/await.

I was under the impression that Loom was implementing preemptable lightweight threads. Is that not the case?
So Loom uses interesting terminology when talking about this. They say that the threads are preemptive and not cooperative because there’s no explicit await/yield keyword that you call from your code, but that isn’t the whole story, because threads are only preempted when they perform IO or are synchronized. So you as an author can’t know for sure where the yield points are, and you aren’t supposed to rely on them, but they’re still there. You’re not going to be forcefully preempted in the middle of number crunching.

I think most people would consider this a surprising notion of preemption: it’s out of your control-ish, but also not arbitrary like it is for OS threads, which still leads to basically the same problems and constraints as cooperative threads.

Yeah... this is a place where I disagree with how the Loom devs define "preemptive". They are basically defining it as "most tasks will give up control when they hit a blocking operation". Yet, it's been my understanding that preemption means the scheduler can stop a currently operating task from running and switch to something else. That's not what happens with loom.
> So loom uses interesting terminology when talking about this.

That is common terminology. Wikipedia says: [1]

The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources. ... The term "preemptive multitasking" is sometimes mistakenly used when the intended meaning is more specific, referring instead to the class of scheduling policies known as time-shared scheduling, or time-sharing.

> threads are only preempted when they perform IO or are synchronized

First, they can be preempted by any call, explicit or implicit, to the runtime (or any library, for that matter). For all you know, class loading or even Math.sin might include a scheduling point (although that is unlikely as that's a compiler intrinsic). We make no promises on when scheduling can occur. Not only do threads not explicitly yield, code cannot statically determine where scheduling might occur; I don't believe anyone can consider this "cooperative."

Second, Loom's virtual threads can also be forcibly preempted by the scheduler at any safepoint to implement time sharing. Currently, this capability isn't exposed because we're yet to find a use-case for it (other than one special case that we want to address, but isn't urgent). If you believe you have one, please send it to the loom-dev mailing list.

The reason it's hard to find good use cases for time slicing is as follows:

1. If you have only a small number of threads that are frequently CPU-bound, then just make them platform threads and use the OS scheduler. Loom makes it easy to choose which implementation you want for each thread.

2. If you have a great many threads, each of which can infrequently become CPU-bound, then the scheduler takes care of that with work-stealing and other scheduling techniques.

3. If you have a great many threads, each of which is frequently CPU-bound, then your cores are oversubscribed by orders of magnitude -- recall that we're talking about hundreds of thousands or possibly millions of threads -- and no scheduling strategy can help you.

It's possible that there could arise real-world situations where infrequent CPU-boundedness might affect responsiveness, but we'll want to see such cases before deciding to expose the mechanism. Even OSes don't like relying on time-sharing (it happens less frequently than people think on well-tuned servers), and putting that capability in the hands of programmers is an attractive nuisance that will more likely cause a degradation in performance.

[1]: https://en.wikipedia.org/wiki/Preemption_(computing)#Preempt...

Isn't the point of preemption to degrade gracefully when the cores are oversubscribed? E.g. the first system I worked on ran potentially CPU-heavy work from various clients, and used per-client threads to isolate them; every so often clients would find ways to get their thread stuck doing a large amount of CPU work (e.g. regex backtracking) and although these were in some sense bugs (and we did fix them), it was very useful that even if one or two clients blocked all their threads (which was often more than our number of physical cores), this wouldn't completely block other clients' threads from running.
There's no doubt forced preemption could help, but I'm still unsure about what the right algorithm is; probably not time sharing.

Suppose you have 100K threads, and only 1% of them become CPU-bound for 100ms. That could take down your 32-core server for 3 seconds, which is bad. But suppose we had 10ms time-slices. Then, those busy threads' latency might go from 100ms to as high as a few minutes, which means effectively taking them down. The scale has a qualitative effect here. So, rather than time-sharing, it might be better to optionally install some other preemption policy -- maybe something that indefinitely suspends threads that behave badly too often and puts them in some collection.
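For anyone who wants to check the "3 seconds" figure, here is the back-of-the-envelope arithmetic as code. It assumes the cores do nothing but drain the CPU-bound bursts and ignores scheduling overhead; the class and method names are just for illustration.

```java
final class BusyThreadEstimate {
    // Seconds needed to work through the CPU bursts if every core is
    // dedicated to them and there is no scheduling overhead (an idealization).
    static double drainSeconds(int busyThreads, double burstMillis, int cores) {
        double totalCpuMillis = busyThreads * burstMillis; // total CPU time demanded
        return totalCpuMillis / cores / 1000.0;            // spread evenly over the cores
    }

    public static void main(String[] args) {
        int busyThreads = 100_000 / 100; // 1% of 100K threads
        System.out.println(drainSeconds(busyThreads, 100.0, 32));
    }
}
```

So 1,000 threads each needing 100ms is 100 seconds of CPU time, which on 32 cores comes to a bit over 3 seconds of wall-clock time.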

The point is that time-slicing will probably not be helpful in sufficiently many cases, and we don't yet know what will. We'd like to gather more data before offering something. In some other languages/runtimes it might be worthwhile to just expose a capability and see what people do with it, but with Java, within five minutes you'll have twenty libraries doing time-sharing, and thousands of people using them blindly whether it's good or bad for them (just because they say they do time-sharing, and that's good, no?), and now there's just noise and bad habits everywhere. This is nanny-state governance, but we've learned our lesson, and you can't be too careful with an ecosystem this big.

Sure. Ultimately you've got the same problem as an OS scheduler and recognising whether threads are CPU-bound or IO-bound and treating them separately is probably going to be part of that.

I appreciate not wanting to do things until you can do them right, but equally if you advertise this as a preemptive runtime, people are going to expect that they can use it to throw 32 CPU-spinning threads onto 8 cores and have it behave gracefully. It sounds like from a user's point of view on day 1 this runtime will be the worst of both worlds - you need to take care to not do big chunks of CPU work without yielding, but you don't get the full control that a traditional "userspace" cooperative multitasking framework would give you.

It sounds like it is: https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1....

But the other side of that is that sometimes non-preemption is also a desirable property— like in JavaScript, or Python asyncio, knowing that you don't need to lock over every little manipulation of some shared data structure because you're never going to yield if you didn't explicitly await.

I think that's not quite it:

I believe that loom is implementing cooperative lightweight threads and simultaneously reworking all of the blocking IO operations in the Java standard library to include yields. I guess this means that you could, for example, hold an OS-level thread forever by writing an infinite loop that doesn't do any IO...

I believe both the OS and the JVM are free to reschedule it. Yields are just an explicit way of possibly changing the thread.
When you have a runtime, you have proper information about whether work is being done on a given virtual thread. So in the case of Loom, afaik, any blocking call will turn into a non-blocking one auto-magically (other than FFI, but that is very rare in Java), since the JVM is free to wait on it asynchronously behind the scenes and do some other work in the meantime.
:) sleep 0! I was trying to see if there is a way to preempt stuck threads (infinite loops etc), and wrote a small while loop replacement

  pwhile(()-> loop predicate, ()-> {loop body});

All it does is add a `Thread.isInterrupted()` check to the predicate. At this point, best to switch to Erlang!
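A minimal sketch of what such a `pwhile` helper could look like; this is a guess at its shape, not the original code. The `Loops` class name and the boolean return value are assumptions added here so the caller can tell why the loop ended.

```java
import java.util.function.BooleanSupplier;

final class Loops {
    // Runs `body` while `predicate` holds and the current thread is not interrupted.
    // Returns true if the predicate became false, false if the loop was cut
    // short by an interrupt (the interrupt flag is left set for the caller).
    static boolean pwhile(BooleanSupplier predicate, Runnable body) {
        while (!Thread.currentThread().isInterrupted()) {
            if (!predicate.getAsBoolean()) {
                return true;
            }
            body.run();
        }
        return false;
    }

    public static void main(String[] args) {
        int[] counter = {0};
        boolean finished = pwhile(() -> counter[0] < 5, () -> counter[0]++);
        System.out.println(finished + " " + counter[0]);
    }
}
```

Note that this only helps if something else calls `interrupt()` on the stuck thread, and only between iterations: a body that never returns still can't be stopped this way.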