by rstuart4133 59 days ago

Ok, you've been programming for years. But didn't learn a lot about threads, apparently.

> Multithreaded code is often much harder to reason about than async code, because threads can interleave executions and threads can be preempted anywhere.

No, green threads / fibres or whatever you want to call them explicitly don't interleave executions. They are a form of cooperative multitasking. Async/await is another form of cooperative multitasking. The former just builds on what we already have. The latter re-invents the universe.

By the by, the blocker for Javascript green threads wasn't preemption, mostly because there isn't any. It's that Javascript has a "run to completion" model. If the DOM calls a javascript event (which is effectively how all javascript is invoked in a browser), it doesn't block, so it always runs to completion. Green threads break that model. It's not an insurmountable break - the DOM events could always still return immediately, but they could start a green thread that returns to them as soon as they block. Thinking about it, the change is possibly smaller than the language changes required by async/await.

If you can reason about where an await is, you can reason about where a green thread yields. The only difference is that one of them clutters your syntax and the other doesn't.

josephg 58 days ago

> No, green threads / fibres or whatever you want to call them explicitly don't interleave executions.

If you use them with a multithreaded executor (eg in Go), of course they interleave executions. I suppose all your green threads / fibers could run on a single CPU core. But what's the point? How would that be an improvement over what we have now?

I suppose you could make something similar to async/await but with a yield() operation whenever a call wants to block. This would allow blocking read() and so on. But it's basically async/await but without declaring functions as async. And without needing to explicitly await. Await points would be implicit and invisible. But if you do that, any function call you make could yield before returning. As a result, you could no longer easily reason about interleaving. I call foo(). Does it yield to other threads before returning? I have no idea. I could read the code of foo(), but maybe foo will change between minor versions of the library.

This would lead to an avalanche of bugs. Lots of javascript code quietly depends on the lack of interleaving for correctness. Javascript guarantees that while my (non async) function runs, no other code gets executed. Adding threads, even if it's via cooperative multitasking, would break that invariant. It would break all sorts of programs which are working correctly today.

> The latter re-invents the universe. [...] Thinking about it, the change is possibly smaller than language changes required by async/await.

Did you write much javascript before async/await and before promises? Javascript at the time was already async. We just implemented async execution through callbacks. ("Callback hell"). Over time, functions tended to go down and to the right. Promises were added as 3rd party libraries. Then promises were standardised. And later, async/await was added as syntax to help you work with promises. Async / await in javascript was an incremental change to give us new syntax to do what we were already doing. JS already had an event loop and promises. Async/await just added syntax.

Threads (cooperative or preemptive) would be a massive change to JS. It would cause an endless parade of bugs, and frozen websites. To say nothing of your notion we could casually reinvent DOM events. That ship sailed a long time ago.

> The only difference is that one of them clutters your syntax and the other doesn't.

One of them is explicit about where and when a thread blocks. Whether or not something is "blocking" (async) is part of the API. Threading (incl cooperative threading) hides this information. Personally, I much prefer this information to be explicit. I need to know as a programmer whether or not execution will be interleaved.

rstuart4133 58 days ago

> I suppose all your green threads / fibers could run on a single CPU core.

yes.

> But what's the point? How would that be an improvement over what we have now?

> But it's basically async/await but without declaring functions as async.

You answered your own question: yes, you get what you have now, without all the overhead of async, await, promises and futures.

> But if you do that, any function call you make could yield before returning.

A green thread could be an instance of a particular type, so `input = self.yield()` would fail if you aren't a green thread. So no, not "any function" - just ones that are instances of a green thread, or are passed a reference to one.

> Does it yield to other threads before returning?

It could if you pass it a reference to a green thread, otherwise it can't.

> This would lead to an avalanche of bugs.

It doesn't. Cooperative multitasking is at least half a century old at this point. The bugs you're imagining mostly don't turn out to be an issue. To the extent they do happen, it's because someone hasn't thought about two control flows modifying the same data structure. Yes, that happens, but it happens in all single threaded code - async included. It's why we hate side effects. It's what Rust famously prevents with its borrow checker even in the face of side effects. It's not avoided by async. The explicit colouring does not help to prevent it - it's just overhead.

FWIW the one issue cooperative multitasking does often introduce is that a task can take a long time to execute, so other cooperative tasks don't run in a timely fashion. Exactly the same thing can happen with async of course. It's not usually a problem in browsers, but in embedded solutions where cooperative multitasking is commonly used, it's a real issue because they are often real time. Ask me how I know.

> Javascript guarantees that while my (non async) function runs, no other code gets executed.

This remains true. You are getting confused by your mental model of threads as a form of concurrency. There is no concurrency going on there. Semantically it is near identical to async / await. The principal difference is that in async / await, the program is explicitly creating each stack frame on the heap using manually allocated objects. In addition to the mental overhead that creates, it is slower than using a real stack like green threads do. But now for the truly bizarre twist. Can you guess how modern javascript engines get around that speed issue? Wait for it .... they create an explicit stack ... that looks like what green threads would use anyway! And as a wonderful side effect - you get real stack back traces again. The irony is almost palpable. https://v8.dev/blog/fast-async

> Threads (cooperative or preemptive) would be a massive change to JS. It would cause an endless parade of bugs, and frozen websites. To say nothing of your notion we could casually reinvent DOM events. That ship sailed a long time ago.

I agree the ship has sailed at this point. The rest of the assertions you make there are wrong.

This assertion stands out: frozen websites. Can you tell me how they are going to block? There are no blocking calls in javascript now. The things you would await on now would be passed a green thread handle. But the javascript event handlers called from the DOM have no green-thread handle, so they can't block.

> Personally, I much prefer this information to be explicit. I need to know as a programmer whether or not execution will be interleaved.

You don't. You've just been conditioned to think that because you've never done it any other way. But the reality is people have been using cooperative multitasking for a long, long time. It pre-dates threads and async. The issues and bugs you are proclaiming would happen don't arise.

josephg 58 days ago

If we invented a new language, sure. Cooperative multitasking might be a fun approach. The avalanche of bugs I’m imagining would come from existing JavaScript code being run in a different context than that in which it was written and tested. If you pass me a callback right now, and I call `a(); callback(); b();`, I can guarantee that the program doesn’t yield to the event loop or other executions between a() and b(). As I understand it, this guarantee no longer holds with coop. multitasking because your callback can yield to another thread.

Good on the V8 team. Sounds like they’ve figured out a way to get the performance of green threads with the better ergonomics of effects systems (async await). Great!

You sound like an expert in cooperative multithreading. If async await can use real stacks, what actual benefits are there to cooperative multithreading? Why prefer them over what JS has now? Pitch them to me.

rstuart4133 58 days ago

> The avalanche of bugs I’m imagining would come from existing JavaScript code being run in a different context than that in which it was written and tested.

Oh, right. As you said, the ship has sailed. I think you could bolt green threads onto javascript now without ill effects - apart from bloating the language. I can't see anything that could go badly (certainly no avalanche of bugs). But in javascript green threads are only mildly more ergonomic than async. I wouldn't be bloating the language for such a small return.

Rust is in a different position. The current async implementation has two big black hairs. Firstly, they had to come up with a type-safe way of saving the function's current state. By state, I mean what a function normally stores on its stack. What they came up with is a work of art in some ways, but it doesn't work well with the borrow checker. The borrow checker insists you prove that you have exclusive use of a variable while it exists. Things on the stack have a limited lifetime (the function call), so the compiler knows they don't exist for very long. Even with that small lifetime it's a battle, but it's workable. Async persists that state, usually to the heap, where it can effectively live forever. That wreaks havoc with the borrow checker, causing comments like this: https://news.ycombinator.com/item?id=37436274, quote: "Yes, async is effectively a much harder version of Rust ...".
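To make "async persists that state" concrete, here is a minimal sketch using only the standard library. The names `last_byte`, `NoopWake` and `block_on` are made up for illustration, and the busy-polling executor is deliberately naive (an assumption for the demo, not how a real runtime works). It shows that a local buffer which lives across an `.await` ends up inside the future value itself rather than on the call stack:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for a future that never really suspends.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor (illustration only): polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// `buf` is used after the await point, so it has to be saved in the future.
async fn last_byte(seed: u8) -> u8 {
    let buf = [seed; 1024];
    std::future::ready(()).await;
    buf[1023]
}

fn main() {
    let fut = last_byte(7);
    // The "stack frame" is now an ordinary value whose size we can measure.
    println!("future state: {} bytes", std::mem::size_of_val(&fut));
    println!("result: {}", block_on(fut));
}
```

The reported size should be at least 1024 bytes, because the buffer is part of the saved state. Any borrow of `buf` held across the await would point into that same value, which is where `Pin` and the trouble with the borrow checker come from.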

The second issue is colouring. In the current Rust async implementation, large chunks of the machinery are left to libraries, like tokio. Each of these libraries has to provide its own I/O. They aren't compatible. So if you want to use a cute new HTTP server, you are out of luck unless they provided a version that talks to the async library you are using.

The library writers do their best to accommodate by providing interfaces to the popular async libraries. That forces them to do extra work. Whereas before they could just call `read()` on a `std::fs::File`, now they have to abstract all the I/O they do to a different module, and provide an implementation of that module for each async library they want to support.
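For contrast, a small sketch of the "before" situation: blocking library code written once against the standard `std::io::Read` trait works with files, sockets and in-memory buffers alike. The function name `count_lines` is made up for illustration. The standard library has no equivalent async trait, which is why tokio and the futures crate each define their own `AsyncRead`.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts lines from any blocking byte source: a File, a TcpStream, a byte slice...
fn count_lines<R: Read>(source: R) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(source).lines() {
        line?; // propagate I/O errors instead of silently skipping them
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A byte slice implements Read, so no real file is needed for the demo.
    let lines = count_lines("first\nsecond\nthird\n".as_bytes())?;
    println!("{} lines", lines);
    Ok(())
}
```

An async version of the same function would have to pick one runtime's reader trait, or be written against a library-specific abstraction layer, which is the extra work described above.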

The outcome can only be described as a mess, and that's putting it politely. It's harming uptake of the language. It wasn't like they didn't know it was coming either - there were comments pleading for a better implementation. And it wasn't as if better solutions weren't already apparent - they had green threads before, they just made some wrong turns with the implementation that needed to be fixed. And it's not like these solutions were harder to do than the async implementation they came up with. Async needed new standard library features to stabilise (like `Pin<>`) and introduced new keywords - none of which was needed for green threads. (Although some would be useful for an efficient green thread implementation - like knowing the maximum amount of stack a function could use.)

In the face of all that, they persisted with async. You'd need a sociologist to explain how that happened - to my engineering brain it's inexplicable. Unlike in Javascript, it isn't just a mildly less ergonomic implementation of the same thing, it's a serious mistake - well worth the effort of throwing out and replacing.

josephg 58 days ago

On all that, we have near total agreement. I've been complaining about how broken and half-baked rust's async story is for years - for more or less the same reasons you list above:

- You can't name the type of an impl Future.

- They play terribly with the borrow checker because the borrow checker can't handle self referential types.

- There's no future executor in the standard library. You need 3rd party libraries. And the most common library is tokio, which is a whale.

- Despite all the work, there's still no async streams in the language.

- Pin. !Unpin. pin_project. Unsafe pin_project. What are we even doing.

But async works really well in javascript. Maybe where we disagree is that I don't think any of these issues are because async itself is a bad idea. But, async has become the place dreams go to die in rust. Look at the issues above. They're all problems with rust's type system, borrow checker and standard library.

What I think rust needs is:

- A way to have self-borrows in a struct. Types with self borrows would be implicitly pinned.

- A way to name the return value of a function. Eg let x: ReturnType<some_func>. People have been saying this is right around the corner since 2019.

- Generators. Futures are built on top of generators inside the compiler. But generators have - for some reason - never been exposed in stable rust. I think generators should have been stabilised first - since all the problems you need to solve to make generators work well (self referential types, return values you can name, etc) are things futures need too.

Unfortunately I think that ship has sailed too. I try to avoid async rust whenever I can. It's such a pity. I'm hoping someone makes a rust 2.0 language at some point which fixes this situation.

rstuart4133 57 days ago

> I think generators should have been stabilised first - since all the problems you need to solve to make generators work well (self referential types, return values you can name, etc) are things futures need too.

Generators are an interesting case. For example, if you implemented a Vec iterator as a generator, it becomes:

    // Hypothetical generator syntax; `yield` is not available in stable Rust.
    fn vec_iter(&self) {
        for index in 0..self.len() {
            yield &self[index];
        }
    }

Which is arguably easier to understand than the current event-driven formulation, which requires you to declare a new type to hold your state, and the code looks like:

    // `self` holds a borrowed `vec` (lifetime 'a) plus the current `index`.
    fn next(&mut self) -> Option<&'a T> {
        if self.index >= self.vec.len() {
            None
        } else {
            self.index += 1;
            Some(&self.vec[self.index - 1])
        }
    }

Effectively the stack frame has become your type, and sequential code is always so much more compact and clearer than the event driven model. The generator could be implemented as a green thread, but you would never entertain the overhead of creating the new stack needed by the green thread implementation.
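For reference, a complete version of the hand-written form that compiles on stable Rust might look like the sketch below. The names `VecIter` and `vec_iter` are made up for illustration; the struct is the "stack frame turned into a type" described above.

```rust
/// Hand-written iterator state: what a generator would keep in its stack frame.
struct VecIter<'a, T> {
    vec: &'a [T],
    index: usize,
}

impl<'a, T> Iterator for VecIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.index >= self.vec.len() {
            None
        } else {
            self.index += 1;
            Some(&self.vec[self.index - 1])
        }
    }
}

/// Starts iteration at the first element.
fn vec_iter<T>(vec: &[T]) -> VecIter<'_, T> {
    VecIter { vec, index: 0 }
}

fn main() {
    let v = vec![10, 20, 30];
    let collected: Vec<i32> = vec_iter(&v).copied().collect();
    println!("{:?}", collected);
}
```

The only state carried between calls is `index` (plus the borrowed slice), which is what a generator's saved frame would need to hold as well.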

However ... the async implementation built all the mechanics needed to get rid of that green thread stack allocation when the size of the stack is known, as it is in this case. The state saving stuff they created for async could be used to translate that stack to a type. It would, surprise, surprise, contain just `index` - analogous to the iterator type we have to manually create for event-driven code. So the compiler could translate the green thread to the same implementation as the event-driven code, but you get to use the compact (and very familiar) syntax of a stack machine.

I found it interesting to see what happens for a more complex generator - like something that returns every node in a tree. You can do it recursively, which is simple clear code, but you don't know the size of the stack so the trick used for the vec iterator (translating it to a type) can't be used. Or you can manually store the state you stored in the stack with a recursive implementation in a Vec<> instead. Both require a memory allocation, but they are different. One is just normal malloc that must be reallocated and moved as the allocation grows. The other can use the OS's stack implementation, that doesn't move as it grows. If you re-used stacks, the OS's stack implementation would be faster in a long running program.
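The two tree-traversal options can be sketched as follows. This is a minimal illustration with a made-up `Node` type and in-order traversal chosen arbitrarily. `in_order_recursive` keeps its state on the call stack, while `InOrder` stores the same pending nodes manually in a `Vec` so it can be driven one `next()` at a time.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

/// Recursive version: pending work lives implicitly on the call stack.
fn in_order_recursive(node: Option<&Node>, out: &mut Vec<i32>) {
    if let Some(n) = node {
        in_order_recursive(n.left.as_deref(), out);
        out.push(n.value);
        in_order_recursive(n.right.as_deref(), out);
    }
}

/// Iterator version: the same pending nodes, stored in a heap-allocated Vec.
struct InOrder<'a> {
    stack: Vec<&'a Node>,
}

impl<'a> InOrder<'a> {
    fn new(root: Option<&'a Node>) -> Self {
        let mut iter = InOrder { stack: Vec::new() };
        iter.push_left(root);
        iter
    }

    /// Push a node and its chain of left children.
    fn push_left(&mut self, mut node: Option<&'a Node>) {
        while let Some(n) = node {
            self.stack.push(n);
            node = n.left.as_deref();
        }
    }
}

impl<'a> Iterator for InOrder<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let n = self.stack.pop()?;
        self.push_left(n.right.as_deref());
        Some(n.value)
    }
}

fn leaf(value: i32) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left: None, right: None }))
}

fn main() {
    let root = Node { value: 2, left: leaf(1), right: leaf(3) };
    let iterative: Vec<i32> = InOrder::new(Some(&root)).collect();
    let mut recursive = Vec::new();
    in_order_recursive(Some(&root), &mut recursive);
    println!("{:?} {:?}", iterative, recursive);
}
```

Both visit the nodes in the same order. The `Vec` may be reallocated and moved as it grows, which is the allocation difference described above.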

Notice that the transformation from a generator to async implementation is arguably more complex than the same transformation for green threads, especially for the tree traversal.

That observation is one of the reasons I'm such a strong proponent of green threads. The other is a simpler mental model. Unlike async, you don't have to expose the inner mechanisms it depends on, like futures.
