Hacker News

by haradion 1441 days ago

You might think of Rust's async paradigm as "half a continuation, turned upside down". With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe. Most languages with continuations manage this by "pausing" your function and keeping its stack frame around. In the general case, that means the stack frame has to be heap-allocated, so the language itself is basically giving you a "pseudo-thread". You eventually get control back with the same stack frame, and as far as the language is concerned, how you get back there is none of your concern; that's its job.

In Rust's polling-based model, there's no "magic" saving of stack frames. The compiler gives you some space to store state between steps, but the runtime has to decide where that state lives and manage its memory. You can use the language to express "this is the next thing to call", but when you start an async I/O operation and yield to it, you've already returned from your own function to the runtime, and it's the runtime's job to call your function again with the state it stashed away. Your function then skips over the steps that have already been handled and calls into the next thing. It gets a bit more involved due to various bits of syntactic sugar, but that's the basic model. It operates at a lower level of abstraction than many languages' coroutines or call/cc, which gives you the flexibility to customize the behavior to meet specific needs.
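To make the "skip the steps already handled" idea concrete, here is a minimal sketch, using only the standard library, of roughly what an async function looks like once desugared: an enum whose variants record where the function left off, plus a `poll` method that resumes from the current variant. The names `TwoSteps`, `ThreadWaker`, and `block_on` are hypothetical, the "I/O" is faked by returning `Pending` twice, and real compiler output is more involved.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hand-written stand-in for something like:
//     async fn two_steps() -> u32 { let x = fake_io().await; x + 1 }
// Each variant records where the function left off.
enum TwoSteps {
    Start,
    WaitingOnIo { polls_left: u32 },
    Done,
}

impl Future for TwoSteps {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // TwoSteps is Unpin, so we can get a plain &mut to the stored state.
        let this = self.get_mut();
        loop {
            match *this {
                // First step: "start the I/O" and move on to the next state.
                TwoSteps::Start => *this = TwoSteps::WaitingOnIo { polls_left: 2 },
                // Pretend the I/O isn't ready: ask to be polled again, then return.
                TwoSteps::WaitingOnIo { polls_left } if polls_left > 0 => {
                    *this = TwoSteps::WaitingOnIo { polls_left: polls_left - 1 };
                    cx.waker().wake_by_ref();
                    return Poll::Pending;
                }
                // The I/O "completed" with 41: run the rest of the function.
                TwoSteps::WaitingOnIo { .. } => {
                    *this = TwoSteps::Done;
                    return Poll::Ready(41 + 1);
                }
                TwoSteps::Done => panic!("polled after completion"),
            }
        }
    }
}

// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// A minimal single-future "runtime": poll, and if Pending, park until woken.
// Returns the output and how many times the future was polled.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, polls),
            Poll::Pending => thread::park(),
        }
    }
}
```

Each `Pending` is a real return to the caller (`block_on` here); the next call to `poll` resumes from the stored variant rather than from the top of the function, so `block_on(TwoSteps::Start)` polls three times before getting `42`.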

A runtime for generic desktop/server apps may maintain a thread pool and call back into your code on one of those threads. In WebAssembly, execution is (typically) single-threaded, but JavaScript promises may call into your runtime, and you have to dispatch each of those calls to the right Rust future. On embedded platforms, the data structures that the desktop/server runtime uses may simply not be suitable (e.g. because you have no general-purpose heap allocator), so you need a different approach with more constraints.
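For the embedded case, here is a minimal sketch of an executor that needs no heap allocation at all: the future is pinned on the caller's stack with `std::pin::pin!`, and the waker is built from a static `RawWakerVTable` that just sets a global flag. It uses `std` paths for convenience (the same items exist in `core`); `block_on_no_alloc`, `WOKEN`, and `YieldOnce` are hypothetical names, and a real microcontroller executor would sleep until an interrupt instead of spinning.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Global "someone woke us" flag: there is no per-task state to allocate.
static WOKEN: AtomicBool = AtomicBool::new(false);

// Waker vtable: the data pointer is unused, so clone and drop are trivial.
static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_wake, vt_wake, vt_drop);

fn raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}
unsafe fn vt_clone(_: *const ()) -> RawWaker {
    raw_waker()
}
unsafe fn vt_wake(_: *const ()) {
    WOKEN.store(true, Ordering::Release);
}
unsafe fn vt_drop(_: *const ()) {}

// Runs one future to completion without Box, Arc, or any other allocation.
fn block_on_no_alloc<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut); // pinned in this stack frame
    // SAFETY: the vtable functions ignore the data pointer, so any pointer is valid.
    let waker = unsafe { Waker::from_raw(raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    WOKEN.store(true, Ordering::Release); // make sure the first poll happens
    loop {
        if WOKEN.swap(false, Ordering::Acquire) {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
        } else {
            std::hint::spin_loop(); // an MCU would wait for an interrupt here
        }
    }
}

// A future that returns Pending once (waking itself), then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

The trade-off is the one described above: a single global flag means this sketch can only drive one task at a time, which is exactly the kind of extra constraint an allocation-free runtime accepts.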

Interoperability between these runtimes is possible. The key is that a task running on one runtime needs to be able to spawn a task on the other, and part of that second task's job is to notify the first runtime that it's time to poll the "parent" task again. The mechanics vary depending on how each runtime handles task spawning.
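A minimal sketch of that glue, with a plain OS thread standing in for "the other runtime" and a tiny park-based `block_on` standing in for the first one. The names `spawn_on_other_runtime`, `RemoteJoin`, and `Shared` are hypothetical; real runtimes have their own spawn and channel APIs. The important part is that the remote side stores the result and then calls the `Waker` it was handed, which is how the first runtime learns that the parent task should be polled again.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// State shared between the waiting task and the remote worker.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

// Future the first runtime awaits; resolves once the remote work has finished.
struct RemoteJoin<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for RemoteJoin<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.shared.lock().unwrap();
        match shared.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Leave the latest waker so the remote side knows whom to notify.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Hypothetical "spawn on the other runtime": a plain OS thread stands in for it.
fn spawn_on_other_runtime<T, F>(work: F) -> RemoteJoin<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let remote = Arc::clone(&shared);
    thread::spawn(move || {
        let value = work();
        // Store the result, then notify the first runtime (outside the lock).
        let waker = {
            let mut s = remote.lock().unwrap();
            s.result = Some(value);
            s.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake();
        }
    });
    RemoteJoin { shared }
}

// The "first runtime": drives one future on the current thread, parking while pending.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::park();
    }
}
```

Because both the result and the waker sit behind the same mutex, it doesn't matter whether the remote work finishes before or after the first poll: either the poll sees the result, or the worker sees the stored waker.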

As I understand it (from having skimmed some articles a while back), C++'s co_await isn't really all that different. Since we don't have the executors proposal as part of the standard yet, it's still a "bring-your-own runtime" sort of approach, with some kind of glue required at the boundaries between runtimes. Depending on which "flavor" of C++ coroutines you're using (e.g. push-based vs. pull-based), that interop might be easier than Rust's at the cost of other tradeoffs (e.g. more heap allocations).

saurik 1440 days ago

> With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe.

I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Instead, in a "traditional" coroutine, the compiler implements a continuation-passing transform (ideally wrapped up in a Monad, which Rust could really use support for right about now) that changes "do A and then B" into "do A, telling A to call the continuation B when it is done, and otherwise return immediately". B isn't a "runtime" and isn't the "language"; you could argue B is an "executor", but it is unique to every call.

So if you want a no-op A, it would be "immediately call the continuation it is passed". This results in behavior identical to the original synchronous function: we call A, which does whatever it wanted to do (in this case nothing), and then it chains through to B. As the call to the continuation is in tail position in this case, the resulting behavior should be nearly identical (the CPU may not be able to predict this branch as efficiently, but the overhead will be similar).

In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.
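A small sketch of that transform done by hand in Rust, with closures as continuations. `step_a`, `step_b`, `a_immediate`, `a_deferred`, and `EventQueue` (a hypothetical stand-in for "whatever code completed the event") are illustrative names, not any library's API. When A finishes immediately, the continuation can stay a generic parameter and nothing is allocated; when A has to stash the continuation for something else to run later, it gets type-erased into a `Box<dyn FnOnce>`, which is the heap allocation in question.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::rc::Rc;

// "Do A and then B" in ordinary direct style.
fn step_a(x: u32) -> u32 {
    x + 1
}

fn step_b(y: u32) -> u32 {
    y * 2
}

fn direct(x: u32) -> u32 {
    step_b(step_a(x))
}

// CPS with a no-op A: A calls the continuation it was passed immediately, in tail
// position. The continuation is a generic type, so nothing is boxed.
fn a_immediate<K: FnOnce(u32)>(x: u32, k: K) {
    k(step_a(x))
}

// Hypothetical stand-in for "whatever code completed the event": callbacks to run later.
type EventQueue = RefCell<VecDeque<Box<dyn FnOnce()>>>;

// CPS with an A that finishes later: it stashes the continuation and returns
// immediately. Storing continuations of different types together needs type erasure.
fn a_deferred(x: u32, events: &EventQueue, k: Box<dyn FnOnce(u32)>) {
    events.borrow_mut().push_back(Box::new(move || k(step_a(x))));
}

// The event source later runs whatever continuations are pending.
fn run_events(events: &EventQueue) {
    loop {
        // Pop first so the queue isn't borrowed while the callback runs.
        let next = events.borrow_mut().pop_front();
        match next {
            Some(callback) => callback(),
            None => break,
        }
    }
}

// Returns (immediate A's result, deferred A's result before the events run, and after).
fn demo(x: u32) -> (Option<u32>, Option<u32>, Option<u32>) {
    let immediate = Rc::new(Cell::new(None));
    let sink = Rc::clone(&immediate);
    a_immediate(x, move |y: u32| sink.set(Some(step_b(y))));

    let events: EventQueue = RefCell::new(VecDeque::new());
    let deferred = Rc::new(Cell::new(None));
    let sink = Rc::clone(&deferred);
    a_deferred(x, &events, Box::new(move |y: u32| sink.set(Some(step_b(y)))));
    let before = deferred.get();
    run_events(&events);
    (immediate.get(), before, deferred.get())
}
```

`demo(20)` gives the same `42` as `direct(20)` either way; the only difference is that the deferred A has already returned (with B not yet run) until the event source drains its queue.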

Just writing normal synchronous code also involves allocation, as every call has to allocate stack space for the next frame. You can elide that in many cases by pre-allocating a chunk of memory for the stack, but a sufficiently deep call stack will overflow the memory you allocated and break in some potentially catastrophic manner. It is a fiction that you can write anything of consequence without either heap allocations or some fuzzy understanding on the developer's part of how hard they can push it until it breaks.

haradion 1436 days ago

> I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".

Syntactically, many languages represent the operation of calling into the next continuation as a regular return (for green threads) or a regular function call (call/cc), but there's always some degree of runtime magic involved in the generated code. For instance, rather than just incrementing or decrementing the stack pointer, you may have to set it to point into a totally different runtime-allocated stack. In principle, that can probably be implemented as special-case code generation rather than an actual call into the runtime's routines, but that still leaves the need to clean up the current task's stack after it returns (or does a tail call into another stack), which requires either an explicit runtime call or the runtime's garbage collector.

The real magic, though, isn't so much in the user-written continuations as in the "async blocking" calls for things like I/O.

> In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.

This is precisely what Rust's async runtime libraries are. They provide the event loop/callback mechanisms, which are necessary for truly async code. (Otherwise, what is there to wait for?) You can totally write and call an async function in Rust that doesn't use a runtime, but there's no way for it to "asynchronously block"; you'd just poll it and get back, "Yep, I'm done; here's my result."
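For instance, this sketch polls an async fn exactly once with a waker that does nothing (`NoopWaker` and `poll_once` are hypothetical helpers, not part of any runtime). Because nothing inside ever returns `Pending`, the whole chain of `.await`s runs to completion inside that single `poll` call, with no event loop involved.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing: fine here, because nothing will ever need waking.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Async fns that never wait on anything external.
async fn add(a: u32, b: u32) -> u32 {
    a + b
}

async fn add_twice(a: u32, b: u32) -> u32 {
    let x = add(a, b).await;
    add(x, b).await
}

// Poll a future exactly once and report what happened.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

`poll_once(add_twice(2, 3))` comes back as `Poll::Ready(8)`: "Yep, I'm done; here's my result."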

> This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.

It's not just the depth of the stack; it's that, once you yield, another task may take over the thread's flow control entirely. Let's say that function A spawns a coroutine B without waiting for it to finish. Now let's imagine that B allocates space on the same thread stack that A was using (on top of A's stack frame) and then yields. At the yield point, something (e.g. the runtime) has to say, "OK, B is stuck, so what do we run next on the thread?" Eventually, it's A's turn to finish running, and it returns. If it does this naïvely, it'll rewind the stack pointer, dropping the stack frames for both A and B. But B isn't done running yet; it's just blocked, so now we've got a problem because its stack just got clobbered. To avoid this, languages that use "stackful" coroutines have to allocate coroutines' stacks on the heap in many cases because the traditional single-stack model isn't just running out of space; it totally breaks down.

Rust uses stackless coroutines, which impose some restrictions on how the coroutine is structured (mostly involving unbounded recursion) so that the state the task has to store between yield points has a fixed size.
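As a sketch of the recursion restriction: writing `async fn sum_to(n: u64) -> u64 { ... n + sum_to(n - 1).await }` directly is rejected by the compiler, because the future would have to contain its own state inside itself and so would have no finite size. Routing the recursive call through a `Box` (here via a hypothetical `sum_to` returning `Pin<Box<dyn Future>>`) puts each level's state in its own heap allocation behind a fixed-size pointer. The no-op waker and `poll_once` helper are included only so the block is self-contained.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Recursion through a Box: each level's state lives in its own heap allocation,
// so every future involved has a fixed, finite size.
fn sum_to(n: u64) -> Pin<Box<dyn Future<Output = u64>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            n + sum_to(n - 1).await
        }
    })
}

// No-op waker plus a poll-once helper, so the example is self-contained.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

Since nothing here ever waits, `poll_once(sum_to(10))` finishes in one poll with `Poll::Ready(55)`; the cost is one allocation per level of recursion.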

haradion 1436 days ago

When you follow the restrictions on stackless coroutines, the coroutine can, as you mentioned, elide separate allocations for the "child" coroutines, because each child's state is stored inline in its parent's state. If you want to make a call whose state can't be stored inline that way, you explicitly tell the runtime to spawn it as a top-level task.
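One way to see this elision is to measure the futures with `std::mem::size_of_val`. In the sketch below (all names hypothetical), `child` keeps a 4 KiB buffer alive across an `.await`, so the buffer is part of its state. A parent that simply awaits the child embeds the child's whole state in its own, so no separate allocation is needed; a parent that boxes the child, much like a runtime that heap-allocates a spawned task, only stores a pointer. Exact sizes depend on the compiler, so the checks below only look at bounds.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::task::{Context, Poll};

// Returns Pending once (waking itself), then Ready: a minimal suspension point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// `buf` is still needed after the await, so it is stored in the future's state.
async fn child() -> u8 {
    let buf = [1u8; 4096];
    YieldOnce(false).await;
    buf[4095]
}

// Awaiting inline: the child's state is embedded in the parent's state.
async fn parent_inline() -> u8 {
    child().await
}

// Boxing the child: the parent only holds a pointer; the child's state is on the heap.
async fn parent_boxed() -> u8 {
    let boxed = Box::pin(child());
    boxed.await
}

// Sizes of the (unpolled) futures, in bytes: (child, inline parent, boxed parent).
fn state_sizes() -> (usize, usize, usize) {
    (
        size_of_val(&child()),
        size_of_val(&parent_inline()),
        size_of_val(&parent_boxed()),
    )
}
```

The inline parent is at least as large as the child, while the boxed parent stays pointer-sized plus bookkeeping; that extra allocation is the price of giving the child its own independent storage.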