|
You might think of Rust's async paradigm as "half a continuation, turned upside down". With traditional coroutines, after an async operation completes, the language's runtime calls back into your code, and you actively call the next thing, "pushing" control flow down the pipe. Most languages with continuations manage this by "pausing" your function and keeping its stack frame around, which, in the general case, means your function's stack frame has to be heap-allocated, which is basically the language itself giving you a "pseudo-thread". You eventually get control back with the same stack frame, and as far as the language is concerned, how you get back there is none of your concern; that's its job.
In Rust's polling-based model, there's no "magic" saving of stack frames. You get some space to store state, but the runtime has to manage that memory itself. You can use the language to express "this is the next thing to call", but when you spawn an async I/O task and yield to it, you've already returned from your own function to the runtime, and it's the runtime's job to call your function again with the state it had stashed away. You then jump over the steps in your function that have already been handled and call into the next thing. It gets a bit more involved due to various bits of syntactic sugar, but that's the basic model.
It's operating at a lower level of abstraction than many languages' coroutines or call/cc, which gives you the flexibility to customize the behavior to meet specific needs. A runtime for generic desktop/server apps may maintain a thread pool and call back into your code on one of those threads. In WebAssembly, execution is single-threaded, but JavaScript promises may call into your runtime, and you have to dispatch that to the right Rust future. On embedded platforms, the data structures that the desktop/server runtime uses may simply not be suitable (e.g. because you have no general-purpose heap allocator), so you need to use a different approach with more constraints.
Interoperability between these runtimes is possible. The key is that you need a task that's running on one runtime to be able to spawn a task on the other, with part of that task's job being to notify the first runtime that it's time to poll the "parent" task again. The mechanics vary depending on how each runtime handles task spawning.
As I understand it (from having skimmed some articles a while back), C++'s co_await isn't really all that different. Since we don't have the executors proposal as part of the standard yet, it's still a "bring-your-own runtime" sort of approach, with some kind of glue required at the boundaries between runtimes. Depending on which "flavor" of C++ coroutines you're using (e.g. push-based vs. pull-based), that interop might be easier than Rust's at the cost of other tradeoffs (e.g. more heap allocations). |
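To make the polling model above concrete, here is a rough sketch using only the standard library. The names `TwoStep`, `NoopWake` and `run_to_completion` are made up for illustration, and the "runtime" is a toy that busy-polls instead of parking until it is woken, so treat it as a picture of the control flow rather than how any real executor is written.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical two-step task, written by hand as the kind of state
/// machine an `async fn` is compiled into.
enum TwoStep {
    Start,
    WaitingOnA,
    Done,
}

impl Future for TwoStep {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `TwoStep` has no self-references, so it is `Unpin` and `get_mut` is allowed.
        let this = self.get_mut();
        match *this {
            TwoStep::Start => {
                // Pretend an I/O operation was started: record where we are,
                // ask to be polled again, and return to the runtime.
                *this = TwoStep::WaitingOnA;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            TwoStep::WaitingOnA => {
                // Polled again: skip the step that is already done and finish.
                *this = TwoStep::Done;
                Poll::Ready("A finished, B ran")
            }
            TwoStep::Done => panic!("polled after completion"),
        }
    }
}

/// Waker for the toy runtime. The loop below polls unconditionally,
/// so waking has nothing to do.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy "runtime": owns the future's state and keeps calling `poll`.
/// Returns the output together with the number of polls it took.
fn run_to_completion<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
        // A real runtime would sleep here until the waker fires.
    }
}
```

Calling `run_to_completion(TwoStep::Start)` polls twice: the first call returns `Poll::Pending` to the runtime, and the second resumes from the stashed state.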
I mean, with "traditional" coroutines, it isn't the "language's runtime" which calls back into my code: it is whatever code completed the event. I get that the important part of this sentence is the interest in "push" vs. "poll", but this concept of the existence of a "language's runtime" is a bit strange to me, as my mental model of a coroutine doesn't involve a "runtime" and certainly doesn't involve an "executor".
Instead, in a "traditional" coroutine, a continuation-passing transform is implemented in the compiler that changes -- in the best case of having this wrapped up in a Monad (which Rust could really use support for right about now) -- "do A and then B" into "do A while telling A to call the continuation of B when it is done, and otherwise immediately return". B isn't a "runtime" and isn't the "language"; you could argue B is an "executor" but it is unique to every call.
So if you want a no-op A it would be "call the continuation it is passed, immediately". This would result in behavior identical to the original synchronous function: we call A, which does whatever it wanted to do (in this case nothing), and then it chains through to B. As the call to the continuation is in tail position for this case, the resulting behavior should work out to being nearly identical (like the CPU won't be able to branch predict this as efficiently, but it will have similar overhead).
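As a hand-written sketch of that transform (a compiler would do this for you; the names `a_noop`, `b`, `a_then_b_cps` and `a_then_b_direct`, and the values they pass around, are all made up for illustration), in Rust it might look something like:

```rust
/// Hypothetical no-op `A`: it has nothing to wait for, so it calls the
/// continuation it was passed immediately, in tail position.
fn a_noop<R>(k: impl FnOnce(u32) -> R) -> R {
    k(1)
}

/// Hypothetical synchronous `B`.
fn b(x: u32) -> u32 {
    x + 41
}

/// "Do A and then B" in continuation-passing style: call A and tell it
/// to run "B, then whatever comes after" when it is done.
fn a_then_b_cps<R>(k: impl FnOnce(u32) -> R) -> R {
    a_noop(|x| k(b(x)))
}

/// The same thing in direct style, for comparison.
fn a_then_b_direct() -> u32 {
    let x = 1; // what the no-op A produces
    b(x)
}
```

Because everything here is generic over the closure type, nothing is type-erased and nothing needs to be heap-allocated; `a_then_b_cps(|v| v)` and `a_then_b_direct()` compute the same value.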
In a more complex scenario, the function A is going to do something mysterious and later get a callback from something -- which you might call a "runtime" but which almost certainly isn't implemented by the "language" -- on some random background thread running an I/O loop, or maybe due to a signal / handler from the operating system, or whatever random mechanism it has in place to run code later (which again: isn't part of the "language") and it will run the continuation it was passed.
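A minimal sketch of that scenario, assuming the "random mechanism" is just a background thread and using a sleep as a stand-in for real I/O (`a_later` and `run_demo` are hypothetical names, and the `JoinHandle` is returned only so the demo can clean up):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical asynchronous `A`: hands its continuation to a background
/// thread and returns to the caller right away.
fn a_later(k: impl FnOnce(u32) + Send + 'static) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10)); // stand-in for waiting on I/O
        k(1); // whoever completes the event runs the continuation
    })
}

/// Runs "A, then B" and reports what B computed.
fn run_demo() -> u32 {
    let (tx, rx) = mpsc::channel();
    let handle = a_later(move |x| {
        // This is "B": it runs on the background thread, not on the caller's.
        tx.send(x + 41).unwrap();
    });
    // `a_later` has already returned at this point; B may not have run yet.
    let result = rx.recv().unwrap();
    handle.join().unwrap();
    result
}
```

Nothing in this sketch is an executor or a language runtime: the thread that finishes the work is the thing that calls the continuation.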
This does, likely, result in some heap allocation somewhere in order to type erase the continuation in the general case. However, this seems to only be due to how the asynchronous code has been given a harder challenge of dealing with arbitrarily deep stacks with minimal overhead, while people seem totally OK with synchronous code causing random stack overflows :/. If you are willing to relax that assumption a bit then you can elide that allocation almost every time.
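To illustrate where the type erasure comes in, here is a small Rust sketch (the names `call_generic`, `Continuation` and `run_queue` are invented for the example). When the continuation's concrete type is known at the call site it can be passed by value; once continuations of different types have to sit in the same queue, they get boxed behind a trait object.

```rust
/// Statically typed continuation: the closure is passed by value and no
/// boxing is needed.
fn call_generic<K: FnOnce(u32) -> u32>(k: K) -> u32 {
    k(1)
}

/// Type-erased continuation: any closure with this signature fits, at the
/// cost of a `Box`, which in general means a heap allocation.
type Continuation = Box<dyn FnOnce(u32) -> u32>;

/// A queue of pending continuations with different concrete types, as a
/// generic event loop would have to store them.
fn run_queue() -> Vec<u32> {
    let offset = 41;
    let mut queue: Vec<Continuation> = Vec::new();
    queue.push(Box::new(move |x: u32| x + offset)); // captures `offset`
    queue.push(Box::new(|x: u32| x * 2)); // a different closure type
    queue.into_iter().map(|k| k(1)).collect()
}
```

The two closures pushed in `run_queue` have different types, which is why the `Vec` can only hold them as `Box<dyn FnOnce(u32) -> u32>`.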
Like: just writing normal synchronous code also involves heap allocations as you have to allocate the stack space for the next frame every call. You can elide that in many cases by pre-allocating a bunch of memory for the stack, but a sufficiently-deep call stack will overflow the memory you allocated and break in some potentially-catastrophic manner. It is a fiction that you can write essentially anything of consequence without either heap allocations or some fuzzy understanding by the developer of how hard they can push it until it breaks.