Hacker News new | ask | show | jobs
by zackmorris 2502 days ago
This is one of the most concise tutorials on how generators, coroutines and futures/promises are related (from first principles) that I've seen.

I'm hopeful that eventually promises and async/await fade into history as a fad that turned out to be too unwieldy. I think that lightweight processes with no shared memory, connected by streams (the Erlang/Elixer, Go and Actor model) are the way to go. The advantage to using async/await seems to be to avoid the dynamic stack allocation of coroutines, which can probably be optimized away anyway. So I don't see a strong enough advantage in moving from blocking to nonblocking code. Or to rephrase, I don't see the advantage in moving from deterministic to nondeterministic code. I know that all of the edge cases in a promise chain can be handled, but I have yet to see it done well in deployed code. Which makes me think that it's probably untenable for the mainstream.

So I'd vote to add generators to the Rust spec in order to make coroutines possible, before I'd add futures/promises and async/await. But maybe they are all equivalent, so if we have one, we can make all of them, not sure.

4 comments

It's the same underlying mechanism for generators as for futures: they are stackless coroutines. All the space they need for local variables is allocated ahead of time.

In my experience, the fact that they are stackless is not at all obvious when you're coding with them. Rust makes working with them really simple and intuitive.

Debugging can be pain though, as you may not know the right stack, and makes it harder to follow how the code executed in that context. But yes, rather than writing an async state machine with callbacks, I would prefer this.
https://crates.io/crates/tracing is attempting to solve this issue!
Regarding determinism, async is way more deterministic than multiple threads, because you don’t have arbitrary point where execution contexts can change.
That's true in a way, but only for multithreaded code. Multi-process code with full isolation uses different metaphors like joining threads within higher order functions to achieve parallelism in code that looks single-threaded.

For example, lisp-based languages like Clojure can be statically analyzed and then parallelized so that all noninteracting code runs in its own process. This can also be done for code that operates on vectors like MATLAB and TensorFlow.

For me, isolated processes under the Actor model in languages like Elixer/Erlang and Go is much simpler conceptually than async/await, which is only one step above promises/futures, which is only one step above callback hell. I know that the web world uses async/await for now, but someday I think that will be replaced with something that works more like Go.

> I think that lightweight processes with no shared memory, connected by streams [...] are the way to go.

No, at least not in general. There are a lot of problems in the real world for which "no shared memory" is incompatible with "efficient parallelism".

That's actually not true - the efficiencies due to copying can be overcome with the runtime or abstractions.

For example, the copy on write (COW) mechanism of unix where memory pages are mapped to the same location until a forked process writes to one, in which case the virtual memory manager makes a mutable copy.

There's also Redux and Closure's immutable state tree that makes copies under a similar mechanism to COW but through code, since they run at a level of abstraction above C++ or Rust.

My feeling is that these techniques run within a few percent of the speed of hand-optimization. But in the real world, I've seen very little human code remain optimized over the long term. Someone invariably comes along who doesn't understand the principles behind the code and inadvertently does a manual copy somewhere or breaks the O() speed of the algorithm by using the wrong abstractions. Meanwhile immutable code using mechanisms like COW avoids these pitfalls because the code is small and obvious.

I feel that the things that Rust is trying to do were solved long ago under FP, so I don't think it's the language for me. That's also why I moved away from C#, Java, C++, etc. Better languages might be Elixer or Clojure/ClojureScript, although they still have ugly syntax from a mainstream perspective compared to say Javascript or Python. I love that Rust exists as a formal spec of a mature imperative programming language. I think it's still useful in kernels and the embedded space. But I'm concerned that it's borrowing ideas like async/await that trade determinism for performance.

> I feel that the things that Rust is trying to do were solved long ago under FP

I don't think this is true at all. Statically analyzed Sync/Send are an amazing tool that I don't see in any other language. In fact, your point of view w.r.t. the actor model is extremely well-supported in Rust, due to Sync/Send traits. Immutable objects can be shared between threads using simple pointers, and mutable objects can have ownership moved between threads with just a pointer copy.

Those threads may have 'shared memory' in the strict sense of the term. But compared to other languages, Rust makes working with shared memory 'feel' like working with independent processes, except with extremely efficient message passing. The language statically guards against threads interacting directly as long as you don't use primitives like Mutex<T>.

> For example, the copy on write (COW) mechanism of unix where memory pages are mapped to the same location until a forked process writes to one

That is shared memory – memory which is shared between two or more threads or processes.

> There's also Redux and Closure's immutable state tree that makes copies under a similar mechanism to COW but through code

That's also shared memory.

"Shared memory" does not necessarily mean "shared mutable memory". Rust, for example, statically prevents memory from being simultaneously shared and mutable (except through synchronization primitives like mutexes etc.). A pure actor-based language, in contrast, would prevent even immutable memory from being shared.

>> dynamic stack allocation of coroutines, which can probably be optimized away anyway

This seems interesting. Do you have any pointers to places/papers I can look more into this? I'm also curious, since the stacks have to be rather small when you are running several thousands of coroutines (like Go), how often people get into issues of running out of stack because of some big stack allocation somewhere and stuff like that.

I haven't studied it deeply, but a breadcrumb would be that cooperative threads (green threads) are equivalent to coroutines.

Ok it looks like current techniques are stackless runtimes and compiling coroutines to stackless continuations:

https://en.wikipedia.org/wiki/Stackless_Python

https://stackless.readthedocs.io/en/v3.6.4-slp/library/stack...

http://jessenoller.com/blog/2009/02/23/stackless-you-got-you...

https://engagedscholarship.csuohio.edu/cgi/viewcontent.cgi?r...

https://pdfs.semanticscholar.org/b9aa/49e4b7a00e6c9f0d8c18ba...

https://www.osnews.com/story/9822/protothreads-extremely-lig...

This looks like a rare gem, although I just started reading it:

https://cs.indiana.edu/~dfried/dfried/mex.pdf

I grew up with the cooperative multithreading of classic Mac OS and was really shocked when I first saw Javascript back in the 90s and it had no notion of it (because it didn't have generators). That sent us down the callback hell evolutionary dead end, through promises/futures and finally to async/await where we are now. That could have been largely avoided if we had listened to programming language experts!

Oh I think I misundetstood you (or you me), and I didn't really articulate my question well. I'm well aware of how stackless coroutines are implemented.

My main concern was whenever you do that, you lose the ability to look at backtrace when something goes wrong. So some implementations keep a separate stack for each coroutine (like Go), and switch SP whenever context is switched. That way you don't have bunch of random function calls in your stack trace. The problem though is that these individual stack for each couroutine has to be pretty small in general (since you would spawn hundreds of thousands of these). Go solved this temporarily using fragmented stack, and later ditched that in favor of copying the whole stack over. And I thought, you were talking about optimizations around this whole "stackfull" coroutines thing so that I can have my backtrace.