Hacker News new | ask | show | jobs
by verdagon 1470 days ago
Does anyone else get the feeling that we (as a field) are missing something basic about concurrency? Like there's a really elegant solution just around the corner, that has the low overhead of async/await without the complexity. Or otherwise put, the ease of goroutines but without GC.

I know it sounds crazy. I recently dove into the area, and was pretty surprised at how many interesting building blocks there are out there. It feels like if we just combine them in the right way, we'll discover something that works a lot better.

Off the top of my head:

Google discovered a way to switch between OS threads without the syscall overhead. All it needs is to solve the memory overhead. [0]

Zig discovered a way to use monomorphization to enable colorless async/await. If someone could figure out how to make it work through polymorphism / virtual dispatch, that would be amazing. [1]

Vale discovered a possible way to make structured concurrency in a memory safe way that's easier than existing methods. [2]

Go [3] and Loom [4] show us that we can move stacks around. Loom is particularly interesting as it shows we can move the stack to its original location, a unique mechanism that could solve some other approaches' problems with pointer invalidation.

Cone is designing a unique blend of actors and async await, to enable simpler architectures. [5]

We're close to solving the problem, I can feel it.

[0] No public docs on it, but TL;DR: we tell the OS the thread is blocked, and manually switch over to it by saving/manipulating registers.

[1] https://kristoff.it/blog/zig-colorblind-async-await/

[2] https://verdagon.dev/blog/seamless-fearless-structured-concu...

[3] https://blog.cloudflare.com/how-stacks-are-handled-in-go/

[4] https://youtu.be/NV46KFV1m-4

[5] Can't find the link, but was a discussion on their server.

6 comments

I just want to say there are mountains of research on this, and recent development is exciting, but some of the techniques (like stack switching and moving) are very old. Project Loom is very intriguing because of how it solves the practical problems of introducing old concurrency techniques into existing language implementations that were not designed around them.

A lot of this stuff is intriguing from the implementation side, but where we're really lacking is in the syntax and semantic side to make concurrency "make sense" to programmers. I don't think we're close to solving that problem (for example, call/cc isn't the answer, it's the problem).

imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

A lot of the symptoms of the sync/async problem are caused by the explicit decoupling of sync/async APIs in source code. If you remove that and force it to be implicit internal to the language implementation, the issue goes away. It would take a lot of work to determine if that was worth it.

Basically as we've now accepted garbage collection to be an acceptable part of language implementation, one day I think we'll accept async executors to be a part of that too. We're halfway there on the impl side (Go, Java through Loom, NodeJS, etc). The other half is removing the explicit syntax for it.

> imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

Safepoints for garbage collection are somewhat similar, but for preemption one wants to interrupt threads on a timer, rather than before the collector takes over. Despite occurring very frequently (at around 100 _million_ checks per second), the time overhead is only about 2.5% or so, according to a study by Blackburn et al [0]. It appears, I think, that as long as the fast not-interrupting path is fast enough, eliminating safepoints isn't too important.

[0] Stop and Go: Understanding Yieldpoint Behaviour <https://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint...>

> imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

Sounds like Erlang and single assignment languages.

Jokes aside, part of the problem seems to be the computer model and cpu architectures themselves.

We need something that is designed from scratch to run things concurrently.

Concurrency is mostly a higher level abstraction than the ISA, they don't care what the stack pointer is pointing to or what the return address is. Actually implementing concurrency efficiently is a solved problem, both in the trivial (stack less) and more complex (stackful) cases.

And that's sort of my point, concurrency primitives are really easy to define and implement but pretty hard to use by programmers up the stack.

> Does anyone else get the feeling that we (as a field) are missing something basic about concurrency? Like there's a really elegant solution just around the corner, that has the low overhead of async/await without the complexity. Or otherwise put, the ease of goroutines but without GC.

Yes. There is current research into Algebraic Effects (see for instance https://www.microsoft.com/en-us/research/wp-content/uploads/...).

Algebraic Effects promise a return to non-colored functions, as AE can abstract over exceptions, continuations, async and other control-flow mechanisms.

A decade ago the simple thing we were missing about threaded concurrency was Rust's ownership and borrowing model and Send/Sync. Before that, the simple thing was to use early Java, which had a mandatory garbage collector and monitor objects. If you didn't have or use those, then you were subject to memory safety problems. And moving from heap-scanning GC to ownership and borrowing gave a genuine performance advantage.

Now, we want to remove threading from the concurrency story, in the hopes of getting another performance boost. This itself is the problem, because threads were giving us automatic preemption, akin to how GCs were giving us automatic memory safety. Now we have to statically determine a "good time" for the program to yield. I/O yielding is the easy part, and the reason why people are flocking to async; but we also need to support yielding for fairness reasons. Kernels can do this because they have interrupt timers; but there's no lower-overhead equivalent for userspace code that I'm aware of.

The other problems mentioned with async Rust are particular to Rust itself. The language has a policy that heap allocations only ever happen in `std`, because they want to support embedding Rust into applications where heaps don't exist. This means that futures need to be structs. Rust does support structs of indeterminate size, but barely; and there's no support for structs that can grow. Such a thing is likely unsound without a way for the compiler to check growth limits, and the memory is pinned, so we can't grow beyond a preset limit set at the start of the future[0].

Async infects everything it touches because it's a total pain to write networking library code that's preemption-agnostic. Monad<T> would fix that, but higher-kinded traits aren't a thing in Rust yet and we would need lots of language tooling (akin to `?`) to make this ergonomic to use.

There's also just the possibility that we've been engineering the wrong fix, and we should be trying to get OS threads to be as lightweight as possible rather than trying to move the entire threading system into userspace. There's no particular reason why we need 8MB stacks, other than the fact that compilers don't check stack growth themselves. (Which, BTW, is also a soundness hole in Rust as far as I know.)

[0] Go gets around this with a linked list of stacks, which adds its own overhead.

I'm betting on structured concurrency. I think it will be the same sort of revolution for concurrent programming that structured programming was for single-threaded programming.
You won't solve the broken and unusable programming model of threads by trying to emulate the programming model of threads.
What you are looking for is called Erlang
How is that "the ease of goroutines but without GC" ?