Hacker News new | ask | show | jobs
by zzzcpan 2425 days ago
"async" is required from the algorithmic point of view, so the algorithm can be expressed in terms of specific concurrency model. And in Rust, it's not just "async" that's required, but also a specific executor, which can affect concurrency model as well. Concurrency model can't be an implementation detail.
1 comments

Of course it is -- when you have continuations, which each and every imperative language has, although usually without explicit control over them. With explicit access to continuations, everything expressible with async is expressible without it. Or, more precisely, whether something is "async" or not is not a feature of a certain piece of code but of a certain execution. Essentially, it means whether some computation may want to suspend itself and resume execution later. That you must express that capability as a type-system enforced property of some syntactic element -- a subroutine -- is accidental complexity.

To get an intuitive feel for why that is so, try to imagine that the OS were able to give you threads with virtually no overhead, and you'll see how anything that's expressible with async would be expressible without it. Over the years we've so internalized the fact that threads are expensive that we forgot it's an implementation detail that could actually be fixed.

Computations in Rust can suspend themselves without declaring themselves to be async: that's exactly what they do when they perform blocking IO. They only need to declare themselves async when they want a particular treatment from the Rust compiler and not use the continuation service offered by the runtime, which in Rust's case is always the OS.

> Over the years we've so internalized the fact that threads are expensive that we forgot it's an implementation detail that could actually be fixed.

This is something I wish more people would realize. We assume that 1:1 threads inherently can't scale, mostly because of folklore. (As far as I can tell, the origin of this claim is back-of-the-envelope calculations involving stack sizes, leading to address space exhaustion on 32-bit—obviously something that hasn't been relevant for servers for a decade.) That's why we build async/await and elaborate userspace scheduling runtimes like the one Go has (and the one Rust used to have pre-release).

I think it's worth questioning the fundamental assumptions that are causing us to do this. Can we identify why exactly 1:1 threading is uncompetitive with async/await schemes, and fix that?

All this, by the way, is not to say that Rust made the wrong decision in focusing on async/await. Rust generally follows the philosophy of "if the computing environment is difficult, it's Rust that has to adapt".

> Can we identify why exactly 1:1 threading is uncompetitive with async/await schemes, and fix that?

I think we can identify it, but fixing it is not easy, at least at the kernel level.

The easy part is the cost of scheduling. The kernel must schedule threads with very different behaviors -- say, encoding a video or serving requests over a socket -- so it uses a scheduling algorithm that's a compromise for all uses. But we can let the language's runtime take the scheduling of kernel threads with something like this: https://youtu.be/KXuZi9aeGTw

The harder part is managing the memory required for the stack. Even on 64-bit systems, the OS can't shrink and grow stacks finely enough. For example, I don't think there's a hard requirement that, say, memory below the sp is never accessed, so the OS can't even be sure about how much of the stack is used and can't uncommit pages. But even if there were such a requirement, or, that the language could tell the OS it follows such a requirement, still the OS can only manage memory at a page granularity, which is too much for lightweight concurrency. Any finer than that requires knowing about all pointers into the stack.

We can do it in languages that track the location of all pointers in all frames, though, which is what we're attempting to do in Java, and this allows us to move stacks around, grow them and shrink them as necessary, even at a word granularity.

> All this, by the way, is not to say that Rust made the wrong decision in focusing on async/await. Rust generally follows the philosophy of "if the computing environment is difficult, it's Rust that has to adapt".

... and it follows C++'s (horribly named) "zero-cost abstractions" philosophy which can reasonably be said to be a requirement of Rust's target domains. So I certainly don't think async/await is wrong for C++/Rust/Zig, but I think it's wrong for, say, JavaScript and C#, and we're going a different way in Java (more like Scheme's).

Another possible contributing factor is that in one respect Rust is a higher-level language than Java or JS: it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend. That, BTW, is why Kotlin adopted something similar to async/await.

> at a page granularity, which is too much for lightweight concurrency

I don't think that's been conclusively shown. 4kB is smaller than a lot of single stack frames. It would be interesting for someone to measure how large e.g. Go stacks are in practice—not during microbenchmarks.

> Another possible contributing factor is that perhaps ironically, in one respect Rust is a higher-level language than Java or JS, as it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend.

It's not really a question of "control"; we can and do land changes upstream in LLVM (though, admittedly, they sometimes get stuck in review black holes, like the noalias stuff did). The issue is more that LLVM is very large, monolithic, and hard to change. Upstream global ISel is years overdue, for example. That's one of the reasons we have Cranelift: it is much smaller and more flexible.

But in any case, the code generator isn't the main issue here. If GC metadata were a major priority, it could be done in Cranelift or with Azul's LLVM GC support. The bigger issue is that being able to relocate pointers into the stack may not even be possible. Certainly it seems incompatible with unsafe code, and even without unsafe code it may not be feasible due to pointer-to-integer casts and so forth. Never say never, but relocatable stacks in Rust seems very hard.

The upshot is that making threading in Rust competitive in performance to async I/O for heavy workloads would have involved a tremendous amount of work in the Linux kernel, LLVM, and language design—all for an uncertain payoff. It might have turned out that even after all that work, async/await was still faster. After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all! Ultimately, the choice was clear.

> I don't think that's been conclusively shown.

That would be interesting to study. We'll do it for Java pretty soon, I guess.

> After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all!

If the same Rust workload were to run on such a system there wouldn't be stack growth, either. The stack would stay at whatever size Rust now uses to store the async fn's state.

> a tremendous amount of work in the Linux kernel

I don't think any work in the kernel would have been required.

> Ultimately, the choice was clear.

I agree the choice is pretty clear for the "zero-cost abstractions" approach, and that Rust should follow it given its target domains. But for domains where the "zero-cost use" approach makes more sense, the increase in accidental complexity is probably not worth some gain in worst-case latency. But that's always the tradeoff the two approaches make.

You suggested the switchto patch, right? That requires Linux kernel work.
> So I certainly don't think async/await is wrong for C++/Rust/Zig, but I think it's wrong for, say, JavaScript and C#, and we're going a different way in Java (more like Scheme's).

How can it be wrong for JavaScript? Because JavaScript is a single threaded environment, it was the only possible choice and it's one of the most influential change in the language (in itself it changed more the language use than the whole ES6 bundle). It's a really great success and I'm not sure why you'd like to revert it.

> I'm not sure why you'd like to revert it.

I don't. The old way and async/await aren't the only two options.

> it was the only possible choice

Why?

What would you do instead? Use a green thread system on top of a single-threaded VM? Great, now you have two kinds of blocking calls: the one which only block its own thread, and the one which block all threads at the same time because it blocks the VM itself. How ergonomic!

Remember, the single threaded character of the js VM is not an implementation detail, it's part of the spec. Hate it or love it but the web works this way.

Also, if you think threads are a good concurrency abstraction, let's play a little game: consider you need to read 1M files on a spinning disk. How many threads do you need to run on to get the maximum read performance:

a) one per file

b) just one

c) one per core

d) a magic number which depends on your hard disk's firmware and your workload.

Convenient and intuitive right?

Threads are a dated concurrency primitive which would have died long ago if wasn't also a good parallelism primitive.

> something that hasn't been relevant for servers for a decade

Has Rust officially relinquished small systems to C?

I explicitly said that async I/O is the right decision for Rust. And C has no I/O concurrency features at all, so that's completely irrelevant.
If your concurrency model only allows bounded nondeterminism, you can't express an algorithm that assumes a concurrency model with unbounded nondeterminism in it. Furthermore, the choices concurrency models make hugely affect performance, so it doesn't even matter much if models were equivalent. So, no, algorithms do have to be explicit about concurrency models, whether you like it or not.
That some piece of code is explicit about what it does and that the capability to do so must be declared ahead of time in the type system are two different things. For example, every subroutine must be explicit about what it does and what, if anything, it returns, yet languages differ on how much of that must be declared in the subroutine's type signature -- Haskell requires being explicit in the type signature about return types as well as effects, Rust is explicit about the return type and some kinds of effects, Java is explicit about the return types and a smaller subset of effects, and JavaScript is explicit about neither. A language can provide the same suspension (or "await") behavior as Rust's async/await without requiring subroutines that use that mechanism to declare that they do so in the type system. Scheme does precisely that (with shift/reset).

In fact, Rust already gives you two ways to suspend execution -- one requires a declaration, and the other does not, even though the two differ only in the choice of implementation. That you wish to use the language's suspension mechanism rather than the OS's is not a different "model." It's the same model with different implementations.

Whether you need to declare that a subroutine blocks or not has no bearing on the fairness of scheduling.

Type declaration is supposed to signify the choice of concurrency model. Either way algorithmically the choice has to be explicit, even it it's not a type declaration, it still has to be an explicit declaration somewhere or there is no choice and algorithms have to be expressed in terms of a single concurrency model.
> Type declaration is supposed to signify the choice of concurrency model.

That "supposed to" is an aesthetic/ideological/pragmatic/whatever preference, and one with significant tradeoffs. Again, Scheme's shift/reset gives you the same control over scheduling as Rust's async/await, but you don't need to declare that in the type signature. Rust puts it in the type signature because in the domains Rust targets it is important that you know precisely what kind of code the compiler generates.

> Either way algorithmically the choice has to be explicit, even it it's not a type declaration, it still has to be an explicit declaration somewhere or there is no choice and algorithms have to be expressed in terms of a single concurrency model.

First, we're not talking about two models, but one model with two implementations (imagine that the OS could give you control over scheduling, like here [1]; you'll have the exact same control over the "concurrency model" but without the type declaration -- even in Rust -- although you'll have less precise control over memory footprint). Second, that the language's design dictates that you must tell the compiler in the type signature what it is that your computation does is precisely what creates accidental complexity, as it impacts your code's consumers as well.

[1]: https://youtu.be/KXuZi9aeGTw

With "async/await" you see exact places where you give up control, it's part of the model, without it you don't and it's a different model. First one lets you program as if you can always access shared memory like you are the only one doing it, since you always know exactly at which point this assumption no longer holds, second one requires you to be more careful about any function call, basically limiting you to the same assumption but only for code in between function calls, reducing power to abstract things with functions. Things get even worse with different executors, that give you completely different concurrency models and break all the assumptions of a simple event loop executor.