Hacker News new | ask | show | jobs
by likeliv 2425 days ago
Right. Is it really accidental complexity? This is the implementation detail that the users of the language doesn't really need to know as they would just write 'async fn' (and #[async_trait] while waiting for the issues mentioned in this blog post get solved).
1 comments

Oh, whether or not it's accidental depends on the domain, and Rust is designed for domains where such detailed control over resources could well be said to be essential, but it is nonetheless complexity, even if it is entirely inferred, as the compiler would block uses that don't comply with a very specific contract that is all about specific code generation. Even here the need to say that the call is "async" is not required from the algorithmic point of view, and in languages with a different philosophy and access to a JIT -- like, say, Java -- this is an implementation detail that could be worked out by the compiler entirely on a use-site basis (i.e. the same function could be "async" or not depending on how it's used in a particular callsite, not how it's defined).

But there's definitely a very big tradeoff -- in both approaches -- between fine-grained control over resources and "non-algorithmic complexity" if you'd like to call it that.

"async" is required from the algorithmic point of view, so the algorithm can be expressed in terms of specific concurrency model. And in Rust, it's not just "async" that's required, but also a specific executor, which can affect concurrency model as well. Concurrency model can't be an implementation detail.
Of course it is -- when you have continuations, which each and every imperative language has, although usually without explicit control over them. With explicit access to continuations, everything expressible with async is expressible without it. Or, more precisely, whether something is "async" or not is not a feature of a certain piece of code but of a certain execution. Essentially, it means whether some computation may want to suspend itself and resume execution later. That you must express that capability as a type-system enforced property of some syntactic element -- a subroutine -- is accidental complexity.

To get an intuitive feel for why that is so, try to imagine that the OS were able to give you threads with virtually no overhead, and you'll see how anything that's expressible with async would be expressible without it. Over the years we've so internalized the fact that threads are expensive that we forgot it's an implementation detail that could actually be fixed.

Computations in Rust can suspend themselves without declaring themselves to be async: that's exactly what they do when they perform blocking IO. They only need to declare themselves async when they want a particular treatment from the Rust compiler and not use the continuation service offered by the runtime, which in Rust's case is always the OS.

> Over the years we've so internalized the fact that threads are expensive that we forgot it's an implementation detail that could actually be fixed.

This is something I wish more people would realize. We assume that 1:1 threads inherently can't scale, mostly because of folklore. (As far as I can tell, the origin of this claim is back-of-the-envelope calculations involving stack sizes, leading to address space exhaustion on 32-bit—obviously something that hasn't been relevant for servers for a decade.) That's why we build async/await and elaborate userspace scheduling runtimes like the one Go has (and the one Rust used to have pre-release).

I think it's worth questioning the fundamental assumptions that are causing us to do this. Can we identify why exactly 1:1 threading is uncompetitive with async/await schemes, and fix that?

All this, by the way, is not to say that Rust made the wrong decision in focusing on async/await. Rust generally follows the philosophy of "if the computing environment is difficult, it's Rust that has to adapt".

> Can we identify why exactly 1:1 threading is uncompetitive with async/await schemes, and fix that?

I think we can identify it, but fixing it is not easy, at least at the kernel level.

The easy part is the cost of scheduling. The kernel must schedule threads with very different behaviors -- say, encoding a video or serving requests over a socket -- so it uses a scheduling algorithm that's a compromise for all uses. But we can let the language's runtime take the scheduling of kernel threads with something like this: https://youtu.be/KXuZi9aeGTw

The harder part is managing the memory required for the stack. Even on 64-bit systems, the OS can't shrink and grow stacks finely enough. For example, I don't think there's a hard requirement that, say, memory below the sp is never accessed, so the OS can't even be sure about how much of the stack is used and can't uncommit pages. But even if there were such a requirement, or, that the language could tell the OS it follows such a requirement, still the OS can only manage memory at a page granularity, which is too much for lightweight concurrency. Any finer than that requires knowing about all pointers into the stack.

We can do it in languages that track the location of all pointers in all frames, though, which is what we're attempting to do in Java, and this allows us to move stacks around, grow them and shrink them as necessary, even at a word granularity.

> All this, by the way, is not to say that Rust made the wrong decision in focusing on async/await. Rust generally follows the philosophy of "if the computing environment is difficult, it's Rust that has to adapt".

... and it follows C++'s (horribly named) "zero-cost abstractions" philosophy which can reasonably be said to be a requirement of Rust's target domains. So I certainly don't think async/await is wrong for C++/Rust/Zig, but I think it's wrong for, say, JavaScript and C#, and we're going a different way in Java (more like Scheme's).

Another possible contributing factor is that in one respect Rust is a higher-level language than Java or JS: it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend. That, BTW, is why Kotlin adopted something similar to async/await.

> at a page granularity, which is too much for lightweight concurrency

I don't think that's been conclusively shown. 4kB is smaller than a lot of single stack frames. It would be interesting for someone to measure how large e.g. Go stacks are in practice—not during microbenchmarks.

> Another possible contributing factor is that perhaps ironically, in one respect Rust is a higher-level language than Java or JS, as it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend.

It's not really a question of "control"; we can and do land changes upstream in LLVM (though, admittedly, they sometimes get stuck in review black holes, like the noalias stuff did). The issue is more that LLVM is very large, monolithic, and hard to change. Upstream global ISel is years overdue, for example. That's one of the reasons we have Cranelift: it is much smaller and more flexible.

But in any case, the code generator isn't the main issue here. If GC metadata were a major priority, it could be done in Cranelift or with Azul's LLVM GC support. The bigger issue is that being able to relocate pointers into the stack may not even be possible. Certainly it seems incompatible with unsafe code, and even without unsafe code it may not be feasible due to pointer-to-integer casts and so forth. Never say never, but relocatable stacks in Rust seems very hard.

The upshot is that making threading in Rust competitive in performance to async I/O for heavy workloads would have involved a tremendous amount of work in the Linux kernel, LLVM, and language design—all for an uncertain payoff. It might have turned out that even after all that work, async/await was still faster. After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all! Ultimately, the choice was clear.

> So I certainly don't think async/await is wrong for C++/Rust/Zig, but I think it's wrong for, say, JavaScript and C#, and we're going a different way in Java (more like Scheme's).

How can it be wrong for JavaScript? Because JavaScript is a single threaded environment, it was the only possible choice and it's one of the most influential change in the language (in itself it changed more the language use than the whole ES6 bundle). It's a really great success and I'm not sure why you'd like to revert it.

> something that hasn't been relevant for servers for a decade

Has Rust officially relinquished small systems to C?

I explicitly said that async I/O is the right decision for Rust. And C has no I/O concurrency features at all, so that's completely irrelevant.
If your concurrency model only allows bounded nondeterminism, you can't express an algorithm that assumes a concurrency model with unbounded nondeterminism in it. Furthermore, the choices concurrency models make hugely affect performance, so it doesn't even matter much if models were equivalent. So, no, algorithms do have to be explicit about concurrency models, whether you like it or not.
That some piece of code is explicit about what it does and that the capability to do so must be declared ahead of time in the type system are two different things. For example, every subroutine must be explicit about what it does and what, if anything, it returns, yet languages differ on how much of that must be declared in the subroutine's type signature -- Haskell requires being explicit in the type signature about return types as well as effects, Rust is explicit about the return type and some kinds of effects, Java is explicit about the return types and a smaller subset of effects, and JavaScript is explicit about neither. A language can provide the same suspension (or "await") behavior as Rust's async/await without requiring subroutines that use that mechanism to declare that they do so in the type system. Scheme does precisely that (with shift/reset).

In fact, Rust already gives you two ways to suspend execution -- one requires a declaration, and the other does not, even though the two differ only in the choice of implementation. That you wish to use the language's suspension mechanism rather than the OS's is not a different "model." It's the same model with different implementations.

Whether you need to declare that a subroutine blocks or not has no bearing on the fairness of scheduling.

Type declaration is supposed to signify the choice of concurrency model. Either way algorithmically the choice has to be explicit, even it it's not a type declaration, it still has to be an explicit declaration somewhere or there is no choice and algorithms have to be expressed in terms of a single concurrency model.