|
> Can we identify why exactly 1:1 threading is uncompetitive with async/await schemes, and fix that? I think we can identify it, but fixing it is not easy, at least at the kernel level. The easy part is the cost of scheduling. The kernel must schedule threads with very different behaviors -- say, encoding a video or serving requests over a socket -- so it uses a scheduling algorithm that's a compromise for all uses. But we can let the language's runtime take the scheduling of kernel threads with something like this: https://youtu.be/KXuZi9aeGTw The harder part is managing the memory required for the stack. Even on 64-bit systems, the OS can't shrink and grow stacks finely enough. For example, I don't think there's a hard requirement that, say, memory below the sp is never accessed, so the OS can't even be sure about how much of the stack is used and can't uncommit pages. But even if there were such a requirement, or, that the language could tell the OS it follows such a requirement, still the OS can only manage memory at a page granularity, which is too much for lightweight concurrency. Any finer than that requires knowing about all pointers into the stack. We can do it in languages that track the location of all pointers in all frames, though, which is what we're attempting to do in Java, and this allows us to move stacks around, grow them and shrink them as necessary, even at a word granularity. > All this, by the way, is not to say that Rust made the wrong decision in focusing on async/await. Rust generally follows the philosophy of "if the computing environment is difficult, it's Rust that has to adapt". ... and it follows C++'s (horribly named) "zero-cost abstractions" philosophy which can reasonably be said to be a requirement of Rust's target domains. So I certainly don't think async/await is wrong for C++/Rust/Zig, but I think it's wrong for, say, JavaScript and C#, and we're going a different way in Java (more like Scheme's). Another possible contributing factor is that in one respect Rust is a higher-level language than Java or JS: it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend. That, BTW, is why Kotlin adopted something similar to async/await. |
I don't think that's been conclusively shown. 4kB is smaller than a lot of single stack frames. It would be interesting for someone to measure how large e.g. Go stacks are in practice—not during microbenchmarks.
> Another possible contributing factor is that perhaps ironically, in one respect Rust is a higher-level language than Java or JS, as it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend.
It's not really a question of "control"; we can and do land changes upstream in LLVM (though, admittedly, they sometimes get stuck in review black holes, like the noalias stuff did). The issue is more that LLVM is very large, monolithic, and hard to change. Upstream global ISel is years overdue, for example. That's one of the reasons we have Cranelift: it is much smaller and more flexible.
But in any case, the code generator isn't the main issue here. If GC metadata were a major priority, it could be done in Cranelift or with Azul's LLVM GC support. The bigger issue is that being able to relocate pointers into the stack may not even be possible. Certainly it seems incompatible with unsafe code, and even without unsafe code it may not be feasible due to pointer-to-integer casts and so forth. Never say never, but relocatable stacks in Rust seems very hard.
The upshot is that making threading in Rust competitive in performance to async I/O for heavy workloads would have involved a tremendous amount of work in the Linux kernel, LLVM, and language design—all for an uncertain payoff. It might have turned out that even after all that work, async/await was still faster. After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all! Ultimately, the choice was clear.