HN Mirror

by pcwalton 2425 days ago

> at a page granularity, which is too much for lightweight concurrency

I don't think that's been conclusively shown. 4kB is smaller than a lot of single stack frames. It would be interesting for someone to measure how large e.g. Go stacks are in practice—not during microbenchmarks.

> Another possible contributing factor is that perhaps ironically, in one respect Rust is a higher-level language than Java or JS, as it compiles to a VM -- LLVM/WASM -- over which it doesn't have full control, so it doesn't have complete control over its backend.

It's not really a question of "control"; we can and do land changes upstream in LLVM (though, admittedly, they sometimes get stuck in review black holes, like the noalias stuff did). The issue is more that LLVM is very large, monolithic, and hard to change. Upstream global ISel is years overdue, for example. That's one of the reasons we have Cranelift: it is much smaller and more flexible.

But in any case, the code generator isn't the main issue here. If GC metadata were a major priority, it could be done in Cranelift or with Azul's LLVM GC support. The bigger issue is that being able to relocate pointers into the stack may not even be possible. Certainly it seems incompatible with unsafe code, and even without unsafe code it may not be feasible due to pointer-to-integer casts and so forth. Never say never, but relocatable stacks in Rust seem very hard.

The upshot is that making threading in Rust competitive in performance with async I/O for heavy workloads would have involved a tremendous amount of work in the Linux kernel, LLVM, and language design, all for an uncertain payoff. It might have turned out that even after all that work, async/await was still faster. After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all! Ultimately, the choice was clear.

pron 2425 days ago

> I don't think that's been conclusively shown.

That would be interesting to study. We'll do it for Java pretty soon, I guess.

> After all, even if you make stack growth fast, it's hard to compete with a system that has no stack growth at all!

If the same Rust workload were to run on such a system there wouldn't be stack growth, either. The stack would stay at whatever size Rust now uses to store the async fn's state.
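To make "the size Rust now uses to store the async fn's state" concrete, here is a minimal sketch that prints the size of two futures. The function names (`step`, `small`, `large`) are made up for illustration, and the exact byte counts depend on the compiler version, so none are claimed here. The only point is that a local kept alive across an `.await` has to be stored in the future itself.

```rust
use std::mem::size_of_val;

// Hypothetical leaf operation; stands in for any awaited call.
async fn step(x: u32) -> u32 {
    x + 1
}

// Keeps only a u32 alive across the await.
async fn small() -> u32 {
    let a = step(1).await;
    a + 1
}

// Keeps a 1 KiB buffer alive across the await, so the buffer
// must be part of the future's stored state.
async fn large() -> u32 {
    let buf = [0u8; 1024];
    let a = step(buf[0] as u32).await;
    a + buf[1023] as u32
}

fn main() {
    // Calling an async fn only constructs the future; nothing runs yet.
    let s = small();
    let l = large();
    println!("small future: {} bytes", size_of_val(&s));
    println!("large future: {} bytes", size_of_val(&l));
}
```

The size is fixed at compile time, which is the "no stack growth" property discussed above.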

> a tremendous amount of work in the Linux kernel

I don't think any work in the kernel would have been required.

> Ultimately, the choice was clear.

I agree the choice is pretty clear for the "zero-cost abstractions" approach, and that Rust should follow it given its target domains. But for domains where the "zero-cost use" approach makes more sense, the increase in accidental complexity is probably not worth some gain in worst-case latency. But that's always the tradeoff the two approaches make.

pcwalton 2425 days ago

You suggested the switchto patch, right? That requires Linux kernel work.

pron 2425 days ago

Assuming you know how to relocate stacks, you could implement efficient delimited continuations in Rust without the kernel's involvement, and use whatever it is you use for async/await scheduling today (so I'm not talking about some built-in runtime scheduler like Go's or Erlang's). But this would require changes in LLVM, and also a somewhat increased footprint (although the footprint could be addressed separately, also by changes in LLVM). The kernel work is only required if you want to suspend computations with non-Rust frames, which you can't do today, either.

pcwalton 2424 days ago

That sounds like mioco/old M:N. That was slower than our current 1:1 threading.

pron 2424 days ago

I don't know what the old M:N model did, but a compiler can generate essentially the same code as it does for async/await without ever specifying "async" or "await." The reason for that is that the compiler already compiles any subroutine into a state machine (because that's what a subroutine is) in a form you need for suspension. But you do need to know what that form is, hence the required change in LLVM.

There is no inherent reason why it would be any slower or faster. It also does not imply any particular scheduling mechanism.
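As an illustration of the "subroutine as state machine" point, the sketch below writes the same small computation twice: once with async/await and once as a hand-written enum implementing `Future`. Everything here (`YieldOnce`, `add_async`, `AddManual`, `block_on`) is a hypothetical name invented for the example, and the hand-written version is only a rough approximation of the kind of lowering a compiler might do, not the code rustc or LLVM actually emits.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A stand-in suspension point: Pending on the first poll, Ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// The computation written with async/await.
async fn add_async(a: u32, b: u32) -> u32 {
    let partial = a + 1;
    YieldOnce(false).await;
    partial + b
}

// The same computation as an explicit state machine: one variant per
// resumption point, holding the locals that are live at that point.
enum AddManual {
    Start { a: u32, b: u32 },
    Suspended { partial: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for AddManual {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // All fields are Unpin, so a plain mutable reference is allowed.
        let this = self.get_mut();
        loop {
            match &mut *this {
                AddManual::Start { a, b } => {
                    let next = AddManual::Suspended {
                        partial: *a + 1,
                        b: *b,
                        inner: YieldOnce(false),
                    };
                    *this = next;
                }
                AddManual::Suspended { partial, b, inner } => {
                    let polled = Pin::new(inner).poll(cx);
                    match polled {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(()) => {
                            let out = *partial + *b;
                            *this = AddManual::Done;
                            return Poll::Ready(out);
                        }
                    }
                }
                AddManual::Done => panic!("polled after completion"),
            }
        }
    }
}

// Minimal busy-polling executor, just enough to drive the example.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    let x = block_on(add_async(1, 2));
    let y = block_on(AddManual::Start { a: 1, b: 2 });
    assert_eq!(x, y);
    println!("async/await: {x}, manual: {y}");
}
```

Note that `block_on` here is a trivial polling loop, which matches the remark that the transformation itself does not imply any particular scheduling mechanism.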