Hacker News new | ask | show | jobs
by pcwalton 2502 days ago
As a reminder, you don't need to use async/await to implement socket servers in Rust. You can use threads, and they scale quite well. M:N scheduling was found to be slower than 1:1 scheduling on Linux, which is why Rust doesn't use that solution.

Async/await is a great feature for those who require performance beyond what threads (backed by either 1:1 or M:N implementations) can provide. One of the major reasons behind the superior performance of async/await futures relative to threads/goroutines/etc. is that async/await compiles to a state machine in the manner described in this post, so a stack is not needed.

6 comments

I find async/await easier to reason about than threads for anything more involved than the 1 request per thread web server use case. This is because you avoid bringing in the abstraction of threads (or green threads) and their communication with one another. You trade syntactical complexity (what color is your function, etc), for semantic complexity (threads, channels, thread safety, lock races).
I would agree to the statement for languages where it's an either/or. E.g. in Javascript there are only callbacks and async/await, so we can avoid all the complexity of threads - which is great!

However in multithreaded languages it's always an AND. Once you add async/await people need to know how traditional threading as well as how async/await works. Rust will also make that very explicit. Even if you use async/await on a single thread the compiler will prevent accessing data from more than one task - even it those are running on the same thread. So you need synchronization again. With maybe the upside that this could be non-thread-safe (or !Sync in Rust terms) in order to synchronize between tasks that run on the same thread. But however also with the downside that utilizing the wrong synchronization primitives (e.g. thread-blocking ones) in async code can stall also other tasks.

Overall I think using async/await in Rust is strictly more complex. But it has its advantages in terms of performance, and being able to cancel arbitrary async operations.

They have the same semantic complexity. You have tasks in async/await and you still need to deal with inter-task communication, locking, etc.
The main difference is that in async/await you control where context switches occur and the syntax (.await) explicitly points them out. This means you can often avoid locks and do things in a more straightforward manner.
Note that this applies only to tasks that are !Sync. If they aren't, two tasks accessing the same state could be moved to different threads and access that state racily. In that case .await tells you nothing about accessing non-local state. However, for purposefully single-threaded task pools avoiding locks this way certainly seems possible.
Async/await are effectively threads, the switches are just scheduled statically.
Async/await is also good for integration with other systems where starting new threads is not practical or you are calling non-threadsafe FFI functions. Tokio offers the CurrentThread runtime, which allows concurrency without creating any new threads.
To implement, no. To make a performant server- yes. Most systems have limited number of threads (low limit constant compiled with the kernel), and each thread is triggered by the scheduler, not by network events, which is very uneconomical.

You with application level concurrency you get 100x performance boost for network servers.

On extreme workloads, perhaps. But we have people happily running Rust code with thousands of threads per second in production.

M:N was experimentally found to be slower than 1:1 in Rust.

epoll io loop performs better for most network io though and is simplier to manage when you have to start dealing with out of band issues (like efficient hearbeats - every time I've had a conversation without how to move some of the heartbeat code over to async it comes down to just accepting it isn't going to be as efficient as my c++ implementation and either strain heavily of accept over publication).

Last time I saw there was still a couple extra allocations going on too in the compiler (I was told they were being worked on) and basically the default executor, tokio, wasn't very efficient at changing events in the queue (requiring a an extra self signal to get the job done).

I'd be interesting to see how little cost these are, because there is defintely a cost to the generator implementation. Yes, if I wrong a generator to do this, I couldn't write it better, but I wouldn't write a generator (and that would be a very odd definition of zero-cost there anything can be called zero cost even GC as long as it is implemented well - well, that depends on if rustc saves unnecessary state).

> Additionally, it should allow miri to validate the unsafe code inside our generators, checking for uses of unsafe that result in undefined behavior.

This is really useful as a lot of the buffer handing code needs to use unsafe for efficienty issues. And the enums sharing values is nice too - hopefully the extra pointer derferences can be optimized out.

I do worry though about all this state sitting on the head and ruining cache locality on function calls though.

If you want to use epoll directly in Rust, go right ahead. Nothing is stopping you. You don't have to switch to C++ to use epoll.

It's pretty clear that most people don't want to write and maintain that kind of code, though—that's what async/await is for. Personally, I hate writing state machines.

They're not very well done. Rust kind of skipped the whole non-blocking thing and went straight to tokio, then had to backtrack a little to mio+tokio, then when mio is only half way jupmed to async/await. (mio was originall tokio rolled into it - it was the whole thing and you had to use mios sockets and mios timers on the mio event loop - tokio was then split out - metal io, it was not - this was because there wasnt a non-blocking interface in rust yet).

Async is deinitely a distraction. Rust should have stabilized other pieces more before jumping to yet another paradigm. It is gaining this collection of half-implemented things. I'm guessing generators are going to the the next move before async/await is even in the print book.

Rust has moved from zero-cost abstraction to "an fairly efficient implemention of everything you can think of".

Rust didn't implement async/await for fun. The team implemented it because it was perhaps the #1 requested feature from users, including me.

You can see it here on HN. What's the #1 comment that always appears on Rust async threads? Invariably it's "why not just use goroutines? That kind of code is so much easier to write!" Now for various reasons M:N threading like in Go doesn't work in Rust. What does work, and offers most of the ergonomics of threads/goroutines, is async/await. That's why the team has prioritized it.

There are people who care less about ergonomics than optimizing every CPU cycle for socket servers. For those people, the focus on getting the ergonomics right before microoptimization of mio and tokio might be frustrating to see. But the overwhelming majority of users just want easy-to-use async I/O. Async/await is an enormous improvement in readability and maintainability over hand-written state machines. Rust would be doing a disservice to its users if it focused on microoptimization of tokio instead.

> Rust kind of skipped the whole non-blocking thing and went straight to tokio, then had to backtrack a little to mio+tokio, then when mio is only half way jupmed to async/await.

Not sure exactly what you mean here; mio predates tokio by a couple years, and has been possible to use standalone since before tokio ever became a thing.

Early versions of mio had a bunch of non-essential things like a timer wheel built-in, that were later removed.
What does tokio have to do with that?
This strikes me as needlessly hostile without contributing anything.
Honestly I don’t get it why simply using mio (epoll, kqueue, iocp wrapper) is so unpopular.
See these examples of how async/await simplifies looping and error handling: https://docs.rs/dtolnay/0.0.3/dtolnay/macro._01__await_a_min.... Await syntax makes it much easier to compose asynchronous code from different libraries, in the same way that ordinary function calls compose synchronous code. Talking to epoll/mio directly is certainly possible, but it's hard to make two different libraries work well together if they both need to talk to a single epoll handle.
Yeah but you’re talking about things not even stable. Old tokio with closures is terrible imo.
It’s extremely close to being stable; it just missed a release train, because there’s some last minute polish that needs to be done. It will be on its way soon!
everybody always points to the 5 line basics, never to how complex async/await gets in a real system with timers, tuneouts, multiple errors types that need to be handled in different ways, heartbeats, side channel info, etc... It isn't better at that point.

Cool, you can write a 5 line echo server easier, but good luck with a high performance server handling important transactions.

(If there is documentation on how to do some of these or good example code, please point me to it, because my last attempt at getting send/rcv heartbeats efficiently didn't work.)

The networking implementations I've had to do were all made a lot easier moving to async/await from hand-written state machines. It's a reduction in LOC terms of 6x or so, and the logic is much, much easier to follow.

State machines are viable if your transition function is such that effectively every combination of (state, input) leads to a different state. If the transition function is mostly describing "go to the (single) next state or error," then you're essentially asking the user to do a lot of bookkeeping that compilers are good at and users are bad at.

Having ported a full peer-to-peer engine in JavaScript from old-school callbacks to async/await, I strongly disagree with your comment.

Async/await really shines when the complexity arises.

People are doing a lot more with async/await than five-line echo servers. They're converting entire codebases and seeing dramatic reductions in noise.
Probably because you're just looking at examples; that's selection bias. The combinators/manual implementing Future approach to async IO gets even worse at scale, async/await is infinitely easier. If you don't like it, nobody is forcing you to use it.
Because most people don't like writing state machines by hand.
Pre-await Tokio code looks way worse to me.
A lot of that, I suspect, is that Rust's borrow checker makes specifying ownership correctly for callback chains extremely painful and unergonomic.

In my experience, closure-based async programming is only sufficiently painless in managed languages to be worth using. When you're dealing with manual memory management (or, rather, smart pointer memory management), you spend a lot more time trying to make the captures hold onto stuff.

I'd agree with this, and emphasize the point that this stuff is really tricky to get right without GC. Fighting the borrow checker is somewhat expected when you're dealing with this level of inherent complexity in your memory management.

One of the key reasons for shipping async/await is that it erases almost all of this difficulty and lets you write straight-line code again.

> tokio, wasn't very efficient at changing events in the queue (requiring a an extra self signal to get the job done).

Could you expand on that?

Curious what is the fundamental difference that makes Go do M:N thread efficiently? Considering that the compiler has far less information than Rust about the program.
Go is more efficient at M:N then Rust can be mostly for two reasons:

1. Go can start stacks small and relocate them, because the runtime needed to implement garbage collection allows relocation of pointers into the stack by rewriting them. Rust has no such runtime infrastructure: it cannot tell what stack values correspond to pointers and which correspond to other values. Additionally, Rust allows for pointers from the heap into the stack in certain circumstances, which Go does not (I don't think, anyway). So what Rust must do is to reserve a relatively large amount of address space for each thread's stack, because those stacks cannot move in memory. (Note that in the case of Rust the kernel does not have to, and typically does not, actually allocate that much physical memory for each thread until actually needed; the reservation is virtual address space only.) In contrast, Go can start thread stacks small, which makes them faster to allocate, and copy them around as they grow bigger. Note that async/await in Rust has the potential to be more efficient than even Go's stack growth, as the runtime can allocate all the needed space up front in some cases and avoid the copies; this is the consequence of async/await compiling to a static state machine instead of the dynamic control flow that threads have.

2. Rust cares more about fast interoperability with C code that may not be compiled with knowledge of async I/O and stack growth. Go chooses fast M:N threading over this, sacrificing fast FFI in the process as every cgo call needs to switch to a big stack. This is just a tradeoff. Given that 1:1 threading is quite fast and scalable on Linux, it's the right tradeoff for Rust's domain, as losing M:N threading isn't losing that much anyway.

Go needs to allocate a growing stack on the heap, needs to move it around, etc. It's not as efficient as Rust's async.
But is it as efficient or more than the linux threads?
I don't have the full answer (and I would love if someone more knowledgeable could jump in this thread) but I'd say it depends since there are a few antagonistic effects :

- goroutines are (unless it changed since last time I used it) cooperatively scheduled. It's cheaper than preemptive scheduling, but it can lead to big inefficiencies on some workload of you're not careful enough (tight loops can hold a (Linux) thread for a long time and prevent any other goroutines from running on this thread).

- goroutines start with a really small (a few kB) stack which needs to be copied to be grown. If you end up with a stack as big as a native stack, you'd have done a lot of copies in the process, that wouldn't have been necessary if the stack was allocated upfront.

I don't think the implication was that Go does it efficiently.
> One of the major reasons behind the superior performance of async/await futures relative to threads/goroutines/etc. is that async/await compiles to a state machine in the manner described in this post, so a stack is not needed.

That's a great optimization but doesn't that mean it also breaks stack traces?