by grug_htmx_dev 1021 days ago
Yes, async is effectively a much harder version of Rust, and it's regrettable how it's been shoved down the throats of everyone, while only 1% of projects using it really need it. However, async is also amazing in the 1% of cases where it's useful.

If you have a service that handles massive amounts of network calls at its core (think linkerd, nginx, etc.), or you want a massive number of lightweight tasks in your game, or you are working on embedded software where you want cooperative concurrency, async Rust is an amazing super-power.

Most system/application-level things are not going to need async IO. Your REST app is going to be perfectly fine with a threadpool. Even when you do need async, you probably want to use it in a relatively small part of your software (the network part), while doing most things in threads and using channels to pass work between the async and blocking IO parts (aka the hybrid model).
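To make the threadpool-plus-channels idea concrete, here is a minimal sketch using only the standard library. The helper name `run_pool` and the "square a number" job are made up for illustration; a real service would pass requests or closures instead, and would probably reach for an existing pool crate.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: run `jobs` on `workers` blocking threads and
/// collect the results. Stands in for "handle each request on a pool thread".
fn run_pool(workers: usize, jobs: Vec<u64>) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let (res_tx, res_rx) = mpsc::channel::<u64>();
    // std's Receiver is single-consumer, so the workers share it behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking a job off the queue.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(n) => res_tx.send(n * n).unwrap(), // the "blocking work"
                    Err(_) => break,                      // queue closed: no more jobs
                }
            })
        })
        .collect();
    drop(res_tx); // only the workers' clones keep the result channel open

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets the workers exit

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<u64> = res_rx.iter().collect();
    results.sort(); // completion order is nondeterministic
    results
}

fn main() {
    println!("{:?}", run_pool(4, vec![1, 2, 3, 4, 5]));
}
```

In the hybrid model, the async part of the program would sit on one end of channels like these and the blocking workers on the other.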

The Rust community just mindlessly overdid async, using it literally everywhere, to the point where blocking IO Rust (the one with the actually better UX) became a second-class citizen in the ecosystem.

This is especially visible with web frameworks, where there are N well-designed async web frameworks (Axum, Warp, etc.), and if you want a blocking one you get:

  tiny_http - absolutely bare-bones but very well done
  rouille - more wholesome, on top of tiny_http, but the APIs feel very meh compared to e.g. Axum
  astra - very interesting but immature, and rather bare-bones

The argument here is that Rust chose to implement coroutines the wrong way. It went the route of stackless coroutines that need async/await and colored functions. This creates all the friction the article laments over.

But it also praises Go for its implementation, which is also based on coroutines, just of a different kind: stackful coroutines, which do not have any of these problems.

Rust considered using those (and, at first, that was the project's direction). Ultimately, they went to the stackless operation model because stackful coroutines require a runtime that preempts coroutines (to do essentially what the kernel does with threads). This was deemed too expensive.

Most people forget, however, that almost no one is using runtime-free async Rust. Most people use Tokio, which is a runtime that does essentially everything the runtime they were trying to avoid building would have done.

So we are left in a situation where most people using async Rust have the worst of both worlds.

That being said, you can use async Rust without an async runtime (or rather, an extremely rudimentary one with extremely low overhead). People in the embedded world do. But they are few, and even they often are unconvinced by async Rust for their own reasons.
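For a sense of how small a "rudimentary runtime" can be, here is a sketch of a single-future executor built only from the standard library. `block_on` and `ThreadWaker` are names chosen for this example, and it assumes an OS thread to park; embedded executors typically sleep the core and wait for an interrupt instead.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the executor by unparking the thread that is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical single-future executor: poll, and park the thread while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until some waker calls `wake()`; spurious wakeups
            // are harmless because we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}

async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let n = block_on(async { add(1, 2).await * 2 });
    println!("{n}");
}
```

This has no timers, no IO reactor, and no task spawning, which is exactly the part that runtimes like Tokio add on top.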

Rust chose to drop the green thread library so that it could have no runtime, supporting valuable use cases for Rust like embedding a Rust library into a C binary, which we cared about. Go is not really usable for this (technically it's possible, but it's ridiculous for exactly this reason). So those sorts of users are getting a lot of benefit from Rust not having a green threading runtime. As are any users who are not using async for whatever reason.

However, async Rust is not using stackless coroutines for this reason - it's using stackless coroutines because they achieve a better performance profile than stackful coroutines. You can read all about it on Aaron Turon's blog from 2016, when the futures library was first released:

http://aturon.github.io/blog/2016/08/11/futures/

http://aturon.github.io/blog/2016/09/07/futures-design/

It is not the case that people using async Rust are getting the "worst of both worlds." They are getting better performance by default and far greater control over their runtime than they would be using a stackful coroutine feature like Go provides. The trade-off is that it's a lot more complicated and has a bunch of additional moving parts they have to learn about and understand. There's no free lunch.
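For readers unfamiliar with the term, a rough illustration of "stackless": the coroutine is an ordinary value whose fields hold everything that must survive a suspension, and `poll` resumes it. The hand-written `Countdown` future below is a made-up example that only approximates the shape of what the compiler generates for an `async fn`; `poll_to_completion` is a hypothetical driver, not a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written stackless coroutine: all state that must survive a
/// suspension point lives in this enum, not on a separate stack.
enum Countdown {
    Running { remaining: u32 },
    Done,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // fine: Countdown is Unpin
        match this {
            Countdown::Running { remaining: 0 } => {
                *this = Countdown::Done;
                Poll::Ready("liftoff")
            }
            Countdown::Running { remaining } => {
                *remaining -= 1;
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending // suspend: return to the caller, state stays in `this`
            }
            Countdown::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; the driver below polls in a loop anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future by hand and counts how many polls it takes to finish.
fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (u32, F::Output) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, out);
        }
    }
}

fn main() {
    let (polls, out) = poll_to_completion(Countdown::Running { remaining: 2 });
    println!("{out} after {polls} polls");
}
```

Each `Poll::Pending` is a plain function return, so the thread's stack is free for the next task. The suspension has to be visible in the function's type, and that is where the "color" comes from.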

People love(d) Rust because it's a pleasant language to write code in while also being insanely performant. Async is taking away the first point and making it miserable to write code in. If this trend continues, it'll ultimately destroy the credibility of the language and people will choose other languages. The proposers of async did not take this into account when they were proposing it.
I designed async/await and I absolutely did take this into account. I designed it to be as pleasant as possible under the constraints.
Can you admit that you failed in making it a pleasant experience to write async, especially for library authors? I don’t think it’s too late to admit failure and implement something like May https://github.com/Xudong-Huang/may
no, I don't admit that, and I think you're an enormous asshole
Naive question, since I tried my hand at Rust years ago but haven't looked at it since: isn't it possible to write another crate to build Go-like channels? A kind of "write, then lose the reference" function call that places a value on a queue, and an accompanying receiver. That could make life easier for "normal" software development.
There are many such primitives in Rust (including one in the standard library), and it's effectively the default. The only annoying thing is the libraries which use async. It is possible to just wrap the async code in sync code; it's a little annoying, but I think it's what most users of the language should do.
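As a small sketch of the standard library one, `std::sync::mpsc`: sending a value moves it into the channel, which is the "write, then lose the reference" behaviour asked about. The function name `collect_messages` and the message text are made up for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical example: a producer thread sends owned Strings to a consumer.
fn collect_messages(count: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    let producer = thread::spawn(move || {
        for i in 0..count {
            let msg = format!("message {}", i);
            tx.send(msg).unwrap();
            // `msg` has been moved into the channel; using it here
            // would be a compile error.
        }
        // `tx` is dropped when the closure ends, which closes the channel.
    });

    // Iteration ends once every sender has been dropped.
    let received: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    received
}

fn main() {
    for msg in collect_messages(3) {
        println!("{}", msg);
    }
}
```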
But "most" users can live with a bit of overhead in return for safe parallelism. It's just a handful that wants to squeeze the last bit of power out of a CPU.

The other day, Intel revealed a processor with support for 66 threads per core. 64 of those threads were called "slow", because there's no prefetching and speculative execution, as they are supposed to be waiting (mainly for memory, but networking could be another option). Perhaps very many cheap hardware threads are a way out of this.

Threads are driven by the OS. Something needs to drive coroutines, so there's no way around needing some executor (even a rudimentary one, like in embedded). But to be a versatile and universal systems language, Rust can't just build an executor into the language.

I think that stackless coroutines are better than stackful ones, in particular for Rust. Everything was done correctly by the Rust team.

Again, this is all fair and good, as long as people understand the tradeoff and make good technical decisions around it. If they all jump on the async bandwagon blind to the obvious limitations, we get where the Rust ecosystem is now.

Well, the people who jumped on the async bandwagon are deeply involved in the Rust community. So if they do something, others have to assume they are doing it right.
For better or worse, when faced with choices like this, Rust has consistently decided to make sure it's workable for the lowest-level use cases (embedded, drivers, etc.). I respect the consistency, and I appreciate that it's focused on an under-served market, especially compared to e.g. web applications (an over-served market, if anything), even if it's sometimes a bummer for me personally.
> Rust considered using those (and, at first, that was the project's direction). Ultimately, they went to the stackless operation model because stackful coroutines require a runtime that preempts coroutines (to do essentially what the kernel does with threads). This was deemed too expensive.

Stackful coroutines don't require a preemptive runtime. I certainly hope that we didn't end up with colored functions in Rust because of such a misconception.

They often implement soft preemption. Tokio and others like Glommio do. Usually, it's based on interrupts. The runtime schedules a timer to fire an interrupt, and some code is injected into the interrupt handler.

This is used to keep track of task runtime quotas so they can yield as soon as possible afterward.

This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

> I certainly hope that we didn't end up with colored functions in Rust because of such a misconception.

Misconceptions are everywhere unfortunately!

Tokio and Glommio using interrupts is, ironically, another misconception. They're cooperatively scheduled, so yes, a misbehaving blocking task can stall the scheduler. They can't really interrupt an arbitrary stackless coroutine like a Future, because there is nowhere to store the OS thread context in a way that can be resumed (each thread has its own stack, but then it's stackful, with all the concerns of sizing and growing; or you copy the stack to the task, but then you somehow have to fix up stack pointers in places the runtime is unaware of).

https://tokio.rs/blog/2020-04-preemption#a-note-on-blocking

> Tokio does not, and will not attempt to detect blocking tasks and automatically compensate

> This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You may be referring to this particular issue in Go https://github.com/golang/go/issues/10958 which I think was somewhat addressed a couple of releases back.

> You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

This is honestly shocking to hear. I would think that if people had bugs in their programs they would want them to fail loudly so they can be fixed.

As someone else said, it is not, strictly speaking, a bug. If your server receives a request that requires very computationally expensive work, is it okay to delay every other request on that core? That's probably not okay, and it'll show in your latency distribution.

Folks would rather have every future time-sliced so that other tasks get some CPU time in a ~fair way (after all, there is no concept of task priority in most runtimes).

But you're right: it isn't required, and you could sprinkle every loop of your code with yielding statements. But knowing when to yield is impossible for a future. If nothing else is running, it shouldn't yield. If many things are running but the problem space of the future is small, it probably shouldn't yield either, etc.

You simply do not have the necessary information in your future to make an informed decision. You need some global entity to keep track of everything and either yield for you or tell you when you should yield. Tokio does the former, Glommio does the latter.
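A toy sketch of the underlying starvation problem, using only the standard library. This is not how Tokio or Glommio are implemented: `run_all` is a made-up round-robin executor, `YieldNow` is a hand-rolled yield point, and each `worker` yields unconditionally when asked to, without any of the global bookkeeping described above.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns Pending exactly once, handing control back to the executor.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// Waker that does nothing; `run_all` re-polls every pending task anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Hypothetical round-robin executor: poll each task in turn, requeue if pending.
fn run_all(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Does `steps` units of "work", yielding after each one only if `cooperative`.
async fn worker(name: char, steps: u32, cooperative: bool, log: Rc<RefCell<String>>) {
    for _ in 0..steps {
        log.borrow_mut().push(name);
        if cooperative {
            YieldNow(false).await;
        }
    }
}

/// Runs two workers and returns the order in which their steps executed.
fn trace(cooperative: bool) -> String {
    let log = Rc::new(RefCell::new(String::new()));
    let a: Task = Box::pin(worker('a', 3, cooperative, log.clone()));
    let b: Task = Box::pin(worker('b', 3, cooperative, log.clone()));
    run_all(vec![a, b]);
    let out = log.borrow().clone();
    out
}

fn main() {
    println!("yielding:     {}", trace(true));
    println!("not yielding: {}", trace(false));
}
```

Without the yield points, the first task runs to completion before the second gets any time at all. Nothing outside the future can change that, since a suspension only happens at an `.await` that returns `Pending`.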

It gets even more complex when you add IO into the mix because you need to submit IO requests in a way that saturates the network/nvme drives/whatever. So if a future submits an IO request, it's probably advantageous to yield immediately afterward so that other futures may do so as well. That's how you maximize throughput. But as I said, that's a very hard problem to solve.

Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

I guess if someone wants to use futures as if they were goroutines then it's not a bug, but this sort of presupposes that an opinionated runtime is already shooting signals at itself. Fundamentally the language gives you a primitive for switching execution between one context and another, and the premise of the program is probably that execution will switch back pretty quickly from work related to any single task.

I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling. The described problem cannot even happen in the "runtime" I'm currently using because io_uring won't just completely stop responding to other kinds of sqe's and only give you responses to a multishot accept when a lot of connections are coming in. I strongly suspect equivalent results are achievable with epoll.

There's nothing buggy about a future that never yields because it can always make progress, but people prefer that a runtime doesn't let all other execution get starved by one operation. That makes it a problem that runtimes and schedulers work to solve, but not a bug that needs to be prevented at a language level. A runtime that doesn't solve it isn't buggy, but probably isn't friendly to use, like how Go used to have problems with tight loops and they put in changes to make them cause less starvation.
> because stackful coroutines require a runtime that preempts coroutines

I've used stackful coroutines many times in many codebases. It never required or used a runtime or preemption. I'm not sure why having a runtime that preempts them would even be useful, since it defeats the reason most people use stackful coroutines in the first place.

"stackful coroutines" the control-flow primitive is cumbersome to build on top of "green threads" but for use cases that are mostly about blocking on lots of distinct I/O calls at the same time people may be indifferent between these two things. These conversations are often muddled because the feature shipped most often is called "async" and not called "jump to another stack please" :(
> I've used stackful coroutines many times in many codebases. It never required or used a runtime or preemption.

Can you tell us which? Go, Haskell and the other usual suspects all have a runtime with automatic, transparent preemption.

It was always C++ for some type of high-performance data processing engine. Around half the stackful coroutine implementations were off-the-shelf libraries (e.g. Boost.Context) and the other half were purpose-built from scratch, depending on the feature requirements. The typical model is that you have stackful coroutines at a coarse level, e.g. per database query, which may dispatch hundreds of concurrent state machines. All execution and I/O scheduling is explicitly done by the software, which enables some significant runtime optimizations.

If coroutines can be preempted then it introduces a requirement for concurrency control that otherwise doesn't need to exist and interferes with dynamic cache locality optimizations. These are some of the primary benefits of using stackful coroutines in this context.

Being able to interrupt a stackful coroutine has utility for dealing with an extremely slow or stuck thread but you want this to be zero-overhead unless the thread is actually stuck. In most system designs, the time required to traverse any pair of sequential yield points is well-bounded so things getting "stuck" is usually a bug.

Letting end-users inject arbitrary code into these paths at runtime does require the ability to interrupt the thread but even that is often handled explicitly by more nuanced means than random preemption. Sometimes "extremely slow" is correct and expected behavior, so you have to schedule around it.

Lua comes with this sort of thing. OCaml, Python, and C have libraries providing this sort of thing in decreasing order of adoption.

Python also comes with 2 features that seem to be stackless coroutines with attached syntax ceremonies, but one of those 2 features is commonly used with a hefty runtime instead of being used for control flow. JavaScript comes with 2 features named similarly to those of Python, but only one of them seems to be "runtime-free" stackless coroutines.

The reason Rust chose stackless coroutines is that they allow zero-cost FFI, which is extremely important for a systems language.
> Yes, async is effectively a much harder version of Rust, and it's regrettable how it's been shoved down the throats of everyone, while only 1% of projects using it really need it.

Yes. I just noticed that Tokio was pulled into my program as a dependency. Again. It's not being used, but I'm using a crate which has a function I'm not using which imports reqwest, which imports h2, which imports tokio.

Exactly, because something somewhere needs to make one HTTP call, and it would be impossible if it wasn't done with a scalable async executor. /i
PR them to use ureq. ;)
I recently did this in a relatively small crate, and it halved the dependencies. Highly recommended if you don't need async.
Is there any reason to use async when your platform supports virtual threads?

I ask as someone who uses Java and is about to rewrite a bunch of code to be able to chuck the entire async paradigm into the trash can and use a blocking model, but on virtual threads, where blocking is OK.

Virtual threads or green threads, etc., are all names for the same thing: stackful coroutines. I would say yes! If your language/platform/runtime supports them, that should definitely be your starting point.
> that should definitely be your starting point.

Could you expand a bit? Why?

Not OP, but synchronous code is much, much easier to understand and write than asynchronous code. What Java is doing is giving synchronous code all the advantages of asynchronous code by making blocking a thread a cheap operation (it no longer blocks a real OS thread). That removes the whole benefit of async code while getting rid of async's difficulties, especially in a language that doesn't have async/await (which makes async code "look" synchronous, though in Rust, as this blog post shows, that is not really the case).
Hot off the presses from the JVM Language Summit a few weeks ago: The Challenges of Introducing Virtual Threads to the Java Platform [1]

[1] https://www.youtube.com/watch?v=WsCJYQDPrrE