Hacker News
by zaphar 1477 days ago
Anything using the green/lightweight or OS thread model is usually easier to use at the cost of some runtime performance. Whether the runtime performance matters for your use case can only be determined by measuring stuff.

The perception that async rust is where you should start for concurrent rust because it's built in and everyone uses it perhaps should be revisited. I would argue that the other options are worth consideration first and dropping down to low level async code might be warranted when you need the performance it gives and that justifies the increase in development costs.

6 comments

Rust used to have green threads before 1.0 (libgreen). Early Rust was meant to be more like Erlang[1]. The problem with them wasn't only the overhead, but also interoperability and how they affect every interaction of the language with the OS and other libraries. It made the whole language dependent on its own custom runtime.

Rust isn't meant to be a language for CRUD apps (despite making inroads in this space). It's meant to be a C/C++ alternative that can work in every difficult niche where these two can, including processes that already have their own runtimes, kernel space, microcontrollers, and other situations where any overhead or bringing custom threads with magic I/O and special stack handling is unacceptable.

Rust's async is designed to be separate from the core language, and work on top of arbitrary runtimes. Most people use tokio, but it can also work with your custom loop on microcontrollers, or on top of another runtime, e.g. WASM + browser's event loop, or gtk-rs that can work on top of GTK's event loop.

[1]: http://venge.net/graydon/talks/intro-talk-2.pdf

I'm aware of the history there. I think the decision not to ship a builtin async runtime was probably correct. I also think shipping async syntax sugar and allowing people to build their own custom runtimes is just fine.

I just think that the cultural decision in the wider ecosystem to make practically everything IO-related async is possibly a mistake.

Well, I think it happened because a large number of Rust committers and core devs doubled down on a multi-year Rust async effort. What would the larger ecosystem take away from this?

IMO the message was Async is the future so everyone better hop on this train.

I didn't get that message at all. The length of time it took to add async sugar made sense given what they were trying to do. It was not a statement regarding the suitability of it for every use case, nor should it have been.
The problem with async is that it makes easy things much more complex if you don't need the performance. Granted, it should be easy for library/API designers to provide sync versions of all async calls, but I don't know if this happens.
Too many major packages in the ecosystem only support an async model now. It's pretty frustrating if you are just writing a synchronous program, or one with a straightforward OS threading model.
If your program is mostly synchronous, you can manually create the async runtime and just use block_on to call async functions from a sync context: https://tokio.rs/tokio/topics/bridging#a-synchronous-interfa...
Even simpler to use `futures::executor::block_on`. No need to create a runtime, you can just call the function.

https://docs.rs/futures/latest/futures/executor/fn.block_on....

That will only let you run futures that have no IO dependency. Others typically expect a certain runtime to be running, because they e.g. use the epoll loop of that runtime to make progress.
No, this does not work well.

The highly infectious nature of async means you need to do that A LOT.

I.e., reverse ALL the things that await.

That is too much. I refactored my whole codebase (a huge refactor!) because of this.

What would you prefer the alternative to be? Library authors to do dual implementations of everything?
A language with a function-color-agnostic effect system, generic over asynchronicity?
This "async virality" syndrome is the main reason why async is harmful imho. _Some_ async can be very useful in certain constrained circumstances, I believe. However forcing the async execution model on all code is a terrible idea.
Yes. I've been saying this for some time. I call it "async contamination".

The async model assumes you spend most of your time waiting for your slow users to do something. (Why a web site, which is inherently stateless, should be doing that routinely is another issue.) I'm writing a metaverse client that has about 10-20 threads, many of them compute bound, running at different priorities. Works fine, but is totally different from the async model. Trying to keep async out of the networking has been difficult. I don't use "hyper" any more. I look at builds to see if "tokio" somehow got pulled in.

> Why a web site, which is inherently stateless, should be doing that routinely is another issue.

Because most web sites that would be doing this are not stateless? Any dynamic site will need to access a database, which means that there will be blocking IO, which means that given enough traffic the server will run out of available threads before being able to service the IO operations for all of these users. And because different parts of the website will likely have different DB load, you could easily cause a DoS by hitting an expensive endpoint repeatedly.

Sorry, offtopic, but what do you mean by "metaverse client"? I've seen you mention this in a couple comments now and I'm intrigued. I don't imagine you mean something to do with Facebook, right?
A metaverse client is the program you run on your machine to talk to a metaverse server. There are several clients for Second Life, a client for VRchat, a client for SineSpace, and so forth. There are web-based clients running in a web browser in WebAssembly, such as the one for Decentraland. All of these are 3D graphics programs.

They're halfway between MMO game clients and web browsers. They have to do most of the things a game client does, but they don't have built-in assets or game logic. Rather than a giant download at install (the biggest AAA titles have passed 100GB), all content is coming from the servers as needed, as with a web browser. The client's job is to present a good-looking 3D world while busily downloading content as the user moves round the world. Hopefully before the user gets close enough to see it in detail. So they have the performance problems of a 3D game with the content-handling problems of a web browser.

An existing open source metaverse client is Firestorm, a viewer for Second Life and Open Simulator.[1] Here's the source code.[2] It's mostly single-thread and OpenGL based. I've made some small contributions to that.

I am working on a replacement, in Rust, with more concurrency. About 20-30 threads, not thousands. Thread priority matters. Top priority is refresh, keeping the frame rate up. Next is servicing the network and user inputs. Then comes content decompression and preprocessing for adding to the scene. Much of this is compute-bound. Rust is a huge help in keeping the concurrency straight. This would be a much harder job in C++.

As the metaverse moves from hype to implementation, this will be a bigger area of activity. Right now, it's a niche.

[1] https://www.firestormviewer.org/

[2] https://vcs.firestormviewer.org/phoenix-firestorm
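For anyone wondering what "a few dozen OS threads, no async" looks like in practice, here is a minimal sketch using only the standard library. The names `decode` and `run_pipeline` are made up for illustration and are not from the client described above. Note that `std::thread` has no portable thread priority API, so the priority part would need OS-specific calls or a third-party crate and is left out here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for compute-bound work such as decompression.
fn decode(raw: Vec<u8>) -> usize {
    raw.iter().map(|&b| b as usize).sum()
}

/// Two OS threads connected by channels; the caller plays the consumer.
fn run_pipeline(inputs: Vec<Vec<u8>>) -> Vec<usize> {
    let (raw_tx, raw_rx) = mpsc::channel::<Vec<u8>>();
    let (done_tx, done_rx) = mpsc::channel::<usize>();

    // "Network" thread: hands raw content to the worker.
    let net = thread::spawn(move || {
        for item in inputs {
            raw_tx.send(item).expect("worker hung up");
        }
        // raw_tx is dropped when this closure returns, ending the worker loop.
    });

    // Worker thread: compute-bound preprocessing with plain blocking calls.
    let worker = thread::spawn(move || {
        for raw in raw_rx {
            done_tx.send(decode(raw)).expect("consumer hung up");
        }
    });

    // Consumer side: blocks until the worker drops its sender.
    let results: Vec<usize> = done_rx.iter().collect();
    net.join().unwrap();
    worker.join().unwrap();
    results
}

fn main() {
    let results = run_pipeline(vec![vec![1, 2, 3], vec![10]]);
    println!("{results:?}");
}
```

Every call here blocks, and the channels carry ownership of the data between threads, which is where the compiler keeps the concurrency straight.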

I've been calling it cancer, but I get down voted for that.
A great example of this would be in javascript testing frameworks. There must be dozens of frontend test frameworks that shoehorn inherently synchronous, procedural tasks into awkward syntax of sugared promise chains.
How would you propose mixing async and sync code from an implementation perspective?
I'll use an embedded analogy. I'm not as familiar with concurrency on GPOS, but consider this:

I have an I/O task that might take long, compared to CPU operations:

  - Start the task, but don't wait for its result.
  - Your program continues as normal
  - When the IO task is complete, its hardware sends an interrupt (at a specific priority) to the CPU. The CPU stops what it's doing (assuming there isn't a higher priority task in progress). Here, you can read the now-ready IO data, and do something with it. Or maybe queue another task.
You could also examine the case of DMA. Ie, your peripheral (Maybe your network chip in the case of a desktop PC?) commands an IO task. It runs in the background on your network hardware. You then read from, or write to the buffer that's associated with the DMA transfer as required. (Sometimes using DMA-related interrupts)

Could you apply this model to GPOS networking? Of note, some people are trying to do the opposite: Use Async on embedded, to wrap interrupts and DMA.

I have no idea what GPOS stands for, but the analogy isn't really necessary.

The high level algorithm you describe is basically how async programs work. Glossing over the low level details, you usually implement things in terms of polling. Interrupts and their analogs are far too slow at scale (switching async tasks is in the nanoseconds, these days).

The problem is when there is logic downstream of the task that needs its results and mixed with the results of some synchronous code in between. This is the "function coloring" problem.

Async semantics are designed to insert the logic for handling this (merging of async task results) seamlessly. There are two issues with this, the first is that synchronous code has no way of knowing what to do with asynchronous results (meaningfully), and the second that there has to exist some executor program that handles the merging and scheduling logic.

The thing that makes async "hard" in a language like Rust is that dealing with this problem is extremely difficult when you have no GC but do have lifetimes, call-by-move, closures that capture by move, and ownership semantics. It becomes verbose to write sound, non-trivial async code. For example, you're forced to introduce the notion of "pinned" data in memory to prevent it from being moved while tasks are switched. Lifetimes become a lot less clear. "Async destructors" don't really exist (what other languages would call finalizers that don't run at the end of lexical scope).

As for the mixing of sync/async code, that's not actually an issue if everything is async. It's trivial to write an executor that makes async calls blocking anyway.
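To make the "trivial executor" claim concrete, here is a minimal sketch of a blocking executor using only the standard library. It is an illustration, not how `tokio`, `futures`, or `pollster` actually implement theirs; `ThreadWaker`, `block_on`, and `add` are names chosen for this example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal executor: drives one future to completion on the
/// current thread, sleeping (not spinning) between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Park until the waker unparks us; a spurious wakeup just re-polls.
            Poll::Pending => thread::park(),
        }
    }
}

async fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    let result = block_on(async { add(2, 3).await * 2 });
    println!("{result}");
}
```

This only supplies the "poll until ready" half. It has no IO driver, so a future that relies on a particular runtime's event loop to wake it up will not make progress here.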

What you are talking about has been solved ever since there have been systems with interrupts and an OS with a scheduler. It's nothing new. Async Rust (or whatever async) is just an autogenerated state machine of the kind that has been implemented by hand for years.
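As a rough illustration of the "state machine written by hand" idea, here is a sketch of a future implemented manually as an enum. It is not the compiler's actual output; `DoubleAfterYield`, `NoopWaker`, and `run_by_hand` are hypothetical names for this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written state machine, roughly the shape you would expect for an
/// async fn that yields once and then returns a doubled value.
enum DoubleAfterYield {
    Start(u32),
    Yielded(u32),
    Done,
}

impl Future for DoubleAfterYield {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // The enum holds only plain integers, so it is Unpin and get_mut is allowed.
        let this = self.get_mut();
        match *this {
            DoubleAfterYield::Start(n) => {
                // First poll: record progress, ask to be polled again, yield.
                *this = DoubleAfterYield::Yielded(n);
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            DoubleAfterYield::Yielded(n) => {
                *this = DoubleAfterYield::Done;
                Poll::Ready(n * 2)
            }
            DoubleAfterYield::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for a driver that simply polls again.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the state machine in a plain loop and counts the polls. A real
/// executor would sleep until woken instead of looping like this.
fn run_by_hand(n: u32) -> (u32, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = DoubleAfterYield::Start(n);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = Pin::new(&mut fut).poll(&mut cx) {
            return (value, polls);
        }
    }
}

fn main() {
    let (value, polls) = run_by_hand(21);
    println!("value = {value}, polls = {polls}");
}
```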

> you usually implement things in terms of polling

Like a busy loop? If so, this approach is the worst, IMO. When done in an OS you keep your application thread always alive. In an embedded system it drains your battery.

The most common model is that in which an application should be sleeping all the time, and processing external stimulus when required.

> Interrupts and their analogs are far too slow at scale (switching async tasks is in the nanoseconds, these days).

Are you saying that whatever async system (like async Rust) is faster than a HW interrupt?

HW interrupts are what make your system as responsive as it is now. Stopping and jumping to an interrupt handler is hardwired in the silicon. What is faster than that?

I think GPOS in this context stands for General-Purpose OS (as opposed to embedded).
There's a neat crate for that I recently found: https://crates.io/crates/pollster
Thanks, that looks great.
I started writing Rust ~6mo ago and while I agree with your sentiment, the issue I've run into is that so many packages I need to use, because there isn't an alternative and I don't want to build it myself, already use async. I then have to either heavily wall off that part of my code or at a certain threshold realize I may have to adopt async myself, because keeping two concurrency models going is really a lot of overhead.

It's hard to wind down that existing momentum.

Async has really taken over anything networking-related because, well, it offers much better scaling and performance. If you're a package author you're going to get more people asking for async than people that don't want it. There is no sane way to make async optional in a library and reuse code.
> it offers much better scaling and performance

Myth. Performance won't be better. Scaling arguably is better, but usually the use-case doesn't require the level of scaling where async is superior to OS threads.

I suspect you might be arguing semantics, but in practice, for certain types of applications, async will in all likelihood offer better performance. Scale and performance are linked: when you scale up and start to hit limits, async can make it easier to get more out of your compute than you otherwise would, which is a performance consideration. Calling his statement a myth ignores the context it was made in.
The point of the parent was that better performance is not guaranteed, and it's totally true.

E.g. go ahead and implement an RPC server which only has to deal with 10 concurrent requests, then measure latencies. The synchronous version might be faster, due to not requiring any epoll calls. The difference might get even bigger if e.g. the server is serving static files, and you are measuring throughput. The synchronous version will likely provide higher performance, since no extra context switch from the async-runtime-of-your-choice to a threadpool-for-file-io thread and back is required.

You are also right in that once one moves beyond a certain scale the async version might offer better performance. But the scale that is required would be different per application, and not every application requires the scale.

You will absolutely get "more" performance out of async. I'm not sure I could call it much more. It's hard to get an exact number because there isn't exactly a whole lot of pairs of "async" vs "greenthreaded" options out there, but I'd guess you're looking at 20%-30% tops. For most people, and even most people writing async code, this is irrelevant. They are never going to write code that absolutely needs that last 20-30% and that alone is the difference between the problem being solved and not solved.

It certainly isn't like you use a green thread model and you unconditionally throw away a 5x performance factor or something.

There are absolutely cases where that does matter. To name just one, a game engine would not want to throw away that level of performance out of the box. (That's the game engine user's job, to "spend" the quality of the game engine on their task.) But I think there's a lot more programmers who have, without analysis, assumed they're in that class and made a lot of decisions based on that, when in fact they are plural orders of magnitude away from it. To pick a number out of thin air, 4 full CPU cores running Rust code that someone has at least glanced at and spent a bit of time optimizing is a loooooot of power.

(The closest current comparison is Rust vs. Go, but Rust works much harder at compile-time optimization and doesn't have GC, and I expect those two things account for the majority of the delta between them, with Go being greenthreaded being non-trivial, but in the clear minority. Stay tuned for Java with Project Loom versus Rust, which has its own rather major differences but will at least be another relevant data point.)

https://github.com/jimblandy/context-switch/ suggests that it's not substantially better
Interesting, but there are other issues. A big one is resource exhaustion attacks. A thread per connection means that someone can trivially exhaust system memory, while async pseudo-threads (tiny bits of state) take up virtually no space.
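To put a rough number on "tiny bits of state", here is a small sketch that measures the size of one suspended task. `handle_connection` and `future_size` are hypothetical names, and the exact byte count depends on the compiler version, so none is quoted here.

```rust
use std::future::ready;
use std::mem::size_of_val;

/// Hypothetical per-connection handler: the buffer is used after the await,
/// so it has to live inside the future's state.
async fn handle_connection(id: u64) -> u64 {
    let mut buf = [0u8; 64];
    buf[0] = id as u8;
    ready(()).await; // stand-in for waiting on a socket
    buf.iter().map(|&b| b as u64).sum()
}

/// Size in bytes of the state kept for one not-yet-finished connection.
fn future_size() -> usize {
    size_of_val(&handle_connection(1))
}

fn main() {
    println!("state per connection: {} bytes", future_size());
}
```

For comparison, the standard library currently documents a default stack size of 2 MiB for spawned threads on tier-1 platforms. That is reserved address space rather than memory that is all touched up front, so the practical gap depends on the OS and on how deep each thread's stack actually gets.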

Edit: also this only tests 500, not 500000.

Also when doing threaded I/O as soon as you want to support bidirectional traffic you will have to implement select/poll/etc. since you can't do a blocking read and a blocking write at the same time on one thread. At that point you're already giving up a lot of the advantages of threads.

> There is no sane way to make async optional in a library and reuse code.

FWIW, there's an effort to do exactly that, but because it will require language level changes and it is just on the drawing board phase, it will likely be a while before it can be widely used.

The "optionality" of `async` while sharing code also applies for `const` and mutability (why do we need `Deref` and `DerefMut`?). Finding a solution that can work for these three (and maybe others?) parts of the language will be a welcome improvement.
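For readers who haven't hit the mutability version of this, here is a small sketch of the duplication in question. `Wrapper`, `first`, `first_mut`, and `demo` are made-up names for illustration.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical wrapper type, used only to show the duplication.
struct Wrapper {
    inner: Vec<i32>,
}

impl Deref for Wrapper {
    type Target = Vec<i32>;

    fn deref(&self) -> &Vec<i32> {
        &self.inner
    }
}

// Same logic again; only the `mut`s differ.
impl DerefMut for Wrapper {
    fn deref_mut(&mut self) -> &mut Vec<i32> {
        &mut self.inner
    }
}

// The same split shows up in ordinary accessors: one function per "color".
fn first(items: &[i32]) -> Option<&i32> {
    items.iter().next()
}

fn first_mut(items: &mut [i32]) -> Option<&mut i32> {
    items.iter_mut().next()
}

fn demo() -> Vec<i32> {
    let mut w = Wrapper { inner: vec![1, 2, 3] };
    w.push(4); // method call resolved through DerefMut
    if let Some(x) = first_mut(&mut w) {
        *x = 10; // deref coercion: &mut Wrapper to &mut [i32]
    }
    assert_eq!(first(&w), Some(&10)); // shared access goes through Deref
    w.inner
}

fn main() {
    println!("{:?}", demo());
}
```

There is no way today to write `first` once and have it be generic over `&` versus `&mut`, which is the same shape of problem as writing a function once for both sync and async callers.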

Great to hear! That's really the solution.

Rust async code can be a bit challenging until you get it, but I can't think of a way to make it that much simpler without sacrificing the whole "systems programming language" concept or support for embedded. The only good alternative is Go-like fibers and that requires a fat runtime.

We use both Rust and Go at ZeroTier and find that they both have their own niches. (We are slowly moving ZeroTier from C++ to Rust to use a more modern and more importantly safe language.)

Where do you use Go?
Backend to my.zerotier.com and internal analytics code.
> FWIW, there's an effort to do exactly that

Could you link where?

Personally, once I grokked async rust, I found it much easier to use and reason about than threads. Things just seem to map better without any messy stuff to think about.
Yes, async is hard. It adds lots of complexity, both to the code and in your mental model. That slows development. I'd rather have faster development most times. It's why I prefer to use Go over Rust whenever possible. That's why I'm really interested in what lunatic is doing here. It might narrow the gap a little.
> Yes, async is hard. It adds lots of complexity, both to the code and in your mental model. That slows development.

Node.js devs seem to be doing fine, and I would say their development is faster than that of most devs working on other stacks. Node.js is also a top 3 server stack and growing.

NodeJS doesn't have all these gotchas around async that rust does. It's still harder than sync code, but it's more manageable.

The lack of a proper type system, only partially solved by TypeScript, is a big drawback though, and it eats into productivity.

Imo, 99% of the time, ergonomics should take precedence over power. Power can always be added later with clever hacks, without ruining an ergonomic interface. But adding ergonomics to power is a much more broken process.
> Power can always be added later with clever hacks, without ruining an ergonomic interface.

This puts limits on what can be accomplished. Starting with a more restricted set of code allowed, and then expanding it over time can be more successful in many cases, without locking you into a perhaps more ergonomic looking interface that needs to be coddled with no tooling support to avoid the "slow path". For example, in Rust: `impl Trait` used not to exist, which meant you had to use `Box<dyn Trait>` instead, which can be slower and certainly adds some verbosity. Then `impl Trait` was added and a bunch of code was now representable, and soon `type Alias = impl Trait;` will be stabilized which will allow even more code to be representable, in a way that is both performant and easier to use. A language that instead says "just use `-> Trait` and the compiler will figure out what to do" would have increased the user's perf without intervention, but anyone who really cares about FFI stability or wants to keep on top of heap allocations would be out in the cold.
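A minimal sketch of the two return styles side by side, for anyone who hasn't seen both. The function names are made up for this example.

```rust
/// Pre-`impl Trait` style: the concrete iterator type is hidden behind a
/// heap allocation and dynamic dispatch.
fn evens_boxed(limit: u32) -> Box<dyn Iterator<Item = u32>> {
    Box::new((0..limit).filter(|n| n % 2 == 0))
}

/// With `impl Trait`: callers still only see "some iterator of u32", but the
/// return value is one concrete type, with no Box and static dispatch.
fn evens_impl(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    let boxed: Vec<u32> = evens_boxed(7).collect();
    let unboxed: Vec<u32> = evens_impl(7).collect();
    println!("{boxed:?} {unboxed:?}");
}
```

Both produce the same values. The difference is that the allocation in the first version is visible in the signature, which is the kind of explicitness being defended here.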

It is the same reason that you can complain about the complexity of the String/&str distinction in Rust[1], but avoiding lingering references to big strings in JS (effectively a memory leak) becomes much harder.

[1]: https://fasterthanli.me/articles/working-with-strings-in-rus...

That's a reasonable choice of priorities to have, but it's the opposite of Rust's. Rust prioritizes (1) safety, (2) performance, (3) ergonomics, in that order. There are other languages that put ergonomics before performance, but they are generally unsuitable for Rust's niche.