Rust without the async (hard) part | HN Mirror

	Rust without the async (hard) part (lunatic.solutions)
	149 points by _5blv 1470 days ago

16 comments

Matthias247 1469 days ago

> The problem is that threads just don’t work in practice for massive concurrency.

That's an assumption that is repeated very often recently, and measured very rarely. Truth is that the number of applications for which they don't work is surprisingly low. I'm working at a well-known cloud provider, and lots of people would really be surprised which applications at the largest scale are working fine with a thread-per-request model. 50k OS threads are not really an issue on modern server hardware. While it might not be the most efficient [1], it will not perform so badly that it causes an availability impact either.

There's obviously some exceptions to that [2] - but I encourage people to measure instead of making assumptions. Unless one finds themselves in a weekly meeting about server efficiency or scaling cliffs both models probably work.

[1] it really depends on the workload, but people might find an efficiency degradation (e.g. measured as BYTES_TRANSFERRED/CPU_CORES_USED) of 20% at a concurrency level of 1000, or maybe only at a concurrency level of 10k. Coarse-grained work items (e.g. send a large file to a socket) will show a lower degradation.

[2] Load balancers, CDN services, and e.g. chat applications which maintain a massive amount of mostly idle client connections can be such environments. They have a high amount of concurrency that needs to be managed, but less so of "active concurrency". If all clients would be active at the same time, those environments would run out of disk IO or network bandwidth far before CPU or memory become an issue.
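For readers unfamiliar with the thread-per-request model discussed in this comment, here is a minimal sketch using only the Rust standard library. The echo handler and the `handle`/`serve` names are hypothetical and chosen for illustration; a real server would add timeouts, error logging, and a cap on the number of threads.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical request handler: reads one chunk and echoes it back.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?; // blocks only this thread
    stream.write_all(&buf[..n])
}

/// Accept `max_conns` connections, giving each one its own OS thread.
fn serve(listener: TcpListener, max_conns: usize) -> std::io::Result<()> {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream?;
        workers.push(thread::spawn(move || handle(stream)));
    }
    // Wait for every handler and surface the first I/O error, if any.
    for worker in workers {
        worker.join().expect("worker panicked")?;
    }
    Ok(())
}
```

Each handler is ordinary blocking code, so a backtrace taken from a stuck thread shows the whole request path in one stack.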

gopalv 1469 days ago

> While it might not be the most efficient, it will not perform so badly that it causes an availability impact either.

Performance is important, but the biggest performance gain happens when a program goes from not working to working correctly.

Debugging is another corner case: async makes it intolerably hard to get a backtrace and make sense of what is going on.

It's not like debugging threads is easy, but in a low contention environment which is entirely "1 thread holds state of one request" and there are few interlocking threads in it, threading is a fair bit better than async execution. Plus the logs which indicate thread-names make it possible to draw out something like a post-processed Catapult timing diagram (open chrome://tracing and look at an example, it is a great UI for dropping in your own multi-threaded event log as JSON).

I'm a big fan of executor thread-groups and work queues, but damn does it make it hard to mentally walk through a bug when the stack traces are scattered across multiple places.

bsder 1469 days ago

> > The problem is that threads just don’t work in practice for massive concurrency.

> That's an assumption that is repeated very often recently, and measured very rarely.

I would go further--there is a whole infrastructure that needs to appear when massive concurrency is involved and very few times is that taken into account.

For those people interested in genuine massive concurrency, I encourage people to investigate Erlang. In my opinion, the language itself is just "meh", but OTP, the infrastructure around managing, upgrading, restarting, etc. processes/threads, is extremely on point.

Side note: Erlang still has the absolute best handling of binary parsing of any language ever. https://www.erlang.org/doc/programming_examples/bit_syntax.h...

I really wish the Rust people would pick something like the Erlang Bit Syntax up and integrate it with their pattern matching (probably necessitating some pattern matching language fixes) rather than the amount of effort they continue to piddle on async/await.
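For comparison, here is a small sketch of what Rust's existing slice patterns can do for binary parsing today. It matches on whole bytes only, not on arbitrary bit widths as Erlang's bit syntax does. The wire format (one version byte, a two-byte big-endian length, then the payload) and the `Frame`/`parse_frame` names are made up for illustration.

```rust
/// Hypothetical wire format: 1 version byte, 2-byte big-endian length, payload.
#[derive(Debug, PartialEq)]
struct Frame<'a> {
    version: u8,
    payload: &'a [u8],
}

fn parse_frame(input: &[u8]) -> Option<Frame<'_>> {
    match input {
        // Only version 1 is accepted; `rest @ ..` binds the remaining bytes.
        [1, hi, lo, rest @ ..] => {
            let len = u16::from_be_bytes([*hi, *lo]) as usize;
            let payload = rest.get(..len)?; // None if the frame is truncated
            Some(Frame { version: 1, payload })
        }
        // Too short, or an unknown version.
        _ => None,
    }
}
```

The length field still has to be assembled by hand with `from_be_bytes`, which is the kind of step the Erlang syntax expresses directly in the pattern.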

rad_gruchalski 1469 days ago

Erlang pattern matching is awesome. Matching on binaries makes it very easy to parse protocols.

Re concurrency. I learned Erlang before Akka. It took me a bit but I find Akka more ergonomic. Akka will easily handle millions of actors on a single machine, too. But I always miss matching on binaries.

Another good one is protoactor for golang. That will also do a million actors no problem. Comes really close to Erlang in terms of how concise the syntax is. But again, no binary matching.

woah 1469 days ago

Why would you assume that all software is written for servers in datacenters? Rust tends to be used in embedded devices, WASM, and other weird contexts where there might not be as many resources available.

If you're writing a CRUD app, sure, do it in PHP and spin up a thread per request.

rat9988 1469 days ago

Because he is talking about massive concurrency, not embedded or WASM or other contexts where there might not be as many resources available.

rad_gruchalski 1469 days ago

Since when is 50k threads massive?

int_19h 1469 days ago

Embedded is much less likely to need async in the first place at all.

woah 1469 days ago

Having written wifi router firmware in rust, I would disagree

hgomersall 1469 days ago

I use async extensively in my embedded application. The design is a joy to extend and maintain.

baq 1469 days ago

cooperative multitasking sounds way more embedded than preemptive...

Matthias247 1469 days ago

Not necessarily. A lot of embedded projects use realtime operating systems (RTOS). And those make use of preemptive schedulers in order to actually provide realtime guarantees.

There's obviously also some projects which just use a bare-metal loop to do everything - that probably counts as cooperative.

eklitzke 1469 days ago

I agree and this article seems pretty misinformed. Creating and managing threads on Linux is extremely cheap, especially when a lot of them are idle, and a lot of big companies (Google, Facebook, Amazon) have tons of huge C++ applications that have thousands of threads and it's fine. I also think a lot of people who don't work on these problems at these kinds of companies assume that it must be incredibly difficult to write code like this and debug it, but that's not really true. For one thing, generally the tricky parts to write are abstracted away so that regular engineers don't have to think much about threading concurrency issues. And when they come up, tsan and lock annotations[1] will catch 99.9% of these problems in testing and make it easy to understand why things are breaking.

In the real world here are the kinds of problems that people at Google etc. care about when it comes to performance or scalability issues with hugely concurrent programs:

  - Noisy neighbor problems from other threads messing with your TLB and L1 cache
  - High cost of context switches
  - Unpredictable scheduling/priority inversion in the scheduler

The first problem isn't actually made any better by using async coroutines or green threads/fibers, if you switch to another coroutine or fiber and it does something naughty (e.g. munmaps memory, which will cause a TLB shootdown) it's going to degrade performance for your unrelated coroutine/fiber.

The second and third problems can be solved in some cases by things like fibers and userspace scheduling, but this is a fairly advanced topic and "just use async" is definitely not the solution. If you're interested in learning more about how these problems are actually solved at Google for example I recommend [2] and [3].

[1] https://abseil.io/docs/cpp/guides/synchronization#thread-ann... [2] https://www.youtube.com/watch?v=KXuZi9aeGTw [3] https://storage.googleapis.com/pub-tools-public-publication-...

ibraheemdev 1469 days ago

> - Noisy neighbor problems from other threads messing with your TLB and L1 cache

Switching between threads within the same process doesn't require a TLB or L1 cache flush. Not sure if you were implying this, just wanted to point that out.

> - High cost of context switches

Userspace schedulers (like rust's tokio) do make context switching cheaper, however, most of the context switching in the case of a web server is due to blocking I/O and the most expensive part of the switch, entering the kernel, is already accounted for by the I/O request. Kernel context switching is unlikely to be your bottleneck.

> Unpredictable scheduling/priority inversion in the scheduler

This can definitely be an issue at scale, but a general purpose async scheduler like most use is unlikely to be any better.

rstuart4133 1469 days ago

As another data point, I have one Firefox window right now:

    $ ps -eLf | grep firefox | wc -l
    569
    $

geodel 1469 days ago

Once one goes with a cultish following of the async-everything idea, measuring things would be heresy.

zaphar 1470 days ago

Anything using the green/lightweight or OS thread model is usually easier to use at the cost of some runtime performance. Whether the runtime performance matters for your use case can only be determined by measuring stuff.

The perception that async rust is where you should start for concurrent rust because it's built in and everyone uses it perhaps should be revisited. I would argue that the other options are worth consideration first and dropping down to low level async code might be warranted when you need the performance it gives and that justifies the increase in development costs.

pornel 1470 days ago

Rust used to have green threads before 1.0 (libgreen). Early Rust was meant to be more like Erlang[1]. The problem with them wasn't only the overhead, but also interoperability and how they affect every interaction of the language with the OS and other libraries. It made the whole language dependent on its own custom runtime.

Rust isn't meant to be a language for CRUD apps (despite making inroads in this space). It's meant to be a C/C++ alternative that can work every difficult niche where these two can, including processes that already have their own runtimes, kernel space, microcontrollers, and other situations where any overhead or bringing custom threads with magic I/O and special stack handling is unacceptable.

Rust's async is designed to be separate from the core language, and work on top of arbitrary runtimes. Most people use tokio, but it can also work with your custom loop on microcontrollers, or on top of another runtime, e.g. WASM + browser's event loop, or gtk-rs that can work on top of GTK's event loop.

[1]: http://venge.net/graydon/talks/intro-talk-2.pdf

zaphar 1470 days ago

I'm aware of the history there. I think the decision not to ship a builtin async runtime was probably correct. I also think shipping async syntax sugar and allowing people to build their own custom runtimes is just fine.

I just think that the cultural decision in the wider ecosystem to make, practically speaking, everything io related, async is possibly a mistake.

geodel 1470 days ago

Well, I think it happened because a large number of Rust committers and core devs doubled down on the multi-year Rust async effort. What would the larger ecosystem take away from this?

IMO the message was Async is the future so everyone better hop on this train.

zaphar 1469 days ago

I didn't get that message at all. The length of time it took to add async sugar made sense given what they were trying to do. It was not a statement regarding the suitability of it for every use case, nor should it have been.

baq 1469 days ago

the problem with async is it makes easy things much more complex if you don't need the performance. granted it should be easy for library/API designers to provide sync versions of all async calls, but I don't know if this happens.

loeg 1470 days ago

Too many major packages in the ecosystem only support an async model now. It's pretty frustrating if you are just writing a synchronous program, or one with a straightforward OS threading model.

kirbyfan64sos 1470 days ago

If your program is mostly synchronous, you can manually create the async runtime and just use block_on to call async functions from a sync context: https://tokio.rs/tokio/topics/bridging#a-synchronous-interfa...

abiro 1470 days ago

Even simpler to use `futures::executor::block_on`. No need to create a runtime, you can just call the function.

https://docs.rs/futures/latest/futures/executor/fn.block_on....

Matthias247 1470 days ago

That will only allow running futures which have no IO dependency. Others typically expect a certain runtime to be running, because they e.g. use the epoll loop of that runtime to make progress.

mamcx 1470 days ago

No, this does not work well.

The highly infectious nature of async means you need to do that A LOT.

I.e., reverse ALL the things that await.

That is too much. I refactored my whole codebase (a huge refactor!) because of this.

jen20 1470 days ago

What would you prefer the alternative to be? Library authors to do dual implementations of everything?

slightknack 1469 days ago

A language with a function-color-agnostic effect system, generic over asynchronicity?

dboreham 1470 days ago

This "async virality" syndrome is the main reason why async is harmful imho. _Some_ async can be very useful in certain constrained circumstances, I believe. However forcing the async execution model on all code is a terrible idea.

Animats 1470 days ago

Yes. I've been saying this for some time. I call it "async contamination".

The async model assumes you spend most of your time waiting for your slow users to do something. (Why a web site, which is inherently stateless, should be doing that routinely is another issue.) I'm writing a metaverse client that has about 10-20 threads, many of them compute bound, running at different priorities. Works fine, but is totally different from the async model. Trying to keep async out of the networking has been difficult. I don't use "hyper" any more. I look at builds to see if "tokio" somehow got pulled in.

estebank 1470 days ago

> Why a web site, which is inherently stateless, should be doing that routinely is another issue.

Because most web sites that would be doing this are not stateless? Any dynamic site will need to access a database, which means that there will be blocking IO, which means that given enough traffic the server will run out of available threads before being able to service the IO operations for all of these users. And because different parts of the website will likely have different DB load, you could easily cause a DoS by hitting an expensive endpoint repeatedly.

losvedir 1470 days ago

Sorry, offtopic, but what do you mean by "metaverse client"? I've seen you mention this in a couple comments now and I'm intrigued. I don't imagine you mean something to do with Facebook, right?

Animats 1469 days ago

A metaverse client is the program you run on your machine to talk to a metaverse server. There are several clients for Second Life, a client for VRchat, a client for SineSpace, and so forth. There are web-based clients running in a web browser in WebAssembly, such as the one for Decentraland. All of these are 3D graphics programs.

They're halfway between MMO game clients and web browsers. They have to do most of the things a game client does, but they don't have built-in assets or game logic. Rather than a giant download at install (the biggest AAA titles have passed 100GB), all content is coming from the servers as needed, as with a web browser. The client's job is to present a good-looking 3D world while busily downloading content as the user moves round the world. Hopefully before the user gets close enough to see it in detail. So they have the performance problems of a 3D game with the content-handling problems of a web browser.

An existing open source metaverse client is Firestorm, a viewer for Second Life and Open Simulator.[1] Here's the source code.[2] It's mostly single-thread and OpenGL based. I've made some small contributions to that.

I am working on a replacement, in Rust, with more concurrency. About 20-30 threads, not thousands. Thread priority matters. Top priority is refresh, keeping the frame rate up. Next is servicing the network and user inputs. Then comes content decompression and preprocessing for adding to the scene. Much of this is compute-bound. Rust is a huge help in keeping the concurrency straight. This would be a much harder job in C++.

As the metaverse moves from hype to implementation, this will be a bigger area of activity. Right now, it's a niche.

[1] https://www.firestormviewer.org/

[2] https://vcs.firestormviewer.org/phoenix-firestorm

accountofme 1467 days ago

I've been calling it cancer, but I get down voted for that.

scoopdewoop 1470 days ago

A great example of this would be in javascript testing frameworks. There must be dozens of frontend test frameworks that shoehorn inherently synchronous, procedural tasks into awkward syntax of sugared promise chains.

duped 1470 days ago

How would you propose mixing async and sync code from an implementation perspective?

the__alchemist 1470 days ago

I'll use an embedded analogy. I'm not as familiar with concurrency on GPOS, but consider this:

I have an I/O task that might take long, compared to CPU operations:

  - Start the task, but don't wait for its result.
  - Your program continues as normal
  - When the IO task is complete, its hardware sends an interrupt (at a specific priority) to the CPU. The CPU stops what it's doing (assuming there isn't a higher priority task in progress). Here, you can read the now-ready IO data, and do something with it. Or maybe queue another task.
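On a general-purpose OS, a rough analogue of these three steps can be sketched with a background thread and a channel. This is only an analogy: the main loop polls with `try_recv` rather than being interrupted, the sleep stands in for slow I/O, and the `start_io`/`run_main_loop` names are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Start a (simulated) slow I/O task and return immediately.
fn start_io() -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(20)); // stand-in for slow I/O
        let _ = tx.send(b"data".to_vec()); // the "interrupt": result is ready
    });
    rx
}

/// Keep doing other work until the I/O result arrives. Returns the data and
/// how many units of other work were completed in the meantime.
fn run_main_loop(rx: Receiver<Vec<u8>>) -> (Vec<u8>, u64) {
    let mut other_work = 0u64;
    loop {
        match rx.try_recv() {
            Ok(data) => return (data, other_work),
            Err(TryRecvError::Empty) => {
                other_work += 1; // the program continues as normal
                thread::sleep(Duration::from_millis(1));
            }
            Err(TryRecvError::Disconnected) => panic!("I/O task died"),
        }
    }
}
```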

You could also examine the case of DMA. Ie, your peripheral (Maybe your network chip in the case of a desktop PC?) commands an IO task. It runs in the background on your network hardware. You then read from, or write to the buffer that's associated with the DMA transfer as required. (Sometimes using DMA-related interrupts)

Could you apply this model to GPOS networking? Of note, some people are trying to do the opposite: Use Async on embedded, to wrap interrupts and DMA.

duped 1470 days ago

I have no idea what GPOS stands for, but the analogy isn't really necessary.

The high level algorithm you describe is basically how async programs work. Glossing over the low level details, you usually implement things in terms of polling. Interrupts and their analogs are far too slow at scale (switching async tasks is in the nanoseconds, these days).

The problem is when there is logic downstream of the task that needs its results and mixed with the results of some synchronous code in between. This is the "function coloring" problem.

Async semantics are designed to insert the logic for handling this (merging of async task results) seamlessly. There are two issues with this, the first is that synchronous code has no way of knowing what to do with asynchronous results (meaningfully), and the second that there has to exist some executor program that handles the merging and scheduling logic.

The thing that makes async "hard" in a language like Rust is that dealing with this problem is extremely difficult when you have no GC, lifetimes, call-by-move, closures that capture by move, and ownership semantics - it makes it verbose to write sound, non-trivial async code. For example, you're forced to introduce the notion of "pinned" data in memory to prevent it from being moved while tasks are switched. Lifetimes become a lot less clear. "Async destructors" don't really exist (what other languages would call finalizers that don't run at the end of lexical scope).

As for the mixing of sync/async code, that's not actually an issue if everything is async. It's trivial to write an executor that makes async calls blocking anyway.
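To illustrate how small such a blocking executor can be, here is a sketch built only on the standard library. It parks the calling thread until the future's waker unparks it. It is not how tokio or the `futures` crate implement `block_on`, and, as noted elsewhere in this thread, futures that depend on a particular runtime's I/O reactor will not make progress on it.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// A waker that unparks the thread which is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drive a future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the heap so it cannot move between polls.
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until the waker is invoked, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}
```

A synchronous caller can then write `let value = block_on(some_async_fn());` without pulling in a runtime, as long as the future does not need one.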

U1F984 1470 days ago

There's a neat crate for that I recently found: https://crates.io/crates/pollster

loeg 1470 days ago

Thanks, that looks great.

ianbutler 1470 days ago

I started writing Rust ~6mo ago and while I agree with your sentiment, the issue I've run into is that so many packages I need to use, because there isn't an alternative and I don't want to build it myself, already use async. I then have to either heavily wall off that part of my code or at a certain threshold realize I may have to adopt async myself because keeping two concurrency models going is really a lot of overhead.

It's hard to wind down that existing momentum.

api 1470 days ago

Async has really taken over anything networking-related because, well, it offers much better scaling and performance. If you're a package author you're going to get more people asking for async than people that don't want it. There is no sane way to make async optional in a library and reuse code.

dboreham 1470 days ago

> it offers much better scaling and performance

Myth. Performance won't be better. Scaling arguably is better, but usually the use-case doesn't require the level of scaling where async is superior to OS threads.

zaphar 1470 days ago

I suspect you might be arguing semantics, but in practice, for certain types of applications, async will in all likelihood offer better performance. Scale and performance are linked: when you scale up and start to hit limits, async can make it easier to get more out of your compute than otherwise, which is a performance consideration. Calling his statement a myth ignores the context it was made in.

Matthias247 1469 days ago

The point of the parent was that better performance is not guaranteed, and it's totally true.

E.g. go ahead and implement an RPC server which e.g. only has to deal with 10 concurrent requests - then measure latencies. The synchronous version might be faster, due to not requiring any epoll calls. The difference might get even bigger if e.g. the server is serving static files, and you are measuring throughput - the synchronous version will likely provide higher performance since no extra context-switch from the async-runtime-of-your-choice to threadpool-for-file-io thread and back is required.

You are also right in that once one moves beyond a certain scale the async version might offer better performance. But the scale that is required would be different per application, and not every application requires the scale.

jerf 1469 days ago

You will absolutely get "more" performance out of async. I'm not sure I could call it much more. It's hard to get an exact number because there isn't exactly a whole lot of pairs of "async" vs "greenthreaded" options out there, but I'd guess you're looking at 20%-30% tops. For most people, and even most people writing async code, this is irrelevant. They are never going to write code that absolutely needs that last 20-30% and that alone is the difference between the problem being solved and not solved.

It certainly isn't like you use a green thread model and you unconditionally throw away a 5x performance factor or something.

There are absolutely cases where that does matter. To name just one, a game engine would not want to throw away that level of performance out of the box. (That's the game engine user's job, to "spend" the quality of the game engine on their task.) But I think there's a lot more programmers who have, without analysis, assumed they're in that class and made a lot of decisions based on that, when in fact they are plural orders of magnitude away from it. To pick a number out of thin air, 4 full CPU cores running Rust code that someone has at least glanced at and spent a bit of time optimizing is a loooooot of power.

(The closest current comparison is Rust vs. Go, but Rust works much harder at compile-time optimization and doesn't have GC, and I expect those two things account for the majority of the delta between them, with Go being greenthreaded being non-trivial, but in the clear minority. Stay tuned for Java with Project Loom versus Rust, which has its own rather major differences but will at least be another relevant data point.)

sealeck 1470 days ago

https://github.com/jimblandy/context-switch/ suggests that it's not substantially better

api 1470 days ago

Interesting, but there are other issues. A big one is resource exhaustion attacks. A thread per connection means that someone can trivially exhaust system memory, while async pseudo-threads (tiny bits of state) take up virtually no space.

Edit: also this only tests 500, not 500000.

Also when doing threaded I/O as soon as you want to support bidirectional traffic you will have to implement select/poll/etc. since you can't do a blocking read and a blocking write at the same time on one thread. At that point you're already giving up a lot of the advantages of threads.

estebank 1470 days ago

> There is no sane way to make async optional in a library and reuse code.

FWIW, there's an effort to do exactly that, but because it will require language-level changes and is still in the drawing-board phase, it will likely be a while before it can be widely used.

The "optionality" of `async` while sharing code also applies for `const` and mutability (why do we need `Deref` and `DerefMut`?). Finding a solution that can work for these three (and maybe others?) parts of the language will be a welcome improvement.

api 1470 days ago

Great to hear! That's really the solution.

Rust async code can be a bit challenging until you get it, but I can't think of a way to make it that much simpler without sacrificing the whole "systems programming language" concept or support for embedded. The only good alternative is Go-like fibers and that requires a fat runtime.

We use both Rust and Go at ZeroTier and find that they both have their own niches. (We are slowly moving ZeroTier from C++ to Rust to use a more modern and more importantly safe language.)

mikevm 1470 days ago

Where do you use Go?

cercatrova 1469 days ago

> FWIW, there's an effort to do exactly that

Could you link where?

estebank 1469 days ago

https://hackmd.io/Aw2L3VmPQsm0ANC_XMughQ

hgomersall 1470 days ago

Personally, once I grokked async rust, I found it much easier to use and reason about than threads. Things just seem to map better without any messy stuff to think about.

eloff 1470 days ago

Yes, async is hard. It adds lots of complexity, both to the code and in your mental model. That slows development. I'd rather have faster development most times. It's why I prefer to use Go over Rust whenever possible. That's why I'm really interested in what lunatic is doing here. It might narrow the gap a little.

ithrow 1470 days ago

> Yes, async is hard. It adds lots of complexity, both to the code and in your mental model. That slows development.

Nodejs devs seem to be doing fine? and I would say their development is faster than most devs working on other stacks. Nodejs is also a top 3 server stack and growing.

eloff 1468 days ago

NodeJS doesn't have all these gotchas around async that rust does. It's still harder than sync code, but it's more manageable.

The lack of a proper type system, only partially solved by typescript is a big drawback though, that eats into productivity.

daenz 1470 days ago

Imo, 99% of the time, ergonomics should take precedence over power. Power can always be added later with clever hacks, without ruining an ergonomic interface. But adding ergonomics to power is a much more broken process.

estebank 1470 days ago

> Power can always be added later with clever hacks, without ruining an ergonomic interface.

This puts limits on what can be accomplished. Starting with a more restricted set of code allowed, and then expanding it over time can be more successful in many cases, without locking you into a perhaps more ergonomic looking interface that needs to be coddled with no tooling support to avoid the "slow path". For examples in Rust: `impl Trait` used not to exist, which meant you had to use `Box<dyn Trait>` instead, which can be slower and certainly adds some verbosity. Then `impl Trait` was added and a bunch of code was now representable, and soon `type Alias = impl Trait;` will be stabilized which will allow even more code to be representable, in a way that is both performant and easier to use. A language that instead says "just use `-> Trait` and the compiler will figure out what to do" would have increased the user's perf without intervention, but anyone who really cares about FFI stability or wants to keep on top of heap allocations would be left out in the cold.

It is the same reason that you can complain about the complexity of the String/&str distinction in Rust[1], but avoiding lingering references to big strings in JS (effectively a memory leak) becomes much harder.

[1]: https://fasterthanli.me/articles/working-with-strings-in-rus...
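The `Box<dyn Trait>` versus `impl Trait` contrast mentioned in this comment can be seen in a small sketch. The function names are hypothetical; both return the same sequence, but the first allocates and dispatches dynamically while the second returns a concrete, unnamed type.

```rust
/// Before `impl Trait`: the iterator is boxed and dispatched dynamically.
fn evens_boxed(limit: u32) -> Box<dyn Iterator<Item = u32>> {
    Box::new((0..limit).filter(|n| n % 2 == 0))
}

/// With `impl Trait`: no heap allocation and static dispatch; the caller
/// cannot name the concrete type but can still use it as an iterator.
fn evens_impl(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}
```

In both cases the cost model is visible in the signature, which is the trade-off the comment is defending.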

necubi 1469 days ago

That's a reasonable choice of priorities to have, but it's the opposite of Rust's. Rust prioritizes (1) safety, (2) performance, (3) ergonomics, in that order. There are other languages that put ergonomics before performance but they are generally unsuitable for Rust's niche.

lewantmontreal 1470 days ago

I use Rust for the amazing types, map/filter/reduce, and, even if I never write macros myself, beautiful libraries like serde and clap. I do need to often use async to wait for multiple network requests at once, although I'm not quite comfortable with it.

Requesting urls n-at-a-time took me a while (https://play.rust-lang.org/?version=stable&mode=debug&editio...). In particular rust-analyzer itself cannot figure out `buffer`'s type here.

You can consider me very intrigued by Lunatic.

etra0 1470 days ago

Some time ago I was comparing Go, Python and Rust for doing some GET requests asynchronously.

At first, I noticed that the Go version was actually faster than the Rust one, and then I saw that `reqwest` recommends creating a `Client` and reusing it if you're doing multiple GET requests, to get better performance[1]. After changing my code, the Rust version was effectively a bit faster (not by much, to be honest, which was a bit disappointing considering Go's version was way easier to write, and I say this as a general Rust shill).

Hopefully this comment is somewhat helpful :)

[1] https://docs.rs/reqwest/latest/reqwest/#making-a-get-request

Matthias247 1470 days ago

reqwest::get() is even worse than not having a connection pool. It will also reload the full content of system certificate stores on each invocation - since it creates a new reqwest client. On some hosts that can take 10-100ms alone.

Always create a client explicitly. And also always add a timeout.

The Go http.Get() function uses a shared global client, so making a request doesn't have high initialization costs, and requests can make use of a shared connection pool.

Animats 1470 days ago

> They recommend you if you're doing multiple GET request, to create a `Client` and then use that to get better performance

Right, then it doesn't have to reopen the connection for each request. That's not an async thing, it's a caching thing.

zaphar 1470 days ago

In heavily IO-bound workloads for compiled languages like Rust and Go, the bulk of the time will be spent waiting for IO. In that world the optimizations of the compiler for CPU-bound operations will fade into the background, so it's not surprising that Go is competitive with Rust for that kind of workload. If your workload is of this type and Go is as well supported as Rust, then Go may be a better choice.

masklinn 1470 days ago

Python’s requests is exactly the same (the client is called Session). I guess the Go client just uses a global connection pool by default?

jen20 1470 days ago

Yes - `DefaultClient` in `net/http` is what the various package level methods operate on. This is constitutionally bad as global state that dependencies can mutate at will during init (or any other time), hence go-cleanhttp [1].

[1]: https://github.com/hashicorp/go-cleanhttp

wongarsu 1470 days ago

> However, if you are doing web apps or any networking stuff, massive concurrency benefits are almost always too important to ignore

My problem is more that even if I don't need massive concurrency (say in a client that only talks to a single server, in a serial manner), I'm still more or less forced into async code because that's what the ecosystem switched to. No matter if you benefit from async or not, not using it is going against the grain and generally makes your life harder, despite threads being much better from a language-ergonomics point of view

solar-ice 1470 days ago

As much as I agree, and it's a mess: you can very much use the tokio runtime's block_on function to do as little async as possible. Rust is in general a much nicer language, with lots of good tooling, when you pretend async stuff is blocking like that.
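For readers wondering what a `block_on` does under the hood, here is a minimal std-only sketch. It is not tokio's implementation (tokio's version also drives its I/O and timer machinery); the names `ThreadWaker`, `add` and `sync_entry_point` are made up for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives a future to completion on the current thread.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// An async function, as a library might expose it.
async fn add(a: u32, b: u32) -> u32 {
    a + b
}

/// Ordinary blocking code that hides the async call behind `block_on`.
pub fn sync_entry_point() -> u32 {
    block_on(add(2, 3))
}
```

The caller of `sync_entry_point` never sees a future, which is the "pretend async stuff is blocking" approach in miniature.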

ruuda 1470 days ago

There are still good synchronous alternatives, e.g. tiny_http for serving, and just binding libcurl for requests, but I agree it is becoming harder to avoid async.

cshenton 1469 days ago

Why isn’t imperative event loop programming more widely used? It’s a reasonably common pattern for games networking libraries like Enet, and has the added bonus that you get to design exactly how you lay out the memory of all your in flight work and therefore have it be easily debuggable.
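As a rough illustration of the pattern (not Enet's actual API; the `Transfer` and `Event` types are invented for this sketch), all in-flight work lives in one flat slice and every event is handled by a plain `match`, so the whole state can be printed or inspected in a debugger at any point:

```rust
/// State of one in-flight transfer, stored inline in a flat slice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transfer {
    Sending { remaining: u32 },
    AwaitingAck,
    Done,
}

/// Events the loop reacts to; in a real program these would come from the OS.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    Writable { id: usize, bytes: u32 },
    Ack { id: usize },
}

/// Applies one event to the transfer it refers to; unknown ids are ignored.
pub fn handle_event(transfers: &mut [Transfer], event: Event) {
    match event {
        Event::Writable { id, bytes } => {
            if let Some(Transfer::Sending { remaining }) = transfers.get(id).copied() {
                let left = remaining.saturating_sub(bytes);
                transfers[id] = if left == 0 {
                    Transfer::AwaitingAck
                } else {
                    Transfer::Sending { remaining: left }
                };
            }
        }
        Event::Ack { id } => {
            if let Some(Transfer::AwaitingAck) = transfers.get(id).copied() {
                transfers[id] = Transfer::Done;
            }
        }
    }
}

/// The event loop itself: feed events to the state table one at a time.
pub fn run(transfers: &mut [Transfer], events: &[Event]) {
    for &event in events {
        handle_event(transfers, event);
    }
}
```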

thecompilr 1469 days ago

For me async is about ergonomics first of all. When you perform parallel tasks on multiple threads it is hard and ugly (in cross platform Rust at least) to implement any sort of intricate cross communication, as communication between threads is asynchronous by nature. And it is very much impossible to stop a thread externally.
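To make the "cannot stop a thread externally" point concrete, the usual workaround looks roughly like the sketch below: the worker has to poll a shared flag itself. The function names and the 1 ms sleep standing in for real work are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that loops until `stop` is set, then returns its iteration count.
pub fn spawn_worker(stop: Arc<AtomicBool>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut iterations = 0u64;
        loop {
            iterations += 1;
            // The worker must check the flag itself; nothing can stop it from outside.
            if stop.load(Ordering::Relaxed) {
                break;
            }
            // Stand-in for one unit of real work.
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    })
}

/// Lets a worker run for roughly `run_for`, then asks it to stop and waits for it.
pub fn run_then_stop(run_for: Duration) -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let handle = spawn_worker(Arc::clone(&stop));
    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed);
    handle.join().expect("worker panicked")
}
```

If the worker is stuck in a blocking call, it never reaches the check, which is exactly the limitation being described.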

Async rust lets you implement different combinators on async tasks and cancel them effortlessly.

As for performance, tokio is not exactly a zero cost abstraction. Just run perf on a tokio program to see how much overhead it introduces. It has claimed to be zero cost from the start, and since then it has done at least two major performance overhauls to prove the point. That being said, I love tokio and its ecosystem, but it is the ergonomics, not the speed, that I love. Also, async-std was much slower for the networking use case that I had, so overall tokio is as good as it gets.

WindSoldier 1469 days ago

Why do you say async-std is so much slower? Is there any result or report that shows that? I'm also curious about the `perf` you mentioned.

thecompilr 1469 days ago

Well, at least it was slower a while ago, for the networking code I was working on. It could outperform tokio for other use cases. As for perf, when you run it you can see that a lot of CPU is spent on work stealing and other bookkeeping. Which is fine, and the single threaded runtime doesn’t have some of that, so I use that a lot.

jstx1 1470 days ago

I've done some beginner Rust and Go programming (read "the books" on both, written small programs) and I'm wondering which one to spend more time on or try to get a job with in the future. When I see discussions like this one about Rust, I start to worry that it's unnecessarily complicated and difficult to work with and that this will only get worse in the future to the point that it won't be a good fit for many of the use cases that it's pitched for. Am I wrong to think this?

toolz 1470 days ago

If you're trying to break into the industry, you're not going to be working on problems where the language really matters. Pick a popular language, learn enough to be dangerous, and specialize once you find categories of problems that interest you. Go and Rust are only compared a lot because they're sexy buzzwords - they do not target the same problems and they aren't competing languages. Learning both at some point could prove valuable, but personally I'd never recommend Go to anyone for anything anyways.

jstx1 1470 days ago

> they do not target the same problems and they aren't competing languages.

Is this really true? All the problems that are solvable in Go should be solvable in Rust too right (but not vice versa because Go is GCed)? They might not compete on every front but there definitely should be overlap in the use cases.

toolz 1470 days ago

There is a ton of overlap with every general purpose programming language. So it's correct to say you can solve the vast majority of problems with the majority of programming languages, but languages differentiate themselves in many different ways. One of the main differences between rust and go is that go is a garbage collected language. That feature alone typically creates a large divide in what languages are trying to achieve.

jstx1 1470 days ago

I feel like you’re ignoring my point. The GC prevents Go from competing with Rust on some use cases. But that still leaves a lot of overlap of potential use cases and if your work falls in that region of use cases, you get to pick between the two languages (and a bunch of others too).

toolz 1469 days ago

I feel like I covered the fact that there's a large overlap between most languages, but to clarify more: when a software team chooses a language, they aren't choosing it based on the overlap; they are choosing it based on the language-specific features they think will benefit them the most across the problems they most often try to solve. Go's and Rust's unique features do not compete with each other the way Rust/C++ or Go/Elixir would (the languages in each of those pairs are, I feel, much more comparable in the problems they focus on solving).

Go generally targets small microservices and can be quite limiting (imho) to larger projects because the language semantics are extremely simple and are not geared towards projects with numerous business domains.

Rust on the other hand has extremely powerful generics which enable sophisticated code sharing and composition to enable large projects. Rust also very purposefully targets embedded systems and low level systems programming. You can do these things in go, but it is not something the language is designed to do as a first class priority.

loudmax 1469 days ago

Go is very easy to learn. You can be up and running with Go very quickly and it's fantastic for simple applications. The time investment to be reasonably good at writing Go is low. There's little reason not to learn Go.

Rust is difficult to learn, unless you already have a lot of experience with existing low level languages. Getting complex programs up and running with Rust is cumbersome. But the performance is excellent, you can have a high degree of confidence that your program is rock solid, and there are entire classes of security issues that don't happen in Rust. For the types of applications where Rust does well, it does very well indeed. The time investment to become a decent Rust programmer is high, but this higher barrier to entry can make your programming skills even more valuable since there's less supply to meet the demand.

hgomersall 1469 days ago

Rust is hard without a doubt. I'm suspicious it's really only hard because it front loads the effort over, say, c++, but it is hard. Expect maybe a year to be proficient, but there's a point long before that it becomes a delight.

Async is hard again, taking more months to feel proficient. I've again a suspicion that much of the resistance to async is due to people who have done the first effort to feel comfortable in rust and expect async to fit right in, but it doesn't, because it's hard too.

Threads are also hard, but under rust they better map to existing thread models, so pre-existing skills are useful and so someone skilled in threads and rust will be skilled in threads with rust.

For sure, there are missing pieces of the async world like async traits, but they will come.

verdagon 1470 days ago

Does anyone else get the feeling that we (as a field) are missing something basic about concurrency? Like there's a really elegant solution just around the corner, that has the low overhead of async/await without the complexity. Or otherwise put, the ease of goroutines but without GC.

I know it sounds crazy. I recently dove into the area, and was pretty surprised at how many interesting building blocks there are out there. It feels like if we just combine them in the right way, we'll discover something that works a lot better.

Off the top of my head:

Google discovered a way to switch between OS threads without the syscall overhead. All it needs is to solve the memory overhead. [0]

Zig discovered a way to use monomorphization to enable colorless async/await. If someone could figure out how to make it work through polymorphism / virtual dispatch, that would be amazing. [1]

Vale discovered a possible way to make structured concurrency in a memory safe way that's easier than existing methods. [2]

Go [3] and Loom [4] show us that we can move stacks around. Loom is particularly interesting as it shows we can move the stack to its original location, a unique mechanism that could solve some other approaches' problems with pointer invalidation.

Cone is designing a unique blend of actors and async await, to enable simpler architectures. [5]

We're close to solving the problem, I can feel it.

[0] No public docs on it, but TL;DR: we tell the OS the thread is blocked, and manually switch over to it by saving/manipulating registers.

[1] https://kristoff.it/blog/zig-colorblind-async-await/

[2] https://verdagon.dev/blog/seamless-fearless-structured-concu...

[3] https://blog.cloudflare.com/how-stacks-are-handled-in-go/

[4] https://youtu.be/NV46KFV1m-4

[5] Can't find the link, but was a discussion on their server.

duped 1470 days ago

I just want to say there are mountains of research on this, and recent development is exciting, but some of the techniques (like stack switching and moving) are very old. Project Loom is very intriguing because of how it solves the practical problems of introducing old concurrency techniques into existing language implementations that were not designed around them.

A lot of this stuff is intriguing from the implementation side, but where we're really lacking is in the syntax and semantic side to make concurrency "make sense" to programmers. I don't think we're close to solving that problem (for example, call/cc isn't the answer, it's the problem).

imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

A lot of the symptoms of the sync/async problem are caused by the explicit decoupling of sync/async APIs in source code. If you remove that and force it to be implicit internal to the language implementation, the issue goes away. It would take a lot of work to determine if that was worth it.

Basically as we've now accepted garbage collection to be an acceptable part of language implementation, one day I think we'll accept async executors to be a part of that too. We're halfway there on the impl side (Go, Java through Loom, NodeJS, etc). The other half is removing the explicit syntax for it.

hayley-patton 1469 days ago

> imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

Safepoints for garbage collection are somewhat similar, but for preemption one wants to interrupt threads on a timer, rather than before the collector takes over. Despite occurring very frequently (at around 100 _million_ checks per second), the time overhead is only about 2.5% or so, according to a study by Blackburn et al [0]. It appears, I think, that as long as the fast not-interrupting path is fast enough, eliminating safepoints isn't too important.

[0] Stop and Go: Understanding Yieldpoint Behaviour <https://users.cecs.anu.edu.au/~steveb/pubs/papers/yieldpoint...>

gjvnq 1470 days ago

> imho the issue isn't function coloring, threads, whatever. It's a compiler that defaults to async code in the calling convention and then optimization passes to de-async-ify (remove unnecessary yield points) the code at compile time. The result would be code that looks synchronous but is async where it matters (i/o).

Sounds like Erlang and single assignment languages.

Jokes aside, part of the problem seems to be the computer model and cpu architectures themselves.

We need something that is designed from scratch to run things concurrently.

duped 1469 days ago

Concurrency is mostly a higher level abstraction than the ISA; CPUs don't care what the stack pointer is pointing to or what the return address is. Actually implementing concurrency efficiently is a solved problem, both in the trivial (stackless) and more complex (stackful) cases.

And that's sort of my point, concurrency primitives are really easy to define and implement but pretty hard to use by programmers up the stack.

useerup 1470 days ago

> Does anyone else get the feeling that we (as a field) are missing something basic about concurrency? Like there's a really elegant solution just around the corner, that has the low overhead of async/await without the complexity. Or otherwise put, the ease of goroutines but without GC.

Yes. There is current research into Algebraic Effects (see for instance https://www.microsoft.com/en-us/research/wp-content/uploads/...).

Algebraic Effects promise a return to non-colored functions, as AE can abstract over exceptions, continuations, async and other control-flow mechanisms.

kmeisthax 1469 days ago

A decade ago the simple thing we were missing about threaded concurrency was Rust's ownership and borrowing model and Send/Sync. Before that, the simple thing was to use early Java, which had a mandatory garbage collector and monitor objects. If you didn't have or use those, then you were subject to memory safety problems. And moving from heap-scanning GC to ownership and borrowing gave a genuine performance advantage.
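A small sketch of what Send/Sync buys in practice, using only the standard library. The function name and parameters are illustrative. The counter is shared through `Arc<Mutex<_>>`, which the compiler accepts; swapping in `Rc<RefCell<_>>` would be rejected at compile time because `Rc` is not `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the final value.
pub fn count_in_parallel(threads: usize, increments: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().expect("mutex poisoned") += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let total = *counter.lock().expect("mutex poisoned");
    total
}
```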

Now, we want to remove threading from the concurrency story, in the hopes of getting another performance boost. This itself is the problem, because threads were giving us automatic preemption, akin to how GCs were giving us automatic memory safety. Now we have to statically determine a "good time" for the program to yield. I/O yielding is the easy part, and the reason why people are flocking to async; but we also need to support yielding for fairness reasons. Kernels can do this because they have interrupt timers; but there's no lower-overhead equivalent for userspace code that I'm aware of.

The other problems mentioned with async Rust are particular to Rust itself. The language has a policy that heap allocations only ever happen in `std`, because they want to support embedding Rust into applications where heaps don't exist. This means that futures need to be structs. Rust does support structs of indeterminate size, but barely; and there's no support for structs that can grow. Such a thing is likely unsound without a way for the compiler to check growth limits, and the memory is pinned, so we can't grow beyond a preset limit set at the start of the future[0].
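The "futures are structs with a fixed size" point can be observed directly. In this sketch (function names are made up), anything that stays alive across an `.await` has to be stored inside the future, so the future's size is fixed at compile time and grows with that state:

```rust
use std::mem::size_of_val;

/// A trivial async function, used only to create an await point.
async fn tick() {}

/// Size in bytes of a future that keeps a 1 KiB buffer alive across an `.await`.
pub fn size_with_buffer() -> usize {
    let fut = async {
        let buf = [0u8; 1024];
        tick().await;
        // `buf` is used after the await, so it must be stored inside the future.
        buf.iter().map(|&b| b as usize).sum::<usize>()
    };
    size_of_val(&fut)
}

/// Size in bytes of a future that holds no data across its `.await`.
pub fn size_without_buffer() -> usize {
    let fut = async {
        tick().await;
        0usize
    };
    size_of_val(&fut)
}
```

Neither future is ever polled here; the sizes are known before any code runs, which is why the state cannot grow later.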

Async infects everything it touches because it's a total pain to write networking library code that's preemption-agnostic. Monad<T> would fix that, but higher-kinded traits aren't a thing in Rust yet and we would need lots of language tooling (akin to `?`) to make this ergonomic to use.

There's also just the possibility that we've been engineering the wrong fix, and we should be trying to get OS threads to be as lightweight as possible rather than trying to move the entire threading system into userspace. There's no particular reason why we need 8MB stacks, other than the fact that compilers don't check stack growth themselves. (Which, BTW, is also a soundness hole in Rust as far as I know.)

[0] Go gets around this with a linked list of stacks, which adds its own overhead.

ghoward 1469 days ago

I'm betting on structured concurrency. I think it will be the same sort of revolution for concurrent programming that structured programming was for single-threaded programming.
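For threads, the standard library already offers one structured form, `std::thread::scope`: every thread spawned inside the scope is joined before the scope returns, so workers may borrow local data. A minimal sketch, with `parallel_sum` as an illustrative name:

```rust
use std::thread;

/// Sums each chunk on its own thread; all threads are joined before returning.
pub fn parallel_sum(data: &[u64], chunk_size: usize) -> u64 {
    thread::scope(|scope| {
        // Spawn everything first so the chunks really run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_size.max(1))
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

No thread can outlive the call to `parallel_sum`, which is the property structured concurrency is after.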

Nullabillity 1470 days ago

You won't solve the broken and unusable programming model of threads by trying to emulate the programming model of threads.

xrobledo84 1470 days ago

What you are looking for is called Erlang

jjnoakes 1469 days ago

How is that "the ease of goroutines but without GC" ?

the__alchemist 1470 days ago

This sounds like what I'm looking for for building a set of networking/pentest tools. Ie, being able to spawn an arbitrary number of IO bound processes without the overhead of OS threads, and the contagion and fracturing of Async.

There may still be some fracturing here, ie in the first example (but not the others, inexplicably?) `lunatic::net` vice `std::net`.

bkolobara 1470 days ago

Hi, author here. All examples should have used `lunatic::net`, I fixed it now.

The reason why we provide `lunatic::net` and you can't just use `std::net` is that WASI (system interface for WebAssembly) still doesn't have support for sockets[0]. `lunatic::net::TcpStream` is for now just a drop in replacement for `std::net::TcpStream` and once sockets get standardised you will be able to use the standard library types instead.

[0]: https://github.com/WebAssembly/WASI/pull/312

zokier 1470 days ago

Has anyone seen any recent solid benchmarks of a thread-per-connection web application architecture? What is actually the break-point load where its perf starts to regress and async really becomes useful?

brickbrd 1470 days ago

What does "stream.write_all(&number_as_bytes).unwrap();" do if the socket buffer is full? Does it block the virtual thread running this function? Or does the stream keep buffering? Or is it sending the message to some other process which is accumulating those messages? What if I don't want this thread to block and instead want to do something else?

I believe all of these are handled. I just cannot find sufficient documentation to understand the details of how this works.

Matthias247 1469 days ago

Same as the synchronous version: It will block until more data can be written, and then go on and write as much as possible using another async .write() call. It's the same as:

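    // Keep writing until every byte has been handed to the socket.
    let mut offset = 0;
    while offset != number_as_bytes.len() {
        // `write` may accept only part of the slice; it returns how many bytes it took.
        let written = stream.write(&number_as_bytes[offset..]).await.unwrap();
        offset += written;
    }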

The synchronous version would be the same without the .await, and offers stronger guarantees that either all bytes are written to the socket or the socket errored and is dead. The async version could be cancelled in the middle of the invocation after some segments have already been written.
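A runnable sketch of the synchronous variant, written against `std::io::Write` so it can be tried without a socket. `ShortWriter` is a made-up test sink that accepts only a few bytes per call, imitating a nearly full socket buffer; the zero-write and `Interrupted` handling mirror what the standard library documents for `write_all`.

```rust
use std::io::{self, Write};

/// Keeps calling `write` until every byte of `buf` has been accepted.
pub fn write_all_manually<W: Write>(stream: &mut W, buf: &[u8]) -> io::Result<()> {
    let mut offset = 0;
    while offset < buf.len() {
        match stream.write(&buf[offset..]) {
            // Zero bytes accepted means the sink cannot make progress.
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ))
            }
            Ok(written) => offset += written,
            // An interrupted call is safe to retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Test sink that accepts at most `limit` bytes per call.
pub struct ShortWriter {
    pub limit: usize,
    pub received: Vec<u8>,
    pub calls: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit);
        self.received.extend_from_slice(&buf[..n]);
        self.calls += 1;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```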

mamcx 1470 days ago

Does an alternative to `actix` exist that can use this model?

Because it sounds interesting, but the hard part is that you need a combo of request/webserver to have a chance.

and then the DB side....

beebmam 1469 days ago

I've been asking this for months, and I can't seem to find an answer anywhere:

I'm unable to get debugger breakpoints in Async functions in Rust to actually break.

Is this a known bug with Async Rust? Or is this simply unsupported (yet)? Seems like a really broken experience currently.

ithkuil 1470 days ago

is it possible to use this on a non-wasm target?

amelius 1470 days ago

Meanwhile, GoLang allows thousands of threads without problems.

avgcorrection 1469 days ago

Meanwhile, a language with completely different design goals does things in a very different manner.

Stop trying to stir shit.

jedisct1 1470 days ago

And servers such as fasthttp have excellent performance.

smilekzs 1470 days ago

FTFY: thousands of GREEN threads

amelius 1470 days ago

Does it matter? The point is that Go has excellent throughput and latency, while using only a single concurrency model.

jen20 1469 days ago

Yes, it does matter. It has excellent throughput and latency for certain classes of systems, while others are impossible to build. Rust may not impose this constraint while meeting its goals.

hu3 1469 days ago

Seems like Rust is better in every way right? I can't help but wonder why is it that Go is so much more popular when it comes to language of choice for networked and multi-threaded applications.

jen20 1469 days ago

It's not better in every way at all - it is more flexible. I regularly bounce between Go and Rust in different contexts.

stjohnswarts 1469 days ago

Because it's easy to learn and "good enough". Rust is not easy to learn.

amelius 1469 days ago

You can say the same about assembly language. Yet it is used only by a very narrow audience.

mcronce 1469 days ago

I would not describe latency as "excellent" in Go, unless you're measuring average or p90.

vips7L 1469 days ago

What happens if you need to do computational work on a goroutine? Isn’t that going to block the carrier thread and then murder throughput?

IshKebab 1469 days ago

Yes, you have to manually insert yield points. Exactly the same as with every cooperative threading system, including Lunatic and Rust's native async/await.

Kind of feels like we need user space preemptive threading somehow.
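For Rust's async/await, a manual yield point looks roughly like the std-only sketch below. Async runtimes typically ship their own yield helper; `YieldNow`, `sum_with_yields` and the busy-polling `poll_to_completion` driver are written here purely for illustration.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns `Pending` once, giving the scheduler a chance to run other tasks.
pub struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again right away.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

pub fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// CPU-bound loop with a manual yield point every `yield_every` items (0 disables yielding).
pub async fn sum_with_yields(n: u64, yield_every: u64) -> u64 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
        if yield_every != 0 && i % yield_every == 0 {
            yield_now().await;
        }
    }
    total
}

/// Waker that does nothing; enough for a driver that polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future in a busy loop; returns its output and how many polls it took.
pub fn poll_to_completion<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Each extra poll is a point where a real scheduler could have run another task; without the `yield_now().await`, the loop would finish in a single poll and hog its thread for the whole computation.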

smasher164 1469 days ago

Go has asynchronous preemption now using signals. Tight loops are preemptible.

vips7L 1469 days ago

Or just the ability to use native os thread pools for that type of work.

bruce343434 1470 days ago

> However, if you are doing web apps or any networking stuff, massive concurrency benefits are almost always too important to ignore

No, you will benefit from parallelism/multithreading. Why only use 1 core? Multitasking as it was once called, or "async" as it is now, is fundamentally _synchronous_ because everything still happens on one core. Just that the order of execution may be a bit wonky, which technically all code already suffers from at the microscopic level with instruction reordering and out of order execution. You almost certainly don't need multitasking unless you are writing an OS for embedded.

solar-ice 1470 days ago

Async in Rust usually runs across many cores, using a work-stealing scheduler, fwiw.

lalaithion 1470 days ago

Even if you use N cores, you still get a massive benefit from being able to let >N threads wait on IO events simultaneously using concurrency/multitasking/async.

bruce343434 1470 days ago

There's only so much an "apache" or "nginx" can do in between IO operations though, right? And there's only so much IO per second a whole system can do. Basically: from disk to memory, maybe run a language interpreter if the site is not static, then from memory to the internet. Maybe if your pages are very dynamic and involve a lot of scripts it could be worthwhile. Do you have any numbers to back up your claim?

lalaithion 1469 days ago

I don’t have a reference offhand to hard numbers, but I’ve definitely run webservers which have a significantly higher number of concurrent in-flight requests than number of cores.

Even for a static site, what you’re basically doing is

    page = readFile("foo.txt")
    response.write(page)

That’s no CPU usage at all. ~Zero time spent in process. All the time is spent waiting on the data to be loaded into memory from disk, and then copied from memory out onto the network. If you use concurrency for those two functions, then you can handle ~100s of in-flight requests at the same time.
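A toy demonstration of why overlapping the waits matters. `thread::sleep` stands in for a disk or network wait, and plain OS threads are used only to keep the sketch dependency-free; an async runtime gets the same overlap with far fewer threads. The function name and numbers are illustrative.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `requests` simulated I/O waits, each on its own thread, and returns the wall time.
pub fn overlapped_waits(requests: usize, wait: Duration) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..requests)
        .map(|_| thread::spawn(move || thread::sleep(wait)))
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    start.elapsed()
}
```

With 100 requests of 50 ms each, handling them one after another would take about 5 seconds, while the overlapped version finishes in roughly the time of a single wait, regardless of core count.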

bkolobara 1470 days ago

Almost all async Rust runtimes use a multithreaded work-stealing scheduler by default, to utilise all available cores equally.

maleldil 1470 days ago

Tokio has a multi-threaded scheduler.