HN Mirror

Hacker News

by Matthias247 1478 days ago

> The problem is that threads just don’t work in practice for massive concurrency.

That's an assumption that is repeated very often recently, and measured very rarely. The truth is that the number of applications for which threads don't work is surprisingly low. I work at a well-known cloud provider, and many people would be surprised by which of the largest-scale applications work fine with a thread-per-request model. 50k OS threads are not really an issue on modern server hardware. While it might not be the most efficient model [1], it will not perform so badly that it causes an availability impact either.
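A minimal sketch of the thread-per-request model, using Python's standard `socketserver.ThreadingTCPServer`, which starts one OS thread per accepted connection. The echo protocol, class name, and helper functions are hypothetical. The sketch only shows the programming model: each request's state lives on its own thread's stack, and all I/O is ordinary blocking I/O. Because of CPython's GIL it says nothing about throughput, so measure in your own runtime, as the comment suggests.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Runs on its own OS thread for each accepted connection."""

    def handle(self):
        # Plain blocking I/O: this thread simply waits for the next line.
        for line in self.rfile:
            self.wfile.write(line)


class ThreadPerConnectionServer(socketserver.ThreadingTCPServer):
    daemon_threads = True     # don't block interpreter exit on handler threads
    request_queue_size = 128  # listen() backlog; the stdlib default is 5


def start_server(host="127.0.0.1", port=0):
    # Port 0 lets the OS pick a free port; server_address reports it.
    server = ThreadPerConnectionServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(addr, payload: bytes) -> bytes:
    # Hypothetical client: send one line, read one line back.
    with socket.create_connection(addr) as sock, sock.makefile("rb") as reader:
        sock.sendall(payload + b"\n")
        return reader.readline().rstrip(b"\n")
```

The handler contains no callbacks, futures, or state machines. Concurrency comes entirely from the server giving each connection its own thread.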

There are obviously some exceptions to that [2], but I encourage people to measure instead of making assumptions. Unless you find yourself in a weekly meeting about server efficiency or scaling cliffs, both models probably work.

[1] It really depends on the workload, but people might find an efficiency degradation (e.g. measured as BYTES_TRANSFERRED/CPU_CORES_USED) of 20% at a concurrency level of 1,000, or maybe only at a concurrency level of 10k. Coarse-grained work items (e.g. sending a large file to a socket) will show less degradation.
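To make the metric concrete, here is a small worked example of computing BYTES_TRANSFERRED/CPU_CORES_USED and the relative degradation between two runs. The numbers are hypothetical and chosen only to land on the 20% figure mentioned above. They are not measurements.

```python
def efficiency(bytes_transferred: float, cpu_cores_used: float) -> float:
    """BYTES_TRANSFERRED / CPU_CORES_USED, measured over the same time window."""
    return bytes_transferred / cpu_cores_used


def degradation(baseline_eff: float, measured_eff: float) -> float:
    """Fractional efficiency loss of a run relative to a baseline run."""
    return 1.0 - measured_eff / baseline_eff


# Hypothetical numbers: the same 10 GB moved in both runs, but the
# high-concurrency run needs 5 cores instead of 4.
baseline = efficiency(10e9, 4)  # 2.5e9 bytes per core
at_1000 = efficiency(10e9, 5)   # 2.0e9 bytes per core
print(f"degradation: {degradation(baseline, at_1000):.0%}")  # degradation: 20%
```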

[2] Load balancers, CDN services, and e.g. chat applications that maintain a massive number of mostly idle client connections can be such environments. They have a high degree of concurrency that needs to be managed, but much less "active concurrency". If all clients were active at the same time, those environments would run out of disk I/O or network bandwidth long before CPU or memory became an issue.
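The "high concurrency, low active concurrency" case is where readiness-based I/O shines. Below is a sketch in which a single thread watches many mostly idle connections with Python's `selectors` module and only does work for those with pending data. `socket.socketpair()` stands in for real client connections, and the function name is hypothetical. This is an illustration of the pattern, not how any particular load balancer or CDN is built.

```python
import selectors
import socket
from typing import List


def serve_mostly_idle(n_connections: int, active: List[int], timeout: float = 1.0) -> List[int]:
    """One thread, many connections: only sockets with pending data cost any work."""
    sel = selectors.DefaultSelector()
    # socketpair() stands in for real client connections (an assumption for the demo).
    pairs = [socket.socketpair() for _ in range(n_connections)]
    try:
        for i, (server_end, _client_end) in enumerate(pairs):
            server_end.setblocking(False)
            sel.register(server_end, selectors.EVENT_READ, data=i)

        # Only a few "clients" actually send anything; the rest stay idle.
        for i in active:
            pairs[i][1].sendall(b"ping")

        handled = []
        while len(handled) < len(active):
            events = sel.select(timeout)
            if not events:
                break  # nothing became ready within the timeout
            for key, _mask in events:
                msg = key.fileobj.recv(1024)
                key.fileobj.sendall(msg.upper())  # reply on the same connection
                handled.append(key.data)
        return sorted(handled)
    finally:
        sel.close()
        for a, b in pairs:
            a.close()
            b.close()
```

Idle connections cost a file descriptor and a registration, but no thread and no CPU time until they become readable.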

5 comments

gopalv 1478 days ago

> While it might not be the most efficient, it will not perform so bad that it causes an availability impact either.

Performance is important, but the biggest performance gain happens when a program goes from not working to working correctly.

Debugging is another corner case: async makes it intolerably hard to get a backtrace and make sense of what is going on.

It's not that debugging threads is easy, but in a low-contention environment where each thread holds the state of exactly one request and few threads interlock, threading is a fair bit better than async execution. Plus, logs that include thread names make it possible to draw something like a post-processed Catapult timing diagram (open chrome://tracing and look at an example; it is a great UI for dropping in your own multi-threaded event log as JSON).
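Here is a sketch of the post-processing step: it turns per-thread log records into the JSON Trace Event Format that chrome://tracing loads. The record shape (thread name, span name, start, duration) is a hypothetical one you would extract from your own logs. The output uses "M" metadata events to label each row with its thread name and "X" complete events for the spans, with timestamps in microseconds.

```python
import json
from typing import Iterable, Tuple

# (thread_name, span_name, start_us, duration_us): a hypothetical shape for
# records already extracted from logs that include thread names.
LogRecord = Tuple[str, str, int, int]


def to_chrome_trace(records: Iterable[LogRecord], pid: int = 1) -> str:
    """Convert per-thread log records into Trace Event Format JSON for chrome://tracing."""
    tids = {}
    events = []
    for thread_name, span, start_us, dur_us in records:
        if thread_name not in tids:
            tids[thread_name] = len(tids) + 1
            # Metadata event so the viewer labels the row with the thread name.
            events.append({"name": "thread_name", "ph": "M", "pid": pid,
                           "tid": tids[thread_name], "args": {"name": thread_name}})
        # "X" = complete event: start timestamp plus duration, both in microseconds.
        events.append({"name": span, "ph": "X", "pid": pid, "tid": tids[thread_name],
                       "ts": start_us, "dur": dur_us})
    return json.dumps({"traceEvents": events})
```

Write the returned string to a `.json` file and load it in chrome://tracing. Each thread then gets its own row, which matches the one-thread-per-request mental model.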

I'm a big fan of executor thread-groups and work queues, but damn, do they make it hard to mentally walk through a bug when the stack traces are scattered across multiple places.
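One common workaround for scattered stack traces is to record the submitting call site when work is handed to an executor and attach it if the task fails. The sketch below does this with `concurrent.futures` and exception chaining. The helper name `submit_with_origin` is hypothetical. Capturing a stack on every submit has a real cost, so treat this as a debugging aid rather than something to leave on everywhere.

```python
import traceback
from concurrent.futures import Future, ThreadPoolExecutor


def submit_with_origin(executor: ThreadPoolExecutor, fn, *args) -> Future:
    """Submit fn(*args), remembering where it was submitted from.

    If the task fails, the worker-side exception is chained under a RuntimeError
    whose message carries the submitter's stack, so both halves of the story
    show up in one traceback.
    """
    origin = "".join(traceback.format_stack()[:-1])  # caller's stack, minus this frame

    def task():
        try:
            return fn(*args)
        except Exception as exc:
            raise RuntimeError("task failed; submitted from:\n" + origin) from exc

    return executor.submit(task)
```

When `future.result()` raises, the printed traceback shows the worker's frames (the chained cause) followed by the submitter's frames in the message.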

bsder 1477 days ago

> > The problem is that threads just don’t work in practice for massive concurrency.

> That's an assumption that is repeated very often recently, and measured very rarely.

I would go further: a whole infrastructure needs to exist when massive concurrency is involved, and that is very rarely taken into account.

For those interested in genuinely massive concurrency, I encourage investigating Erlang. In my opinion the language itself is just "meh", but OTP, the infrastructure for managing, upgrading, restarting, etc. processes/threads, is extremely on point.

Side note: Erlang still has the absolute best handling of binary parsing of any language ever. https://www.erlang.org/doc/programming_examples/bit_syntax.h...

I really wish the Rust people would pick up something like the Erlang Bit Syntax and integrate it with their pattern matching (probably necessitating some fixes to the pattern-matching language) rather than continuing to piddle away effort on async/await.
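For comparison, here is a sketch of what a single Erlang bit-syntax match does, written by hand in Python with `struct` and bit masks. It parses the fixed part of an IPv4 header, a classic example of this kind of binary parsing. The Erlang pattern in the comment is an approximation, and the field and function names are our own. Like that pattern, the code only matches the fixed 20-byte header, and any options stay in the remainder.

```python
import struct
from typing import NamedTuple, Tuple

# Roughly the Erlang one-liner being replaced:
#   <<Version:4, IHL:4, TOS:8, TotalLen:16, Id:16, Flags:3, FragOff:13,
#     TTL:8, Proto:8, Checksum:16, Src:32, Dst:32, Rest/binary>> = Packet


class IPv4Header(NamedTuple):
    version: int
    ihl: int       # header length in 32-bit words
    tos: int
    total_len: int
    ident: int
    flags: int     # 3 bits
    frag_off: int  # 13 bits
    ttl: int
    proto: int
    checksum: int
    src: int
    dst: int


def parse_ipv4(packet: bytes) -> Tuple[IPv4Header, bytes]:
    # "!" = network byte order; B/H/I = 8/16/32-bit unsigned fields (20 bytes total).
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBHII", packet[:20])
    # Sub-byte fields have to be split out by hand with shifts and masks.
    header = IPv4Header(ver_ihl >> 4, ver_ihl & 0x0F, tos, total_len, ident,
                        flags_frag >> 13, flags_frag & 0x1FFF,
                        ttl, proto, checksum, src, dst)
    return header, packet[20:]
```

The Python version needs a format string, a tuple unpack, and manual masking for the 4-, 3-, and 13-bit fields. The Erlang version expresses all of that in one pattern.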

rad_gruchalski 1477 days ago

Erlang pattern matching is awesome. Matching on binaries makes it very easy to parse protocols.

Re concurrency: I learned Erlang before Akka. It took me a bit, but I find Akka more ergonomic. Akka will easily handle millions of actors on a single machine, too. But I always miss matching on binaries.

Another good one is protoactor for Go. It will also do a million actors, no problem, and comes really close to Erlang in terms of how concise the syntax is. But again, no binary matching.

woah 1477 days ago

Why would you assume that all software is written for servers in datacenters? Rust tends to be used in embedded devices, WASM, and other weird contexts where there might not be as many resources available.

If you're writing a CRUD app, sure, do it in PHP and spin up a thread per request.

rat9988 1477 days ago

Because he is talking about massive concurrency, not embedded or WASM or other contexts where there may not be as many resources available.

rad_gruchalski 1477 days ago

Since when is 50k threads massive?

int_19h 1477 days ago

Embedded is much less likely to need async in the first place.

woah 1477 days ago

Having written Wi-Fi router firmware in Rust, I would disagree.

hgomersall 1477 days ago

I use async extensively in my embedded application. The design is a joy to extend and maintain.

baq 1477 days ago

cooperative multitasking sounds way more embedded than preemptive...

Matthias247 1477 days ago

Not necessarily. A lot of embedded projects use realtime operating systems (RTOS). And those make use of preemptive schedulers in order to actually provide realtime guarantees.

There are obviously also some projects that just use a bare-metal loop to do everything; that probably counts as cooperative.
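A bare-metal loop counts as cooperative because each task runs until it voluntarily gives up control. Below is a sketch of that shape, with Python generators standing in for tasks and a hypothetical `blink` task. A real firmware super loop would poll hardware instead of appending to a list.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]


def blink(name: str, times: int, log: List[str]) -> Task:
    # Hypothetical task: does a small unit of work, then yields control.
    for i in range(times):
        log.append(f"{name}:{i}")
        yield  # voluntary switch point; nothing can preempt us between yields


def run(tasks: List[Task]) -> None:
    # Round-robin "super loop": resume each task until it yields or finishes.
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


log: List[str] = []
run([blink("led", 2, log), blink("uart", 3, log)])
print(log)  # ['led:0', 'uart:0', 'led:1', 'uart:1', 'uart:2']
```

A task that never yields starves every other task. That is the trade-off an RTOS with a preemptive scheduler avoids.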

eklitzke 1477 days ago

I agree, and this article seems pretty misinformed. Creating and managing threads on Linux is extremely cheap, especially when a lot of them are idle, and a lot of big companies (Google, Facebook, Amazon) have tons of huge C++ applications with thousands of threads, and it's fine. I also think a lot of people who don't work on these problems at these kinds of companies assume that it must be incredibly difficult to write and debug code like this, but that's not really true. For one thing, the tricky parts are generally abstracted away so that regular engineers don't have to think much about concurrency issues. And when such issues do come up, tsan (ThreadSanitizer) and lock annotations [1] will catch 99.9% of them in testing and make it easy to understand why things are breaking.

In the real world here are the kinds of problems that people at Google etc. care about when it comes to performance or scalability issues with hugely concurrent programs:

  - Noisy neighbor problems from other threads messing with your TLB and L1 cache
  - High cost of context switches
  - Unpredictable scheduling/priority inversion in the scheduler

The first problem isn't actually made any better by using async coroutines or green threads/fibers: if you switch to another coroutine or fiber and it does something naughty (e.g. munmaps memory, which causes a TLB shootdown), it's going to degrade performance for your unrelated coroutine/fiber.

The second and third problems can be solved in some cases by things like fibers and userspace scheduling, but this is a fairly advanced topic and "just use async" is definitely not the solution. If you're interested in learning more about how these problems are actually solved at Google for example I recommend [2] and [3].

[1] https://abseil.io/docs/cpp/guides/synchronization#thread-ann... [2] https://www.youtube.com/watch?v=KXuZi9aeGTw [3] https://storage.googleapis.com/pub-tools-public-publication-...

ibraheemdev 1477 days ago

> - Noisy neighbor problems from other threads messing with your TLB and L1 cache

Switching between threads within the same process doesn't require a TLB or L1 cache flush. Not sure if you were implying this, just wanted to point that out.

> - High cost of context switches

Userspace schedulers (like Rust's tokio) do make context switching cheaper. However, most of the context switching in a web server is due to blocking I/O, and the most expensive part of the switch, entering the kernel, is already paid for by the I/O request. Kernel context switching is unlikely to be your bottleneck.

> Unpredictable scheduling/priority inversion in the scheduler

This can definitely be an issue at scale, but the general-purpose async schedulers most people use are unlikely to be any better.

rstuart4133 1477 days ago

As another data point, I have one Firefox window right now:

    $ ps -eLf | grep firefox | wc -l
    569
    $

geodel 1478 days ago

Once you go with a cultish following of the async-everything idea, measuring things becomes heresy.