| > The problem is that threads just don’t work in practice for massive concurrency. That's an assumption that is repeated very often recently, and measured very rarely. The truth is that the number of applications for which they don't work is surprisingly low. I'm working at a well-known cloud provider, and lots of people would really be surprised which applications at the largest scale are working fine with a thread-per-request model. 50k OS threads are not really an issue on modern server hardware. While it might not be the most efficient [1], it will not perform so badly that it causes an availability impact either. There are obviously some exceptions to that [2], but I encourage people to measure instead of making assumptions. Unless one finds themselves in a weekly meeting about server efficiency or scaling cliffs, both models probably work. [1] It really depends on the workload, but people might find an efficiency degradation (e.g. measured as BYTES_TRANSFERRED/CPU_CORES_USED) of 20% at a concurrency level of 1000, or maybe only at a concurrency level of 10k. Coarse-grained work items (e.g. send a large file to a socket) will show a lower degradation. [2] Load balancers, CDN services, and e.g. chat applications which maintain a massive number of mostly idle client connections can be such environments. They have a high amount of concurrency that needs to be managed, but much less "active concurrency". If all clients were active at the same time, those environments would run out of disk IO or network bandwidth far before CPU or memory became an issue. |
Performance is important, but the biggest performance gain happens when a program goes from not working to working correctly.
Debugging is another corner case: async makes it intolerably hard to get a backtrace and make sense of what is going on.
It's not like debugging threads is easy, but in a low-contention environment where one thread holds the state of one request and few threads interlock, threading is a fair bit better than async execution. Plus, logs that include thread names make it possible to draw something like a post-processed Catapult timing diagram (open chrome://tracing and look at an example; it is a great UI for dropping in your own multi-threaded event log as JSON).
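To make the thread-name log idea concrete, here is a rough sketch of turning such a log into the Trace Event Format JSON that the tracing UI loads. The log line layout (`<millis> <thread-name> <BEGIN|END> <label>`), the sample data, and the `log_to_trace` helper are all made up for illustration; your own logs will need their own parser.

```python
import json

# Hypothetical log format (an assumption): "<millis> <thread-name> <BEGIN|END> <label>"
SAMPLE_LOG = """\
0 worker-1 BEGIN handle_request
2 worker-2 BEGIN handle_request
5 worker-1 BEGIN db_query
9 worker-1 END db_query
12 worker-1 END handle_request
15 worker-2 END handle_request
"""

def log_to_trace(log_text):
    """Convert thread-tagged log lines into Trace Event Format events."""
    tids = {}    # thread name -> small integer id used as "tid"
    events = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        millis, thread, kind, label = line.split(maxsplit=3)
        if thread not in tids:
            tids[thread] = len(tids) + 1
            # Metadata event so the viewer labels the row with the thread name.
            events.append({"name": "thread_name", "ph": "M", "pid": 1,
                           "tid": tids[thread], "args": {"name": thread}})
        events.append({
            "name": label,
            "ph": "B" if kind == "BEGIN" else "E",  # begin / end of a duration
            "ts": int(millis) * 1000,               # the format uses microseconds
            "pid": 1,
            "tid": tids[thread],
        })
    return {"traceEvents": events}

trace = log_to_trace(SAMPLE_LOG)
trace_json = json.dumps(trace, indent=2)
```

Writing `trace_json` to a file and loading that file in the tracing UI should give one row per thread, with `db_query` nested inside `handle_request` on the `worker-1` row.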
I'm a big fan of executor thread groups and work queues, but damn does it make it hard to mentally walk through a bug when the stack traces are scattered across multiple places.
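A small sketch of what "scattered" means here, using Python's `concurrent.futures`. The traceback of a failed task only covers the worker side, so the code that submitted the work is missing. One possible workaround (my own illustration, not a standard API) is to capture the submit-site stack and hang it on the exception; `submit_with_origin`, `handle_request`, `parse`, and the `submitted_from` attribute are hypothetical names.

```python
import traceback
from concurrent.futures import ThreadPoolExecutor

def submit_with_origin(executor, fn, *args, **kwargs):
    """Submit fn, remembering the stack of the code that submitted it."""
    # Drop the last entry, which is this helper's own frame.
    origin = "".join(traceback.format_stack()[:-1])

    def wrapper():
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Attach the submit-site stack so both halves end up in one place.
            exc.submitted_from = origin
            raise

    return executor.submit(wrapper)

def parse(payload):
    return int(payload)  # raises ValueError on bad input

def handle_request(executor, payload):
    return submit_with_origin(executor, parse, payload)

with ThreadPoolExecutor(max_workers=2) as pool:
    future = handle_request(pool, "not-a-number")
    error = future.exception()  # waits for the task and returns its exception

# Worker-side traceback: shows wrapper and parse, but not handle_request.
worker_trace = "".join(
    traceback.format_exception(type(error), error, error.__traceback__)
)
```

Printing `worker_trace` followed by `error.submitted_from` gives the two halves of the story next to each other, which is roughly what a plain thread-per-request stack trace hands you for free.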