|
|
|
|
|
by HippoBaro
1018 days ago
|
|
As someone else said, it is not, strictly speaking, a bug. If your server receives a request that requires very computationally expensive work, is it okay to delay every other request on that core? That's probably not okay, and it'll show in your latency distribution. Folks would rather have every future time sliced so that other tasks get some CPU time in a ~fair way (after all, there is no concept of task priority in most runtime). But you're right: it isn't required, and you could sprinkle every loop of your code with yielding statements. But knowing when to yield is impossible for a future. If nothing else is running, it shouldn't yield. If many things are running but the problem space of the future is small, it probably shouldn't yield either, etc. You simply do not have the necessary information in your future to make an informed decision. You need some global entity to keep track of everything and either yield for you or tell you when you should yield. Tokio does the former, Glommio does the latter. It gets even more complex when you add IO into the mix because you need to submit IO requests in a way that saturates the network/nvme drives/whatever. So if a future submits an IO request, it's probably advantageous to yield immediately afterward so that other futures may do so as well. That's how you maximize throughput. But as I said, that's a very hard problem to solve. |
|
I guess if someone wants to use futures as if they were goroutines then it's not a bug, but this sort of presupposes that an opinionated runtime is already shooting signals at itself. Fundamentally the language gives you a primitive for switching execution between one context and another, and the premise of the program is probably that execution will switch back pretty quickly from work related to any single task.
I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling. The described problem cannot even happen in the "runtime" I'm currently using because io_uring won't just completely stop responding to other kinds of sqe's and only give you responses to a multishot accept when a lot of connections are coming in. I strongly suspect equivalent results are achievable with epoll.