Hacker News new | ask | show | jobs
by HippoBaro 1019 days ago
They often implement soft preemption. Tokio and others like Glommio do. Usually, it's based on interrupts. The runtime schedules a timer to fire an interrupt, and some code is injected into the interrupt handler.

This is used to keep track of task runtime quotas so they can yield as soon as possible afterward.

This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

> I certainly hope that we didn't end up with colored functions in Rust because of such a misconception.

Misconceptions are everywhere unfortunately!

3 comments

Tokio and glommio using interrupts is ironically another misconception. They're cooperatively scheduled so yes, a misbehaving blocking task can stall the scheduler. They can't really interrupt an arbitrary stackless coroutine like a Future due to having nowhere to store the OS thread context in a way that can be resumed (Each thread has its own stack, but now it's stackful with all the concerns of sizing and growing. Or you copy the stack to the task but now have somehow to fixup stack pointers in places the runtime is unaware).

https://tokio.rs/blog/2020-04-preemption#a-note-on-blocking

> Tokio does not, and will not attempt to detect blocking tasks and automatically compensate

> This is the same technique used in Go and many others for preemption. If you don't add this, futures that don't yield can run forever, stalling the system.

You may be referring to this particular issue in Go https://github.com/golang/go/issues/10958 which I think was somewhat addresses a couple releases back.

> You are right that it is not strictly necessary, but in practice, it is so helpful as a guard against the yielding problem that it's ubiquitous.

This is honestly shocking to hear. I would think that if people had bugs in their programs they would want them to fail loudly so they can be fixed.

As someone else said, it is not, strictly speaking, a bug. If your server receives a request that requires very computationally expensive work, is it okay to delay every other request on that core? That's probably not okay, and it'll show in your latency distribution.

Folks would rather have every future time sliced so that other tasks get some CPU time in a ~fair way (after all, there is no concept of task priority in most runtime).

But you're right: it isn't required, and you could sprinkle every loop of your code with yielding statements. But knowing when to yield is impossible for a future. If nothing else is running, it shouldn't yield. If many things are running but the problem space of the future is small, it probably shouldn't yield either, etc.

You simply do not have the necessary information in your future to make an informed decision. You need some global entity to keep track of everything and either yield for you or tell you when you should yield. Tokio does the former, Glommio does the latter.

It gets even more complex when you add IO into the mix because you need to submit IO requests in a way that saturates the network/nvme drives/whatever. So if a future submits an IO request, it's probably advantageous to yield immediately afterward so that other futures may do so as well. That's how you maximize throughput. But as I said, that's a very hard problem to solve.

Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

I guess if someone wants to use futures as if they were goroutines then it's not a bug, but this sort of presupposes that an opinionated runtime is already shooting signals at itself. Fundamentally the language gives you a primitive for switching execution between one context and another, and the premise of the program is probably that execution will switch back pretty quickly from work related to any single task.

I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling. The described problem cannot even happen in the "runtime" I'm currently using because io_uring won't just completely stop responding to other kinds of sqe's and only give you responses to a multishot accept when a lot of connections are coming in. I strongly suspect equivalent results are achievable with epoll.

>Trying to solve the problem by frequently invoking signal handlers will also show in your latency distribution!

So just like any other kind of scheduling? "Frequently" is also very subjective, and there are tradeoffs between throughput, latency, and especially tail latency. You can improve throughput and minimum latency by never preempting tasks, but it's bad for average, median, and tail latency when longer tasks starve others, otherwise SCHED_FIFO would be the default for Linux.

>I read the blog about this situation at https://tokio.rs/blog/2020-04-preemption which is equally baffling

You've misunderstood the problem somehow. There is definitely nothing about tokio (which uses epoll on Linux and can use io_uring) not responding in there. io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything. The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.

> So just like any other kind of scheduling?

Yes. Industries that care about latency take some pains to avoid this as well, of course.

> io_uring and epoll have nothing to do with it and can't avoid the problem: the problem is with code that can make progress and doesn't need to poll for anything.

They totally can though? If I write the exact same code that is called out as problematic in the post, my non-preemptive runtime will run a variety of tasks while non-preemptive tokio is claimed to run only one. This is because my `accept` method would either submit an "accept sqe" to io_uring and swap to the runtime or do nothing and swap to the runtime (in the case of a multishot accept). Then the runtime would continue processing all cqes in order received, not *only* the `accept` cqes. The tokio `accept` method and event loop could also avoid starving other tasks if the `accept` method was guaranteed to poll at least some portion of the time and all ready handlers from one poll were guaranteed to be called before polling again.

This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks." The remaining tasks that can starve other tasks are those that perform I/O by bypassing the runtime and those that spend a long time performing computations with no I/O. The former thing sounds like self-sabotage by the user, but unfortunately the latter thing probably requires the user to spend some effort on designing their program.

> The problem isn't unique to Rust either, and it's going to exist in any cooperative multitasking system: if you rely on tasks to yield by themselves, some won't.

If we leave the obvious defects in our software, we will continue running software with obvious defects in it, yes.

>This sort of design solves the problem for any case of "My task that is performing I/O through my runtime is starving my other tasks."

Yeah, there's your misunderstanding, you've got it backwards. The problem being described occurs when I/O isn't happening because it isn't needed, there isn't a problem when I/O does need to happen.

Think of buffered reading of a file, maybe a small one that fully fits into the buffer, and reading it one byte at a time. Reading the first byte will block and go through epoll/io_uring/kqueue to fill the buffer and other tasks can run, but subsequent calls won't and they can return immediately without ever needing to touch the poller. Or maybe it's waiting on a channel in a loop, but the producer of that channel pushed more content onto it before the consumer was done so no blocking is needed.

You can solve this by never writing tasks that can take "a lot" of time, or "continue", whatever that means, but that's pretty inefficient in its own right. If my theoretical file reading task is explicitly yielding to the runtime on every byte by calling yield(), it is going to be very slow. You're not going to go through io_uring for every single byte of a file individually when running "while next_byte = async_read_next_byte(file) {}" code in any language if you have heap memory available to buffer it.

There's nothing buggy about a future that never yields because it can always make progress, but people prefer that a runtime doesn't let all other execution get starved by one operation. That makes it a problem that runtimes and schedulers work to solve, but not a bug that needs to be prevented at a language level. A runtime that doesn't solve it isn't buggy, but probably isn't friendly to use, like how Go used to have problems with tight loops and they put in changes to make them cause less starvation.