Hacker News
by ajross 593 days ago
> If that task has cancelled its interest in this syscall, they should instead clean up the resources owned by that CQE.

So, first: how is that not consistent with the contention that the bug is due to a collision in the meaning of "asynchronous"? You're describing, once more, a synchronous operation ("when ... cancel") on a data structure that doesn't support that ("the kernel writes ..." on its own schedule).

And second: the English language text of your solution has race conditions. How do you prevent reading from the buffer after the beginning of "cancel" and before the "dispatch"? You need some locking in there, which you don't in general async code. Ergo it's a paradigm clash. Developers, you among them it seems, don't really understand the requirements of a truly async process and get confused trying to shoehorn it into a "callbacks with context switch" framework like rust async.


> Developers, you among them it seems, don't really understand the requirements of a truly async process and get confused trying to shoehorn it into a "callbacks with context switch" framework like rust async.

This is an odd thing to say about someone who has written a correct solution to the problem which triggered this discussion.

Also, you really need to define what truly async means. Many layers of computing are async or not async depending on how you look at them.

Saw this show up after the fact. Maybe it's safe enough for me to try to re-engage: The point I was trying to make, to deafening jeering, is that the linked bug is a very routine race condition that is "obvious" to people like me coming from a systems programming background who deal with parallelism concerns all the time. It looks interesting and weird in the context of an async API precisely because async APIs work to hide this kind of detail (in this case, the fact that the events being added to the queue are in a parallel context and racing with the seemingly-atomic "cancel" operation).

APIs to deal with things like io-uring (or DMA device drivers, or shared memory media streams, etc...) tend necessarily to involve explicit locking all the way up at the top of the API to make the relationship explicit. Async can't do that, because there's nowhere to put the lock (it only understands "events"), and so you need to synthesize it (maybe by blocking the cancelling thread until the queue drains), which is complicated and error-prone.
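To make the "block the cancelling thread until the queue drains" idea concrete, here is a rough sketch using only the Rust standard library. It does not touch io-uring at all: the "kernel" is simulated by a plain thread, and `InFlight`, `wait_drained`, and `cancel_blocking` are hypothetical names invented for this illustration.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Count of operations still in flight, plus a condvar to wait for zero.
/// (Hypothetical type, not part of any io-uring binding.)
#[derive(Default)]
struct InFlight {
    pending: Mutex<usize>,
    drained: Condvar,
}

impl InFlight {
    /// Record that one more operation has been submitted.
    fn start(&self) {
        *self.pending.lock().unwrap() += 1;
    }

    /// Called from the completion side once the write has landed.
    fn complete(&self) {
        let mut n = self.pending.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.drained.notify_all();
        }
    }

    /// Called from the cancelling side: block until nothing is in flight.
    fn wait_drained(&self) {
        let mut n = self.pending.lock().unwrap();
        while *n != 0 {
            n = self.drained.wait(n).unwrap();
        }
    }
}

/// Simulated cancel that waits for the late write before releasing the buffer.
fn cancel_blocking() -> Vec<u8> {
    let state = Arc::new(InFlight::default());
    let buf = Arc::new(Mutex::new(vec![0u8; 4]));

    state.start();
    let (s, b) = (Arc::clone(&state), Arc::clone(&buf));
    // Stand-in for the kernel: writes into the buffer on its own schedule.
    let kernel = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        b.lock().unwrap().copy_from_slice(b"data");
        s.complete();
    });

    // "Cancel": do not read or free the buffer until the queue has drained.
    state.wait_drained();
    let out = buf.lock().unwrap().clone();
    kernel.join().unwrap();
    out
}
```

Calling `cancel_blocking()` returns only after the simulated write has completed, which is the synthesized lock described above. The cost is that cancellation becomes a blocking operation.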

This isn't unsolvable. But it absolutely is a paradigm collision, and something I think people would be better served to treat seriously instead of calling others names on the internet.

Hi, I’m also from a systems programming background.

I’m not sure what your level of experience with Rust’s async model is, but an important thing to note is that work is split between an executor and the Future itself. Executors are not “special” in any way. In fact, the Rust standard library doesn’t even provide an executor.
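As a small illustration of executors not being special, here is a minimal single-future executor written against the standard library only. `block_on`, `ThreadWaker`, and `YieldOnce` are names chosen for this sketch; real executors add task queues and I/O drivers on top of the same `Future`, `Waker`, and `Context` building blocks.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll one future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until something calls wake(). park() may also return
            // spuriously, in which case the loop simply polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Toy future that returns Pending once, then Ready.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            // Ask to be polled again right away.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

For example, `block_on(async { YieldOnce(false).await; 1 + 2 })` drives the future through one `Pending` and then yields the result. Nothing here is async code itself; it is an ordinary loop around `poll`.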

Futures in Rust rely on their executors to do anything nontrivial. That includes the actual interaction with the io-uring api in this case.

A properly implemented executor really should handle cases where a Future decides to cancel its interest in an event.

Executors are themselves not implemented with async code [0]. So I’m not quite able to understand your claim of a paradigm mismatch.

[0]: subexecutors like FuturesUnordered notwithstanding.

https://news.ycombinator.com/item?id=41996976

See this comment describing how an executor can properly handle cancelations.
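For readers who don't want to follow the link, the general shape of "clean up on dispatch if the task has cancelled" can be sketched roughly as below. This is not the linked comment's code and it makes no real io-uring calls: `Driver`, `OpState`, and the method names are hypothetical, buffers are plain `Vec<u8>`, and completions are fed in by hand. The assumption being illustrated is that the driver, not the future, owns the buffer for as long as the operation is in flight.

```rust
use std::collections::HashMap;

/// State of one in-flight operation, keyed by an id that would be
/// attached to the submission (hypothetical layout).
enum OpState {
    /// A task is still waiting for the result.
    Waiting { buf: Vec<u8> },
    /// The task dropped its future. The driver keeps the buffer alive
    /// because the operation may still write into it.
    Cancelled { buf: Vec<u8> },
    /// The completion arrived and is waiting to be picked up.
    Done { buf: Vec<u8>, result: i32 },
}

#[derive(Default)]
struct Driver {
    next_id: u64,
    ops: HashMap<u64, OpState>,
}

impl Driver {
    /// Hand the buffer to the driver and get back an operation id.
    fn submit(&mut self, buf: Vec<u8>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.ops.insert(id, OpState::Waiting { buf });
        id
    }

    /// Called when the future is dropped: record the loss of interest,
    /// but do not free anything that may still be written to.
    fn cancel(&mut self, id: u64) {
        match self.ops.remove(&id) {
            Some(OpState::Waiting { buf }) | Some(OpState::Cancelled { buf }) => {
                self.ops.insert(id, OpState::Cancelled { buf });
            }
            // Already completed: nothing is in flight, so dropping is safe.
            Some(OpState::Done { .. }) | None => {}
        }
    }

    /// Called by the dispatch loop for each completion entry.
    fn on_completion(&mut self, id: u64, result: i32) {
        match self.ops.remove(&id) {
            Some(OpState::Waiting { buf }) => {
                self.ops.insert(id, OpState::Done { buf, result });
            }
            // Nobody is interested any more: clean up the resources here.
            Some(OpState::Cancelled { buf }) => drop(buf),
            Some(done @ OpState::Done { .. }) => {
                self.ops.insert(id, done);
            }
            None => {}
        }
    }

    /// Called by a still-interested task to collect its result.
    fn take_result(&mut self, id: u64) -> Option<(Vec<u8>, i32)> {
        match self.ops.remove(&id) {
            Some(OpState::Done { buf, result }) => Some((buf, result)),
            Some(other) => {
                self.ops.insert(id, other);
                None
            }
            None => None,
        }
    }

    /// Number of operations the driver still holds state for.
    fn in_flight(&self) -> usize {
        self.ops.len()
    }
}
```

In this sketch the cancel path never blocks: it only flips the state, and the actual cleanup is deferred to whoever dispatches the completion. In a real driver the cleanup step would also have to release whatever the completion itself carries, which is left out here.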