| Patch author here. It is important not to conflate POSIX requirements with expected behavior, especially for device files, which require very specific knowledge of their implementation to use (DRM ioctls and resources, anyone?). You might think that, since a well-behaved game should not be opening or closing evdev fds during gameplay at all, this is clearly just an application bug. However, games are not the main user of evdev devices; your display server is! This bug causes input device closure during session switching (e.g. VT switching) to take abnormally long. On the machine where I discovered the bug, it adds over a second to the session switch time, significantly impacting responsiveness. This is absolutely a kernel bug. I did not push the patch further as I had other priorities, and testing this kind of patch is quite time-consuming when it only reproduces in a measurable way on a single physical machine. Other machines end up with a much shorter synchronize_rcu wait and often have far fewer input devices, which explains why the issue was not discovered or fixed earlier. call_rcu is intended to be used wherever you do not want the writer to block. Alternative fixes involve synchronize_rcu_expedited (very fast but expensive), identifying whether the long synchronize_rcu wait is itself a bug that could be fixed (might be correct), or possibly refactoring evdev (which is quite a simple device file). As for putting things in threads, I would consider it a huge hack to move open/close into a thread. Threads are not and will never be mandatory to have great responsiveness. |
The POSIX interface was invented for batch processing: long-running, non-interactive jobs. This is why it lacks timing requirements. Well-designed interactive GUI applications do not interact with the file system on their main thread. This is especially true for game display loops. The fundamental problem here is that they are doing unbounded work on a thread that has specific timing requirements (usually 16.6ms per loop, i.e. one frame at 60 Hz). As I’ve said elsewhere, this bug will still manifest itself no matter how fast you make close(); it just depends on how many device files are present on that particular system. It’s a poor design. Well-designed games account for every line of code run in their drawing loop.
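To make "keep file operations off the main thread" concrete, here is a minimal sketch in Python of handing close() calls to a worker thread. `DeferredCloser` and its methods are hypothetical names invented for this illustration, not part of any library mentioned in this thread, and a real game or display server would more likely do the equivalent in C.

```python
import os
import queue
import threading


class DeferredCloser:
    """Closes file descriptors on a background thread (hypothetical helper)."""

    def __init__(self):
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def close_later(self, fd):
        # Returns immediately; the possibly slow close() runs on the worker.
        self._pending.put(fd)

    def shutdown(self):
        # The None sentinel makes the worker exit after earlier requests
        # have been processed, then we wait for it to finish.
        self._pending.put(None)
        self._worker.join()

    def _run(self):
        while True:
            fd = self._pending.get()
            if fd is None:
                return
            try:
                os.close(fd)
            except OSError:
                pass  # fd was invalid or already closed
```

The frame loop would call `close_later(fd)` instead of `os.close(fd)`, so its cost per frame is a queue push regardless of how long the kernel takes to close the device.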
> This is absolutely a kernel bug.
I don’t think that is proven unless the original author can chime in. It’s your best guess and opinion that the author intended not to block on synchronize_rcu, but it’s perfectly possible they did indeed intend the code as written. synchronize_rcu is used in plenty of other critical system call paths in similar ways, and not every one of those uses is a bug. I would guess you might be suffering from a bit of tunnel vision here, given how the behavior was discovered.
If it is indeed the case that synchronize_rcu is taking up to 50ms, I would suspect there is a deeper issue at play on this machine. By search-and-replacing the call with call_rcu or similar you may just be masking the problem. RCU updates should not be taking that long.
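One way to check whether close() really is that slow on a given machine is to time it from user space. The following is a rough sketch, not a rigorous benchmark: `time_close` and `time_close_all` are hypothetical helper names, and it measures the whole close() call, so it cannot by itself attribute the time to synchronize_rcu.

```python
import os
import time


def time_close(path, flags=os.O_RDONLY):
    """Open path, then return how long close() took, in milliseconds."""
    fd = os.open(path, flags)
    start = time.perf_counter_ns()
    os.close(fd)
    return (time.perf_counter_ns() - start) / 1e6


def time_close_all(paths):
    """Map each path to its close() latency in ms, or None if open failed."""
    results = {}
    for path in paths:
        try:
            results[path] = time_close(path)
        except OSError:
            results[path] = None  # e.g. missing file or no permission
    return results
```

For example, `time_close_all(glob.glob("/dev/input/event*"))` (after `import glob`) would give per-device numbers, assuming the user has permission to open those nodes. Comparing them against the same call on an ordinary file shows whether the delay is specific to evdev.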