by nominatronic 2292 days ago
This is basically the idea of io_uring [0], although that mechanism is specific to certain I/O operations rather than being a generic queuing system for syscalls.

[0] https://lwn.net/Articles/776703/


Before io_uring there were many failed proposals for a general-purpose asynchronous interface to wrap blocking syscalls. One of these, "syslets", was literally a mechanism for batching arbitrary syscalls: https://lwn.net/Articles/221887/

> Syslets are small, simple, lightweight programs (consisting of system-calls, 'atoms') that the kernel can execute autonomously (and, not the least, asynchronously), without having to exit back into user-space. ....

> Syslets consist of 'syslet atoms', where each atom represents a single system-call. These atoms can be chained to each other: serially, in branches or in loops.
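
To make the quoted description a bit more concrete, here is a rough userspace sketch of the serial-chain idea. It is not the syslet API from the patches: `struct atom`, `run_chain`, and `ATOM_STOP_ON_ERROR` are names made up for illustration, and the walk happens in userspace with one ordinary syscall per atom. The point of syslets was that the kernel would do this walk itself without returning to userspace between atoms. Linux-only, since it relies on `syscall(2)`.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical flag: stop walking the chain when an atom fails. */
#define ATOM_STOP_ON_ERROR 0x1u

/* Hypothetical layout: one atom describes one syscall and links to the next. */
struct atom {
    long nr;            /* syscall number, e.g. SYS_getpid */
    long args[6];       /* raw syscall arguments */
    long ret;           /* filled in after the atom runs (-1 on error) */
    unsigned flags;     /* ATOM_STOP_ON_ERROR or 0 */
    struct atom *next;  /* next atom in the serial chain, or NULL */
};

/* Runs each atom in order and returns how many atoms were executed.
 * This is only an emulation: every atom still costs a full syscall here. */
static size_t run_chain(struct atom *a)
{
    size_t executed = 0;

    for (; a != NULL; a = a->next) {
        a->ret = syscall(a->nr, a->args[0], a->args[1], a->args[2],
                         a->args[3], a->args[4], a->args[5]);
        executed++;
        if (a->ret < 0 && (a->flags & ATOM_STOP_ON_ERROR))
            break;
    }
    return executed;
}
```

A caller would build the atoms on the stack or heap, link them through `next`, and hand the head to `run_chain`. Branches and loops, which the quote also mentions, are left out of this sketch.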

The user api: https://people.redhat.com/mingo/syslet-patches/patches/async...

User-space memory mapping (naturally left to the user) seems to be the error-prone bit here.

Search for "sys_umem" here: https://people.redhat.com/mingo/syslet-patches/patches/async...

I understand that this is low-level stuff for experienced developers, but this approach basically asks the programmer to play linker. Entirely doable, but I wonder what it looks like at scale (LOC).

Thanks for sharing. It's encouraging that something at least resembling what I came up with was considered.
Any thoughts on why they failed?

I don't think the kernel complexity and maintenance burden was deemed worth the effort. But now 1) Spectre has made syscalls more expensive and 2) Linux has destroyed all the competition, so there's a lot more pressure to add features that, in effect, only niche use cases would ever use--you don't add io_uring to benefit the Node.js crowd, you add it for Amazon and Netflix and other consumers that have no other avenues for optimization.

To be fair, io_uring seems to be a pretty sane design. But, for example, for many years anything with a shared ring buffer between userspace and the kernel was suspect. Now there are several such interfaces. Overall tolerance has grown, especially on the part of Linus. The kernel is much larger, and Linus seems to have become much more deferential to subsystem maintainers as understanding and opining on all of the code has slipped beyond his grasp.

Long-term, io_uring will almost certainly provide more operations than just I/O.
Batched virtual memory updates (mmap/munmap/madvise) could be a boon for allocators and garbage collectors since the cost of TLB shootdowns could be amortized.

Similarly if all the syscalls done between fork and exec could be squeezed into a single uring chain without any intervening userspace code then the kernel could skip some costly virtual memory operations, bringing us closer to posix_spawn behavior.
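
For reference, this is roughly what the posix_spawn style looks like today: the work normally done by hand between fork and exec is declared up front as a list of file actions, so no user code runs in the child. A minimal C sketch, assuming `/bin/sh` exists; `spawn_with_redirect` and its parameters are names invented here, while the `posix_spawn*` calls are the standard POSIX ones.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Runs `/bin/sh -c cmd` with stdout redirected to out_path.
 * Returns the child's exit status, or -1 if spawning or waiting fails. */
static int spawn_with_redirect(const char *cmd, const char *out_path)
{
    posix_spawn_file_actions_t actions;
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid;
    int status;
    int err;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* Declarative stand-in for the open() + dup2() usually done after fork. */
    err = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO, out_path,
                                           O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (err == 0)
        err = posix_spawn(&pid, "/bin/sh", &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A uring chain covering the fork-to-exec window would presumably look similar in spirit: a fixed list of operations handed to the kernel in one go, rather than arbitrary code running in a half-initialized child.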

> Batched virtual memory updates (mmap/munmap/madvise) could be a boon for allocators and garbage collectors since the cost of TLB shootdowns could be amortized.

And ideally also the cost of mmap_sem (the lock for an address space's memory map), which can be one of the most contended locks in a workload.

> Similarly if all the syscalls done between fork and exec could be squeezed into a single uring chain without any intervening userspace code then the kernel could skip some costly virtual memory operations, bringing us closer to posix_spawn behavior.

I'd love to see a way of doing this with io_uring.