by thirdreplicator 3493 days ago
Thoroughly enjoyed the tutorial, but why would one want to make a custom system call? What superpowers does this give you? Thanks in advance for your answers.

A syscall is your most direct interface to the kernel. It's simple and fast, and it's exactly what you want when you need to pass structured, in-memory data to the kernel.
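Here's a minimal sketch of what "passing structured data in memory" looks like from user space, assuming Linux and glibc. The generic `syscall()` entry point hands the kernel a pointer, and the kernel fills the caller's struct in place. It uses the existing `uname` syscall as a stand-in; a custom syscall with no libc wrapper would be invoked the same way, using whatever number your patched syscall table assigns it. `raw_uname` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_uname */
#include <sys/utsname.h>   /* struct utsname */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: invoke the uname syscall directly.
   The only argument is a pointer to a caller-owned struct; the kernel
   copies its answer into that memory. No file or device node is opened.
   Returns 0 on success, -1 with errno set on failure. */
static int raw_uname(struct utsname *out)
{
    return (int)syscall(SYS_uname, out);
}
```

After `raw_uname(&u)` succeeds, `u.sysname` holds the kernel name (normally "Linux") and `u.release` the kernel release string.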

In a strict technical sense, there's nothing you need a syscall for; you can just read/write data (or maybe do an ioctl) on a new device node or something. In fact, OpenAFS supports routing its "syscall" on Linux through ioctls on /proc/fs/openafs/syscall, because Linux deliberately makes it annoying to patch the syscall table from a kernel module, to make life harder for rootkits.
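For contrast, here's a sketch of the device-node route, assuming Linux: open a control node, pass a pointer through `ioctl()`, then close the node. The `ioctl_via_node` helper and the node path in the usage are hypothetical, and `FIONREAD` is used only because it's a generic request every Linux system knows; a real module would define its own request code. Note the extra open/close on every call, and that the call fails outright if the node doesn't exist, e.g. inside a chroot.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* open(), O_* flags */
#include <sys/ioctl.h>   /* ioctl(), FIONREAD */
#include <unistd.h>      /* close() */

/* Hypothetical helper: open a control node, issue one ioctl whose third
   argument points at caller-owned data, then close the node.
   Returns the ioctl result, or -1 with errno set (ENOENT if the node is
   missing, ENOTTY if the node doesn't understand the request, ...). */
static int ioctl_via_node(const char *path, unsigned long request, void *arg)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;                 /* e.g. node absent inside a chroot */

    int ret = ioctl(fd, request, arg);
    int saved_errno = errno;       /* keep close() from clobbering ioctl's errno */
    close(fd);
    errno = saved_errno;
    return ret;
}
```

Compared with a direct syscall, each call here costs an extra `open()` and `close()`, and it needs a free file descriptor while it runs.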

However, it's simpler to pass data structures if you can use a syscall, and it's much faster than opening a file node. And if you expect to run in an environment where you don't know whether a particular file will exist (e.g., a chroot), it's useful to call a syscall directly, because that's always available. For instance, getrandom was added in July 2014 partly for this reason, and partly so that you could still get randomness if you had run out of file descriptors and couldn't open /dev/urandom.
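A sketch of the getrandom case, assuming Linux 3.17 or later: calling it through `syscall(SYS_getrandom, ...)` needs no file descriptor and no /dev/urandom path, so it keeps working in a chroot or after the process has hit its fd limit. (glibc 2.25 and later also provide a `getrandom()` wrapper in <sys/random.h>.) `fill_random` is a hypothetical helper name. The loop retries on EINTR and on short reads, which getrandom can return for large requests.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>        /* size_t */
#include <sys/syscall.h>   /* SYS_getrandom */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: fill buf with len random bytes from the kernel's
   urandom source (flags = 0), without opening any file.
   Returns 0 on success, -1 with errno set on failure. */
static int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        long n = syscall(SYS_getrandom, p, len, 0U);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* short read: advance and ask for the rest */
        len -= (size_t)n;
    }
    return 0;
}
```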

Here are all the syscalls added in the last two years:

* pkey_mprotect, pkey_alloc, pkey_free: support for a new Intel processor feature, Memory Protection Keys https://lwn.net/Articles/643797/

* preadv2, pwritev2: add a flags argument so you can do a non-blocking preadv or pwritev without opening the file in non-blocking mode https://lwn.net/Articles/670231/

* copy_file_range: copy data between two file descriptors, using filesystem support for efficient copies if possible https://lwn.net/Articles/659523/

* mlock2: add a flags argument so you can mlock memory when it's next accessed https://lwn.net/Articles/650538/

* membarrier: force a memory barrier on all running threads to help with userspace RCU, garbage collection, etc. http://man7.org/linux/man-pages/man2/membarrier.2.html

* userfaultfd: implement userspace paging https://www.kernel.org/doc/Documentation/vm/userfaultfd.txt

* execveat: a version of execve that takes a file descriptor (or a directory fd plus a relative path) instead of a pathname string to execute http://man7.org/linux/man-pages/man2/execveat.2.html
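A sketch of using copy_file_range from user space, assuming Linux 4.5 or later (glibc 2.27+ also has a wrapper). With NULL offset pointers, the kernel copies from each descriptor's current file position and advances it, letting the filesystem use a more efficient copy where it has one. Because support varies by kernel and filesystem, the sketch falls back to a plain read()/write() loop on the errors that usually mean "not supported here"; that error list is a judgment call, not an exhaustive rule. `copy_file` and `copy_with_rw` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>         /* open(), O_* flags */
#include <sys/syscall.h>   /* SYS_copy_file_range */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall(), read(), write(), close() */

/* Fallback: copy the rest of in_fd to out_fd through a user-space buffer. */
static int copy_with_rw(int in_fd, int out_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return 0;                      /* EOF */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Copy all of src into dst (created or truncated).
   Returns 0 on success, -1 with errno set on failure. */
static int copy_file(const char *src, const char *dst)
{
    int in_fd = open(src, O_RDONLY | O_CLOEXEC);
    if (in_fd < 0) return -1;
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (out_fd < 0) {
        int e = errno;
        close(in_fd);
        errno = e;
        return -1;
    }

    int ret = 0;
    for (;;) {
        /* NULL offsets: use and advance each fd's own file position. */
        long n = syscall(SYS_copy_file_range, in_fd, NULL, out_fd, NULL,
                         (size_t)1 << 30, 0U);
        if (n > 0) continue;                       /* more may remain */
        if (n == 0) break;                         /* EOF */
        if (errno == EINTR) continue;
        if (errno == ENOSYS || errno == EXDEV ||
            errno == EOPNOTSUPP || errno == EINVAL)
            ret = copy_with_rw(in_fd, out_fd);     /* resumes at current offsets */
        else
            ret = -1;
        break;
    }
    int e = errno;
    close(in_fd);
    close(out_fd);
    errno = e;
    return ret;
}
```

Since the fallback picks up from the descriptors' current positions, a copy that fails partway through copy_file_range still finishes correctly with read()/write().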
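And a sketch of the basic membarrier call, assuming Linux 4.3 or later and the kernel UAPI header <linux/membarrier.h>. It first asks the kernel which commands it supports (MEMBARRIER_CMD_QUERY), then issues the system-wide barrier, spelled MEMBARRIER_CMD_SHARED in older headers and MEMBARRIER_CMD_GLOBAL (same value) in newer ones. `barrier_all_threads` is a hypothetical helper name; a real userspace RCU would call something like it on its slow (writer) path so that readers can skip fences.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>  /* MEMBARRIER_CMD_* */
#include <sys/syscall.h>       /* SYS_membarrier */
#include <unistd.h>            /* syscall() */

/* Hypothetical helper: if the kernel offers it, make every running thread
   on the system pass through a full memory barrier.
   Returns 0 on success, -1 with errno set otherwise. */
static int barrier_all_threads(void)
{
    long supported = syscall(SYS_membarrier, MEMBARRIER_CMD_QUERY, 0);
    if (supported < 0)
        return -1;                 /* e.g. ENOSYS: kernel lacks membarrier */
    if (!(supported & MEMBARRIER_CMD_SHARED)) {
        errno = ENOSYS;            /* command not offered on this system */
        return -1;
    }
    return (int)syscall(SYS_membarrier, MEMBARRIER_CMD_SHARED, 0);
}
```

The non-expedited command can block the caller for a while; the *_EXPEDITED variants avoid that but require the process to register first.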