Linux System Call Table (thevivekpandey.github.io)
220 points by thevivekpandey 3191 days ago
19 comments

So the issues noticed so far:

* Missing syscalls

* Wrong syscall numbers

* Wrong calling convention

* Links to source are to wrong version

Does the table actually get anything right? I mean, this is a pretty spectacular cascade of failures.

At least for x86, you can get this same information fairly easily directly from the source. The table is located at arch/x86/entry/syscalls/syscall_64.tbl, from there you can grep for the function with git grep. For example, git grep 'SYSCALL_DEFINE.*read'.
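For anyone who wants to script that lookup instead of grepping by hand, here is a minimal sketch in C of parsing one line of that table. It assumes the `<number> <abi> <name> <entry point>` layout described in the file's header comment. The names `tbl_entry` and `parse_tbl_line` are made up for illustration, and the sample line mentioned below is illustrative rather than tied to a specific kernel version.

```c
#include <stdio.h>

/* One row of syscall_64.tbl (hypothetical struct, for illustration only). */
struct tbl_entry {
    int  nr;        /* syscall number */
    char abi[16];   /* e.g. "common", "64", "x32" */
    char name[64];  /* syscall name, e.g. "read" */
    char entry[64]; /* kernel entry point; may be empty for some rows */
};

/* Returns 1 if the line held a table row, 0 for comments, blank
 * lines, or anything that does not start with a number. */
static int parse_tbl_line(const char *line, struct tbl_entry *e)
{
    int fields;

    e->entry[0] = '\0';
    if (line[0] == '#')
        return 0;

    /* Whitespace in the format matches any run of tabs or spaces. */
    fields = sscanf(line, "%d %15s %63s %63s",
                    &e->nr, e->abi, e->name, e->entry);

    /* The entry point column is optional, so three fields are enough. */
    return fields >= 3;
}
```

Feeding it a line such as `276 common tee sys_tee` fills in `nr` and `name`; comment and blank lines are skipped.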
Why bother with git grep vs. just vanilla grep? I could see the use if you're working with an older binary, but you didn't mention that.
If you ran something like grep -r 'SYSCALL_DEFINE.*read' from the top level of the Linux source, it would search through not just your source code, but also all of the artifacts of building the kernel. Basically, git grep is faster in this case because it filters the searched files down to only the ones that are checked in. You could achieve a similar effect with standard tools like this: find -type f -regex '.*\.[hc]' | xargs grep 'SYSCALL_DEFINE.*read'

    grep -r --include='*.[hc]' 'SYSCALL_DEFINE.*read'
Nice. I hadn't used --include before.
See also: programmer's grep clones. https://beyondgrep.com/more-tools/#Other%20grep-like%20tools

They work fine even where git-grep is not an option. Example:

    ack 'SYSCALL_DEFINE.*read'
Actually, the syscall numbers are wrong! This reference seems better: http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x...

Consider a simple C program:

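    #define _GNU_SOURCE
    #include <unistd.h>
    
    int main(){
        /* Raw syscall number 276, no meaningful arguments:
           only the number matters for the strace output. */
        syscall(276);
        return 0;
    }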
Strace'ing it shows the syscall used is tee, just as the reference I linked shows, and not pwritev as in OP's table.
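Another cross-check that doesn't depend on strace: the `SYS_*` constants in `<sys/syscall.h>` come from the kernel headers for whatever architecture you compile for. A minimal sketch, where the `known` table and the two lookup helpers are hypothetical names for illustration and only cover a handful of syscalls:

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* A tiny name/number table built from the headers' SYS_* constants. */
struct sys_name {
    const char *name;
    long nr;
};

static const struct sys_name known[] = {
    { "read",    SYS_read },
    { "write",   SYS_write },
    { "tee",     SYS_tee },
    { "pwritev", SYS_pwritev },
};

/* Name for a syscall number on this architecture, or NULL if not listed. */
static const char *syscall_name(long nr)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i].nr == nr)
            return known[i].name;
    return NULL;
}

/* Number for a syscall name on this architecture, or -1 if not listed. */
static long syscall_number(const char *name)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(known[i].name, name) == 0)
            return known[i].nr;
    return -1;
}
```

On an x86-64 build, `syscall_name(276)` comes back as "tee", which agrees with the strace result. On other architectures the numbers differ, so the same lookup can return something else or NULL.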
It's odd. The owner of this Git repo has Issues turned off so I can't post a question/issue, and it appears to have been auto-generated - https://github.com/thevivekpandey/syscalls-table-64bit is a "fork" of https://github.com/paolostivanin/syscalls-table-64bit
I think you're right - the submitted table has no entries for "fork" and "clone".
strace is not proof. It has its own built-in table, so strace could be wrong too.

In practice strace is widely used, so bugs in it should be discovered, reported, and fixed quickly. Without doing any analysis of my own, I'd bet that when in doubt, this table is wrong and strace is right.

Note that syscall list and numbers are architecture specific. The differences are typically not huge, but they exist.

If man pages were up to date, this should be the index of chapter 2. I discovered Unix on Sun machines in the 90s and I am very nostalgic for the quality of the man pages. At that time, man pages were complete and up to date. My latest frustration was with the -m option of the df command. Chapter 2 should be updated each time a new version of the kernel is installed.
It's very strange that adding/updating documentation isn't treated as a basic requirement for a patch that adds to or modifies Linux's public interfaces.
For the BSDs, incorrect or missing man pages are considered a serious bug.
Man pages aren't even in the kernel tree.
From Documentation/process/submit-checklist.rst:

  19) All new userspace interfaces are documented in ``Documentation/ABI/``.
      See ``Documentation/ABI/README`` for more information.
      Patches that change userspace interfaces should be CCed to
      linux-api@vger.kernel.org.
Michael Kerrisk keeps an up-to-date reference [1]. It even details _all_ system calls [2].

[1] https://www.kernel.org/doc/man-pages/

[2] http://man7.org/linux/man-pages/dir_section_2.html

If you want that level of quality, don't use Linux, use instead FreeBSD or OpenBSD.
Nice, I'm curious how it maps the system call to the source code line number dynamically? (Edit: seems like ctags + http://elixir.free-electrons.com/linux/latest/source [1]) It supports every kernel version.

The linked source code browser seems like a useful way to check the history of system calls for research...

[1] https://github.com/thevivekpandey/syscalls-table-64bit/blob/...

That's all cool and everything, but the registers are wrong... Not only are they 32-bit (eax vs. rax), but their order is wrong too - the first argument in x86-64 ABI is rdi, for example.
The registers look correct for the i386 ABI. eax for the system call number, then ebx, ecx, edx, esi, edi, ebp for the next 6 arguments.
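For comparison, here is a minimal sketch of the x86-64 convention as documented in syscall(2): the number goes in rax, the arguments in rdi, rsi, rdx, r10, r8, r9, the result comes back in rax, and the `syscall` instruction clobbers rcx and r11. `raw_write` is a hypothetical helper name. On anything other than x86-64 Linux it falls back to libc's `syscall()` and converts the result to the same negative-errno form.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* write(2) without the libc wrapper. Returns the byte count on
 * success or a negative errno value on failure, which is what the
 * kernel itself hands back in the result register. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
    long ret;

    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result in rax */
                      : "a"((long)SYS_write),     /* number in rax */
                        "D"((long)fd),            /* arg 1 in rdi  */
                        "S"(buf),                 /* arg 2 in rsi  */
                        "d"(len)                  /* arg 3 in rdx  */
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Portable fallback: libc reports failure as -1 plus errno. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -errno : ret;
#endif
}
```

Calling `raw_write(1, "hi\n", 3)` writes to stdout, and a bad descriptor yields `-EBADF` directly in the return value rather than through `errno`.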

I skimmed a couple files in the code. And it seems like it might be parsing this information out of some other sources, and maybe getting confused about the info it's grabbing?

https://github.com/thevivekpandey/syscalls-table-64bit

There are tables in the kernel git repo if you want a good reference for their values; however, the register definitions aren't provided.

x86: https://github.com/torvalds/linux/blob/master/arch/x86/entry...

x64: https://github.com/torvalds/linux/blob/master/arch/x86/entry...

This man page describes the syscall ABI for all architectures. http://man7.org/linux/man-pages/man2/syscall.2.html
...and the syscalls(2) man page lists them: http://man7.org/linux/man-pages/man2/syscalls.2.html
I put out a syscall table back in the day for Linux 2.2 (up to %eax 190). Someone copied it (I'm glad.): https://www.cs.utexas.edu/~bismith/test/syscalls/syscalls32.... They didn't attribute it to me, but I remember a professor did for his class. There were better tables after that I admit, though I liked my version because it linked into the source code.
I'm a bit puzzled. The code is at "syscalls-table-64bit", yet the regs are eax, etc. This makes very little sense.

In any event, I think the args should just be labeled arg0..arg5.

In several cases, the order of the args for a syscall varies between architectures--writing a general "arg0" doesn't make a lot of sense.

That said, I don't know what's up with it using 32-bit register names.

This is very handy with the asm registers mapped to arguments. Thanks!
Nice work. The table has been generated for 4.10, so the links to the source code files should also have this kernel version in the URL path for direct access.
Ah, this is neat! Would be nice to have a script for this that you could just point at a local copy of the source tree too!
What is the use case for this? Is it for someone trying to write their own syscall wrappers?
You might need that when you want to reimplement Linux. The Joyent team did that on their OS (derived from Solaris) so that users can run Linux binaries on a Solaris kernel (so they have DTrace, ZFS, mdb, ...). Bryan Cantrill gave a bunch of talks on that (one here: https://youtu.be/TrfD3pC0VSs)

The idea behind it is that Linux is only a list of syscalls: if you are able to reimplement them, you reimplement Linux, and you don't need anything else. On the contrary if you want to reimplement a BSD you need to reimplement their libc (and perhaps some other libraries)

> On the contrary if you want to reimplement a BSD you need to reimplement their libc (and perhaps some other libraries)

To clarify, what you're saying is that in BSD land, the syscall API is not considered stable, but libc is?

I'm not ultra familiar with the topic, so if someone wants to correct me, please do, but:

- Linux has always been described as just a kernel, which translates as just a syscall table. The fact that this table is stable or not is not relevant here.

- *BSD on the other hand are shipping a kernel plus a lot of libraries/binaries, if you want to simulate a BSD system, you have to expose those libraries/binaries.

It's not so much a technical difference, it's more of a different approach to OS development (kernel space vs kernel/user space).

Thing is, if syscalls in BSD are considered stable the way they are in Linux, then you could just ship your own kernel with BSD's libc. But if they consider it an internal API between kernel and libc, and apps are only ever supposed to depend on libc, then of course that doesn't work.

So stability of syscall API is the de facto differentiating factor here. It sounds like Microsoft couldn't do "Windows Subsystem for BSD" the way it did WSL, for example.

For that matter, this is how Windows's Linux compatibility layer works.
FreeBSD had a linux syscall layer before either of those, I believe.
Windows NT had a POSIX personality in 1993.
POSIX doesn't define the syscalls. It defines requirements for system libraries.
I had cause to consult a syscall table (not this one, a correct one, forget where I found it) when doing something or other with fasm. fasm's macros are pretty dang advanced...I remember having argument length checking and rudimentary type checking as well. Then I got done yak shaving and remembered that programming in asm sucks.

But yeah, it's for writing your own syscall wrappers. Something not exported by libc, or more likely if you're not using libc.

Sometimes, yes, you'll need to write your own syscall wrappers. For example, there isn't a gettid (get thread ID) function in Glibc, but you can work around this by calling the syscall directly.
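A minimal sketch of that workaround, using the `SYS_gettid` constant from `<sys/syscall.h>`. The helper is called `my_gettid` (a made-up name) so it won't clash with the `gettid()` wrapper that glibc 2.30 and later provide.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread, fetched with a direct
 * syscall instead of a libc wrapper. */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In the main thread of a process the value equals `getpid()`; other threads get their own IDs.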

The other case where this is useful is if you're wanting to write userspace assembly without calling a C library. This may be especially useful when you're writing a compiler, or if you're trying to write small shellcodes for some reason.

Or to try making more sense of some libc implementation. The syscall stuff in glibc and musl both have a good bit of preprocessor voodoo to make syscalls feel more like function calls.
Implementing a programming language without any dependencies to C code for example.
Do system calls put their return value on the calling threads stack or in a register?
AFAIK, A process isn't required to have a stack.
Someone should put together a list of which ones are irredeemably broken (and as such, humanity is stuck with a broken ABI in perpetuity), e.g. epoll.
What about non-x86/64 platforms?
The mere fact that we are debating the correctness of this table confirms that the quality of the documentation of the OS we base our entire civilization upon is pretty poor.
This is great! It would be even more useful to have this for Mac OSX too. A lot of the projects I do end up being on both Mac and Linux. It's always a pain to find the corresponding number for the system call on Mac.
System calls have no stability guarantee on macOS. You should use libc instead. In general the use of syscalls directly is fairly limited.

Edit: for instance, Go broke once for macOS Sierra, when Apple changed the gettimeofday system call: https://github.com/golang/go/issues/16570

Looking at this bug, it seems that Go has "fixed" it by fixing the syscall arguments, not by switching to libc.

Did they switch to libc since then?

If not, and given that Go apps are normally statically linked, does this mean that any precompiled Go app basically has a time bomb, in a sense that it'll break next time Apple changes some syscall?

The problem Go has is that there’s a rather large overhead for calling C functions [1]. So they did not switch to calling libc, as far as I know. And yes, the next time Apple changes the syscalls, it will break again.

[1]: https://groups.google.com/forum/m/#!topic/golang-nuts/RTtMsg...

Wow. So, basically, Go is a rather insular ecosystem - since you're paying the overhead of a context switch for every single FFI call - and if you use the stock APIs, it's essentially broken by design on macOS (since it uses APIs that Apple itself does not consider stable).

That's really sad. I was just beginning to like some aspects of it.

Linux doesn't guarantee syscall stability either. Just make sure your wrappers can use a syscall table chosen at runtime, depending on which kernel you are running.
Yes it does, at least in the sense that syscalls which become officially public will never be removed from Linus' tree except in rare circumstances (i.e. proof that nobody is using it), nor will the arguments change. This is Linus' famous "never break user space" ABI mantra. While distributions may deprecate and remove them (e.g. sysctl(2)) they certainly won't be assigned new IDs. A table won't help in such cases.
Exactly. This is why, for example, the original 'mmap' system call entry point on x86 still exists, even though it is overwhelmingly likely that every program on your machine is actually going to use the 'mmap2' entry point.
I think you're confusing driver APIs and syscalls. Both are infamous for their respective lack of, or guarantee of stability.
If you're going to implement that kind of overhead, why not just use libc?
I'm not sure what you mean by overhead. It's not that hard or expensive to choose a few function pointers on program start.
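To make the mechanism concrete, here is a rough sketch of picking an implementation once at startup by probing for ENOSYS. Every name in it (`tid_fn`, `tid_via_syscall`, `tid_via_getpid`, `choose_tid_fn`) is hypothetical, and the fallback is just a stand-in; whether such a layer is needed on Linux at all is exactly what the replies above dispute.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

typedef long (*tid_fn)(void);

/* Preferred path: ask the kernel directly. */
static long tid_via_syscall(void)
{
    return syscall(SYS_gettid);
}

/* Stand-in fallback for a kernel that lacks the syscall. */
static long tid_via_getpid(void)
{
    return (long)getpid();
}

/* Probe once: a syscall the running kernel does not implement
 * fails with ENOSYS, in which case the fallback is selected. */
static tid_fn choose_tid_fn(void)
{
    errno = 0;
    if (syscall(SYS_gettid) == -1 && errno == ENOSYS)
        return tid_via_getpid;
    return tid_via_syscall;
}
```

A program would call `choose_tid_fn()` once, keep the returned pointer, and call through it afterwards, so the per-call cost is one indirect call.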
There is a bsd/kern/syscalls.master file for every kernel at https://opensource.apple.com/source/xnu/