Hacker News new | ask | show | jobs
by matheusmoreira 1760 days ago
Modern Linux user space is monstrous and labyrinthine. Linux itself is not. I found it to be a really clean system.

The basis of everything is actually the Linux system call binary interface. We can actually trash the entire user space and start from scratch with nothing but Linux. We can even trash the ubiquitous GNU stuff if we want.

Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters. No need for libc bullshit in the middle. No need for C or any specific language. Someone could make a language today and that single feature would make it as capable as C is for systems programming. They could write software and boot Linux directly into it.

2 comments

> Why can't we have a compiler with built in system call support?

Funny you should ask that. That is exactly how Virgil's compiler supports the Linux (and Darwin) kernels. Other than generating a small amount of startup assembly (10-20 ins), the compiler just knows the ELF (and MachO) binary formats and the calling conventions of the respective kernels. With some unsafe escape hatches (e.g. getting a pointer into the middle of a byte array), the rest is regular Virgil code that calls the kernel directly.

Take a look, I've been working on this for more than 10 years:

https://github.com/titzer/virgil/blob/master/rt/x86-64-linux...

The "Linux.syscall" is a special operator know to the compiler and it will let you pass an int (the syscall number) and whatever arguments you want (any types--it is implemented with flattening and polymorphic specialization) to the kernel.

With this I have implemented all kinds of stuff, including the userspace runtime system and even a JIT compiler (for my new Wasm engine).

Thanks for this, it's extremely awesome! Really happy to see others have gone so much farther than I ever did.

I started looking into this myself some years ago. Even started developing a liblinux with process startup code and everything. Abandoned it after I found the kernel itself had an awesome nolibc.h file that was much more practical for my C programming needs:

https://elixir.bootlin.com/linux/latest/source/tools/include...

My code is in a bad state but if you'd like to take a look:

https://github.com/matheusmoreira/liblinux

It's amazing how this really lets you do everything... Want a JIT compiler? Map some executable pages and emit some code. You can statically allocate memory at process startup and use that for bootstrapping code. This lets you implement dynamic memory allocation and even garbage collection in your own language.

Nice work!

> Want a JIT compiler? Map some executable pages and emit some code.

Yep, this is exactly what Wizard does.

Targeting POSIX standard functions like open by going through the Linux syscall table looks like just making work for yourself when porting this to other systems.

Some syscalls don't correspond to standard library functions. As an exmaple, if you want to bind to opendir/readdir/closedir, you have to write those yourself in terms of the Linux-specific _NR_getdents64 system call.

Is your LinuxConst.SYS_open actually _NR_open? That's supposedly obsolete. glibc uses _NR_openat for open(). _NR_open is listed in the asm/unistd.h header in a section under the heading "All syscalls below here should go away really ..."

How about signal handling; are you dealing with sigreturn and all that?

You can get a small executable footprint (in terms of not requiring a dynamic C library) by maintaining all this yourself, though.

Oh, I know it's work, but I am not going to assume POSIX, as that's implemented in userspace with C code. In my universe, C code doesn't exist (except I use a little in some testing utilities in order to get going on a new platform). I never ported to Windows, but doing so would be as simple as teaching the compiler the Windows kernel calling convention, adding that little process entry code, and then writing an implementation of System using Windows calls. Oh, yeah, and generating COFF :)

Virgil has its own calling convention internally (though this is basically System V on x86-64). That only matters when getting into V3 code or out, e.g. process entry, calling the kernel, and signal entry. For signals, the compiler generates a tiny stub that copies the signal handler arguments into the V3 registers and then calls user code. To install signals, user code just needs to fill out the right sigaction buffer, as any other system call. To return from signals properly, I studied assembly examples I found online. The runtime doesn't use signals for anything other than catching fatal errors (DivideByZero and NullCheck), so it just prints a source-level stacktrace and then exits. But Wizard needs to recover from signals in order to do proper OOB handling of user programs, so it actually does the proper sigreturn dance, but Wizard only does the fancy stuff on 64-bit.

In my universe, only three things exist: Virgil, wasm, and machine code. I have no need of other languages except as means to test those others :)

Virgil runs on the JVM and on Wasm too, and those require slightly different ways of getting off the ground.

> porting this to other systems

Why care about this? I want Linux on everything instead.

> Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters

It can be implemented as a small function, that first appeared in 4 BSD. It's available in Linux.

$ man syscall

(Unfortunately, this function, lives in glibc. Obviously, though, it doesn't have to. All I'm saying is that this, or a similar function, can be a linkable symbol in some small compiled object file, and not an inline primitive that has to live in the compiler.)

You're of course correct about all this. I believe the glibc thing has created mainly cultural problems. People don't look at Linux as a separate thing.

If I look up Linux system calls on Wikipedia I get diagrams showing glibc wrapping the Linux system call interface because that's what you're supposed to be using. If I look at Linux man pages what I really get is glibc man pages with the actual system calls being almost an afterthought. Glibc wrappers actually do a ton of stuff like add cancellation mechanisms. Glibc also drops support for system calls that break their threading model such as clone.

It's the same problem with systemd. I look up Linux init system man pages and get systemd stuff instead. I expected to see kernel APIs useful for writing my own.

This isn't any different from any current or historic Unix. The system interface has been a C library going back to early Unix.

The library and kernel interface are more separated in Linux systems than in prior Unixes, with user space C libs being totally separate projects.

Over a Linux kernel you can find glibc, ucLibc, musl, Android s Bionic (newlib derivative from BSD), ...