by titzer 1763 days ago
> Why can't we have a compiler with built in system call support?

Funny you should ask that. That is exactly how Virgil's compiler supports the Linux (and Darwin) kernels. Other than generating a small amount of startup assembly (10-20 instructions), the compiler just knows the ELF (and Mach-O) binary formats and the calling conventions of the respective kernels. With some unsafe escape hatches (e.g. getting a pointer into the middle of a byte array), the rest is regular Virgil code that calls the kernel directly.

Take a look, I've been working on this for more than 10 years:

https://github.com/titzer/virgil/blob/master/rt/x86-64-linux...

"Linux.syscall" is a special operator known to the compiler. It lets you pass an int (the syscall number) and whatever arguments you want (any types--it is implemented with flattening and polymorphic specialization) to the kernel.
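
For readers more at home in C, here is a rough sketch of the same idea: put the syscall number and arguments in the registers the kernel expects and trap. This is not Virgil's implementation. The helper names (raw_syscall3, raw_write, raw_getpid) are made up for illustration, the inline assembly assumes GCC or Clang on x86-64 Linux, and other architectures fall back to libc's generic syscall() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call with up to three arguments.
 * On x86-64 this returns the raw kernel result (negative errno on failure). */
long raw_syscall3(long nr, long a, long b, long c) {
#if defined(__x86_64__)
    long ret;
    /* Linux x86-64 kernel convention: number in rax, arguments in rdi, rsi,
     * rdx; the syscall instruction itself clobbers rcx and r11. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a), "S"(b), "d"(c)
                     : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback: libc's generic wrapper (returns -1 and sets errno on failure). */
    return syscall(nr, a, b, c);
#endif
}

/* write(2) without going through the libc write() wrapper. */
long raw_write(int fd, const void *buf, unsigned long len) {
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}

/* getpid(2) takes no arguments; the unused slots are simply ignored. */
long raw_getpid(void) {
    return raw_syscall3(SYS_getpid, 0, 0, 0);
}
```

Calling raw_write(1, "hi\n", 3) from main should print the string. A compiler-level operator avoids the fixed arity here by specializing on the argument types, which plain C cannot do as neatly.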

With this I have implemented all kinds of stuff, including the userspace runtime system and even a JIT compiler (for my new Wasm engine).


Thanks for this, it's extremely awesome! Really happy to see others have gone so much farther than I ever did.

I started looking into this myself some years ago. Even started developing a liblinux with process startup code and everything. Abandoned it after I found the kernel itself had an awesome nolibc.h file that was much more practical for my C programming needs:

https://elixir.bootlin.com/linux/latest/source/tools/include...

My code is in a bad state but if you'd like to take a look:

https://github.com/matheusmoreira/liblinux

It's amazing how this really lets you do everything... Want a JIT compiler? Map some executable pages and emit some code. You can statically allocate memory at process startup and use that for bootstrapping code. This lets you implement dynamic memory allocation and even garbage collection in your own language.
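
A minimal sketch of the "map some pages and emit some code" step, in C. The function name jit_make_return_42 is made up, the byte sequences are hand-assembled for x86-64 and AArch64 only, and it goes through libc's mmap/mprotect rather than raw system calls to keep it short. It maps the page writable first and flips it to executable afterwards, since some hardened systems refuse pages that are writable and executable at the same time.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

#if defined(__x86_64__)
/* mov eax, 42 ; ret */
static const unsigned char CODE[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
#elif defined(__aarch64__)
/* mov w0, #42 ; ret */
static const unsigned char CODE[] = {0x40, 0x05, 0x80, 0x52,
                                     0xC0, 0x03, 0x5F, 0xD6};
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif

/* Emit a tiny function that returns 42; returns NULL on failure. */
jit_fn jit_make_return_42(void) {
    const size_t size = 4096;
    void *page = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return NULL;

    memcpy(page, CODE, sizeof CODE);            /* "emit" the code */

    if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, size);
        return NULL;
    }
    /* Needed on architectures with separate instruction caches. */
    __builtin___clear_cache((char *)page, (char *)page + sizeof CODE);
    return (jit_fn)page;
}
```

Calling the returned pointer runs the freshly written bytes. A real JIT would of course generate the bytes from some intermediate form instead of copying a constant.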

Nice work!

> Want a JIT compiler? Map some executable pages and emit some code.

Yep, this is exactly what Wizard does.

Targeting POSIX standard functions like open by going through the Linux syscall table looks like just making work for yourself when porting this to other systems.

Some syscalls don't correspond to standard library functions. As an example, if you want to bind to opendir/readdir/closedir, you have to write those yourself in terms of the Linux-specific __NR_getdents64 system call.
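
To give an idea of what that involves, here is a sketch of a readdir-style loop built on getdents64. The function dir_contains is a made-up example, the record layout is copied from the getdents(2) man page, and the calls go through libc's generic syscall() wrapper only to keep the example portable across Linux architectures.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout the kernel writes for getdents64 (see getdents(2)). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;  /* total size of this record in bytes */
    unsigned char  d_type;
    char           d_name[];  /* NUL-terminated entry name */
};

/* Returns 1 if `path` has an entry called `name`, 0 if not, -1 on error. */
int dir_contains(const char *path, const char *name) {
    long fd = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) return -1;

    uint64_t storage[512];            /* 4 KiB, aligned for the records */
    char *buf = (char *)storage;
    int found = 0;

    while (!found) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof storage);
        if (n < 0) { found = -1; break; }
        if (n == 0) break;            /* end of directory */
        for (long off = 0; off < n && !found;) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (strcmp(d->d_name, name) == 0) found = 1;
            off += d->d_reclen;       /* records are variable length */
        }
    }
    syscall(SYS_close, fd);
    return found;
}
```

An opendir/readdir/closedir trio would keep the fd, the buffer, and the current offset in a struct and hand out one record per call.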

Is your LinuxConst.SYS_open actually __NR_open? That's supposedly obsolete. glibc uses __NR_openat for open(). __NR_open is listed in the asm/unistd.h header in a section under the heading "All syscalls below here should go away really ..."
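
For reference, an open() expressed through openat is a one-liner. This is a sketch, not glibc's actual source; open_via_openat is a made-up name, and it uses libc's generic syscall() wrapper for brevity. As far as I know, newer architectures such as AArch64 and RISC-V do not define the plain open syscall at all, so openat is the portable choice even within Linux.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* open(path, flags, mode) expressed as openat relative to the current
 * working directory. Returns a file descriptor, or -1 with errno set. */
long open_via_openat(const char *path, int flags, int mode) {
    return syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}
```

AT_FDCWD tells the kernel to resolve relative paths against the current directory, which is what plain open does.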

How about signal handling; are you dealing with sigreturn and all that?

You can get a small executable footprint (in terms of not requiring a dynamic C library) by maintaining all this yourself, though.

Oh, I know it's work, but I am not going to assume POSIX, as that's implemented in userspace with C code. In my universe, C code doesn't exist (except I use a little in some testing utilities in order to get going on a new platform). I never ported to Windows, but doing so would be as simple as teaching the compiler the Windows kernel calling convention, adding that little process entry code, and then writing an implementation of System using Windows calls. Oh, yeah, and generating COFF :)

Virgil has its own calling convention internally (though this is basically System V on x86-64). That only matters when getting into V3 code or out, e.g. process entry, calling the kernel, and signal entry. For signals, the compiler generates a tiny stub that copies the signal handler arguments into the V3 registers and then calls user code. To install signals, user code just needs to fill out the right sigaction buffer, as any other system call. To return from signals properly, I studied assembly examples I found online. The runtime doesn't use signals for anything other than catching fatal errors (DivideByZero and NullCheck), so it just prints a source-level stacktrace and then exits. But Wizard needs to recover from signals in order to do proper OOB handling of user programs, so it actually does the proper sigreturn dance, but Wizard only does the fancy stuff on 64-bit.
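
For comparison, this is roughly what "fill out the sigaction buffer" looks like from C. It is a sketch that goes through libc, not the Virgil runtime's code; install_handler, on_signal, and last_signal are made-up names. Note that libc hides part of the work here: on x86-64 it supplies the restorer stub that issues sigreturn, which a libc-free runtime has to provide itself.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Set by the handler so the main program can see which signal arrived. */
volatile sig_atomic_t last_signal = 0;

/* Three-argument handler form, selected by SA_SIGINFO below. */
static void on_signal(int signo, siginfo_t *info, void *ucontext) {
    (void)info;      /* fault address, sender pid, etc. */
    (void)ucontext;  /* saved register state of the interrupted code */
    last_signal = signo;
}

/* Install on_signal for `signo`; returns 0 on success, -1 on failure. */
int install_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);   /* block no extra signals while handling */
    return sigaction(signo, &sa, NULL);
}
```

After install_handler(SIGUSR1), a raise(SIGUSR1) runs the handler and returns normally. Recovering from a fault such as SIGSEGV is harder, because the handler has to change the saved context or the faulting instruction simply runs again.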

In my universe, only three things exist: Virgil, wasm, and machine code. I have no need of other languages except as means to test those others :)

Virgil runs on the JVM and on Wasm too, and those require slightly different ways of getting off the ground.

> porting this to other systems

Why care about this? I want Linux on everything instead.