by mike_hearn 849 days ago
It's a sub-component but Oracle Labs has a project to develop something like the FaaS platform he's asking for, called GraalOS.

The basic idea is that FaaS is a leaky abstraction because (a) lots of runtimes are slow to start up and (b) isolation tech isn't good enough. So FaaS services have to start up VMs and containers and then the user's function, which might have to do a lot of init work like loading reference data. Because all of that takes too long, you have to keep idle capacity around. At that point the abstraction is broken.

So there's a two-part fix:

1. For Java users, the GraalVM native-image tool can pre-initialize and pre-compile a JVM app so that it starts up instantly (including with pre-loaded reference data).

2. Change the isolation model so VMs and containers don't need to be started up anymore. Containers alone can take hundreds of milliseconds to start.
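To make point 1 concrete, here's a minimal sketch of the kind of init work that can be moved to build time. The class name and the lookup table are made up for illustration; a real function would load something much bigger. On a regular JVM the static initializer runs on every cold start. With native-image it can run once during the build, with the resulting map stored in the image heap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical function with "reference data" loaded in a static initializer.
public class StatusLookup {
    // Regular JVM: built at class-load time on every cold start.
    // native-image with --initialize-at-build-time=StatusLookup: built once
    // at image build time and shipped as part of the image heap.
    private static final Map<Integer, String> REASONS = loadReasons();

    // Stand-in for expensive init work (parsing files, building indexes, ...).
    private static Map<Integer, String> loadReasons() {
        Map<Integer, String> reasons = new HashMap<>();
        reasons.put(200, "OK");
        reasons.put(404, "Not Found");
        reasons.put(503, "Service Unavailable");
        return reasons;
    }

    // The "function" itself: a cheap lookup against the pre-loaded data.
    public static String reason(int status) {
        return REASONS.getOrDefault(status, "Unknown");
    }

    public static void main(String[] args) {
        System.out.println(reason(404));
    }
}
```

Assuming a GraalVM install with native-image on the PATH, something like `javac StatusLookup.java` followed by `native-image --initialize-at-build-time=StatusLookup StatusLookup` should produce a binary where the map is already populated at startup.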

There's also some interesting stuff there that takes advantage of Oracle Cloud being more "edgy" than other clouds: it has more datacenters than the others, but smaller ones.

The new isolation model works by exploiting new hardware features in CPUs that allow for intra-process memory isolation (Intel MPK), combined with hardware-enforced control flow integrity. This requires compiler support, but GraalVM knows about these features and so the cloud can just compile JVM apps to native for you. And what about other apps? Well, many languages run on GraalVM via Truffle (e.g. JavaScript), so those are covered. For native code you can compile with a modified LLVM and then statically verify any user-supplied binaries, like NaCl used to do.

If you put those things together then starting user code that's already available locally becomes just mmapping a shared library into a process, which is extremely fast. It can only exit the hardware/software enforced isolate by going via a trampoline that's equivalent to a syscall, but without needing an actual syscall. The Linux kernel isn't reachable at all.

With that you can have functions that start and stop in milliseconds.