Hacker News new | ask | show | jobs
by hackworks 2389 days ago
eBPF can be viewed as a mechanism to safely run user code in kernel since it uses a DSL and a compiler before the byte code is executed in kernel. This opens up doors for running performance critical functionality in kernel without having to bundle it with the kernel or very tightly coupled with the kernel version.

Optimizing FUSE is an example: https://extfuse.github.io/

I expect custom security auditing software, reverse proxies, firewalls with rule engines implemented in eBPF in the coming future. This will avoid having to copy dates across kernel and user boundary and the switching overheads.

5 comments

According to https://news.ycombinator.com/item?id=18496054, these programs have to halt? How does this system guarantee that the programs halt? Does this mean eBPF is not Turing complete?
The language itself is Turing complete, but the kernel will refuse to run a program that it cannot prove will halt.

There are three categories of programs; programs you can trivially prove will halt, programs you can trivially prove won't halt, and programs where it's difficult or impossible to prove whether or not will halt. The third category is what we call the halting problem. Only the first category will be run by the kernel.

The nitty gritty is that the only jumps which are allowed to go backwards are the end of a loop with a fixed number of iterations. All other jumps must jump forward.

This sounds like it's really limiting, but in practice there's not a whole lot of useful stuff you're unable to do.

Could you create an unbounded loop by scheduling another BPF execution on the end of each program? I imagine something like a WASM runtime that would be split into multiple BPF programs on loops. Which would practically achieve https://github.com/nebulet/nebulet, right?
This does actually mean that eBPF is not Turing complete, in that it can't simulate a Turing machine.

With a fixed number iterations (which also means fixed amount of memory) it's really just a finite state automata.

From the accompanying video (17:11)[1] it's not Turing complete "because the verifier rejects unbounded loops and so there is some things that are not possible. We do have bounded loops in BPF now."

1: https://youtu.be/7pmXdG8-7WU?t=1031

Commonly known as total functional programming.

https://en.m.wikipedia.org/wiki/Total_functional_programming

Indeed, eBPF is not turing complete.
It's the definition of Turning complete that's the problem. A Turning complete machine must be able to process countably infinite inputs. The halting problem arises from same thing. There is no halting problem if we restrict ourselves to programs that will stop in a bounded time (obviously).

In practice that means a true Turning machine can compute things that aren't actually computable in our Universe, or at least not by the bits of the Universe we understand well. I doubt any engineer will think that "this architecture can't in theory compute things not computable in this Universe" is a strong argument against it.

> firewalls with rule engines implemented in eBPF in the coming future

Wasn't this the original purpose of BPF?

Bytecode + compiler was also how both Flash and Java Applets worked. Have we forgotten how secure those were?
Java implemented the sandbox as java code in the same VM as what it was sandboxing, then later attached reflection so you had a billion different ways to get references to those classes blocking your access and do brain surgery on them. Oof.

Flash was a lovecraftian horror novel of a code base that even Adobe didn't feel totally comfortable making changes in.

It's also how wasm works, and in some sense -- JavaScript. Somehow, these introduce much less security problems.
https://github.com/tunz/js-vuln-db

I'm not sure Javascript has fewer security problems at all. Sandboxes help a lot, but not all sandboxes are equally strong and Javascript engines routinely get popped.

JavaScript doesn’t give you nearly as much access to host OS features as the JVM does. (And there sometimes are security problems when it does.)
But it's not really interesting to compare js to java, we should be comparing to ebpf.
The 'somehow' seems rather clear to me when you look at how the organizations who governed these languages and how they're run.
One difference with eBPF is that it limits the number of instructions you can execute (and also prohibits backward branches if I recall - "loops" must be unrolled) and that the semantics are restricted enough that it can be "proven" to be safe.

You are correct however that allowing arbitrary eBPF code to be installed by untrusted users increases the attack surface. For example, we need to trust that the eBPF checker is correct, as well as the implementation of the eBPF operations. Moreover, even though eBPF may be safe I suspect it could potentially trigger or exploit bugs in internal kernel subsystems that might be harder or even impossible to trigger through the regular system call interface.

Another example is a sandboxing file system https://lwn.net/Articles/803890/
I was thinking about something similar to extfuse, but for a remote filesystem. Here's some note scraps:

> Server abilities can be changed by uploading JS functions/libs..

> Want to search a file formats meta-data on the server? Upload JS to read the meta-data and index it, and provide the search options..

> Servers should also use the best options for the tasks they provide: Grep to seach text -- and not Grep like functionality, but the optimized binary for the platform the server is running.

> Locking, linking, ACLs, and binary access should all be changable by JS.

This is an interesting idea. Sort of the inverse of a web app. Part of the problem with the cloud for large datasets (ie genomics) is getting the computation close enough to the data (the UI being the third leg of the stool). If you could upload small processing scripts (or ebpf/wasm) to the exact node where the data lives in real time, it might open up some novel techniques.

Kind of like current serverless tech but instead of running at the edge, you run on the storage node.

EDIT: removed statement about network speed vs ssd speed. Pretty sure I was way off.

I believe this was a huge strength of MapReduce back in the day. The mapping and reducing of initial data would happen on the node the data was stores in. The only thing being transferred was the code and the results.