| HN Mirror

There are at least two issues here.

First: eBPF code is JIT'd in the kernel; at runtime, it is simply native code running at CPL0 alongside the rest of the kernel. eBPF code that works with pointers is... working with raw pointers. There's no interpretation layer to bounds-check accesses or otherwise provide safety.

But eBPF code is meant to be safe: you can get a handle to some kernel structure that's passed to you from trusted code, but you can't bounce from it to a random offset in kernel memory. The way eBPF does this is by verifying your program, walking its CFG and tracking what every register might hold, before it's translated to amd64 or ARM machine code. eBPF programs are generally just C programs (the simplest and best way to write an eBPF program is to write a C program and compile it with the right LLVM flags), and verifying C programs is a hard problem. eBPF gets around this by accepting only a subset of all possible programs: those where memory accesses are simple enough to prove safe, that don't jump anywhere outside a known, narrow range of program text, and that don't have unbounded loops.
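To make the "only accept what's easy to prove" idea concrete, here is a minimal Python sketch of a toy verifier. Everything in it is hypothetical: `Op`, `Insn`, `toy_verify`, the three-opcode instruction set and the `ctx_size` parameter are invented for illustration and have nothing to do with real eBPF bytecode or the kernel's verifier code. It rejects any load that could fall outside the context object, any jump that leaves the program, and any backward jump (so the program can't loop).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    """Hypothetical toy opcodes; these are NOT real eBPF instructions."""
    LOAD = auto()  # r0 = 8 bytes read from ctx at constant byte offset `off`
    JMP = auto()   # jump to pc + 1 + off (a relative jump)
    EXIT = auto()  # return r0


@dataclass(frozen=True)
class Insn:
    op: Op
    off: int = 0


LOAD_SIZE = 8  # every toy LOAD reads 8 bytes


def toy_verify(prog: list[Insn], ctx_size: int) -> bool:
    """Accept only programs whose safety is trivial to prove; reject the rest.

    ctx_size is the size in bytes of the (hypothetical) context object that
    trusted kernel code would hand to the program.
    """
    # Execution must end at EXIT instead of falling off the end of the text.
    if not prog or prog[-1].op is not Op.EXIT:
        return False
    for pc, insn in enumerate(prog):
        if insn.op is Op.LOAD:
            # Constant offset, and the whole 8-byte read must stay inside ctx.
            if insn.off < 0 or insn.off + LOAD_SIZE > ctx_size:
                return False
        elif insn.op is Op.JMP:
            target = pc + 1 + insn.off
            # Forward-only jumps rule out loops, so a run takes at most
            # len(prog) steps; the target must also land inside the program.
            if insn.off < 0 or target >= len(prog):
                return False
        elif insn.op is not Op.EXIT:
            return False  # unknown opcode
    return True


ok = [Insn(Op.LOAD, 0), Insn(Op.JMP, 1), Insn(Op.LOAD, 8), Insn(Op.EXIT)]
out_of_bounds = [Insn(Op.LOAD, 4096), Insn(Op.EXIT)]
loop = [Insn(Op.LOAD, 0), Insn(Op.JMP, -2), Insn(Op.EXIT)]
wild_jump = [Insn(Op.JMP, 10), Insn(Op.EXIT)]

for name, prog in [("ok", ok), ("out_of_bounds", out_of_bounds),
                   ("loop", loop), ("wild_jump", wild_jump)]:
    print(name, "accepted" if toy_verify(prog, ctx_size=16) else "rejected")
```

The real verifier is far more elaborate: it simulates every path through the program, tracking the type and value range of each register, which is what lets it accept a variable offset once the program has bounds-checked it. That extra machinery is also where verifier bugs tend to hide.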

The tricky thing here is that the eBPF verifier is pretty complicated and lives only in the kernel. People have found bugs in it. If you find a good verifier bug, you can launder an untrusted pointer into your eBPF program (in the end, these bugs look a lot like the browser JavaScript RCEs that finagle a bad pointer out of some part of the browser API).

The biggest mitigating factor for these bugs is that Linux systems generally don't expose eBPF to any user other than root, so the upside to these kinds of bugs is limited (it gives you root->kernel, which is not nothing, but not the top of most people's priority list).
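On current kernels, the knob behind this is the `kernel.unprivileged_bpf_disabled` sysctl: 0 allows unprivileged `bpf()` calls, 1 disables them and can't be cleared again without a reboot, and 2 disables them but lets an admin change the setting later. While unprivileged use is disabled, `bpf()` requires `CAP_BPF` or `CAP_SYS_ADMIN`. A minimal sketch for checking it on a given machine (the `describe` helper and its wording are hypothetical, not a standard tool):

```python
from pathlib import Path

# Standard procfs location of the kernel.unprivileged_bpf_disabled sysctl.
SYSCTL = Path("/proc/sys/kernel/unprivileged_bpf_disabled")


def describe(value: int) -> str:
    """Hypothetical helper: map the documented sysctl values to a summary."""
    if value == 0:
        return "unprivileged bpf() allowed"
    if value == 1:
        return "unprivileged bpf() disabled (locked until reboot)"
    if value == 2:
        return "unprivileged bpf() disabled (admin may re-enable)"
    return f"unknown value {value}"


def main() -> None:
    try:
        value = int(SYSCTL.read_text().strip())
    except (OSError, ValueError) as exc:
        # Not Linux, procfs not mounted, or an unexpected file format.
        print(f"could not read {SYSCTL}: {exc}")
        return
    print(f"{SYSCTL.name} = {value}: {describe(value)}")


if __name__ == "__main__":
    main()
```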

The other big issue is that eBPF gives the kernel a huge amount of flexibility about what code it will run. Modern exploit mitigations are in large part about making sure that the instructions running at CPL0 are all known, so that if you manage to corrupt allocator metadata or write an arbitrary 8-byte value at an arbitrary 8-byte offset, you can't easily turn that into code execution in the kernel. But, of course, eBPF is an in-kernel JIT; it's there to run essentially arbitrary code inside the kernel. eBPF code is normally constrained, but if you have a kernel memory corruption bug, you can aim it at the eBPF subsystem and violate the kernel's assumptions.