Hacker News new | ask | show | jobs
by aseipp 1097 days ago
eBPF verification was always a laugh from the very beginning design stages, if you ask me, because as this paper demonstrates, it was never going to be enough. Anyone with a modicum of security or PLT experience could have told you this when evaluating the design and history. Like, if I had to be completely honest, the very fact the security/robustness model started on principles like "fixed number of loop iterations" or "no backedge jumps" (among several others) in the verifier was a pretty good sign that this was always going to be a source of continuous vulnerabilities. It makes me think people are flying blind. If you're not systematically fixing these issues in the very design stages of the system, and using duct tape, you're just going to patch every single thing one by one as it happens, and then how is that any different from today?

The basic idea is simple. You have the verifier, and the TCB. The verifier has to reject invalid programs, so the TCB does not have its integrity compromised by the program. The verifier is small, so it can be audited. That's nice -- until you back out and realize the TCB is "the entire linux kernel and everything inside of it and all of the surface area API between it and the BPF Virtual Machine" and it dawns on you that at that point the principle of "system integrity being maintained" relies very little on the verifier and actually a whole lot on Linux being functionally correct. Which is where you started at in the first place. The goal of eBPF after all isn't just to burn CPU cycles and return an integer code. It has to interact with the system. Having the TCB functionally be "every line of code we're trying to protect" is the Windows 3.1 of integrity models.

Now, this might also be OK and quantifiable to some extent. Except for the other fact that the guiding design principle in Linux is to pretty much grow without bound, without end, rewrite code left and right, and the eBPF subsystem itself has been endlessly tacking on features left and right for what -- years now?

If you take away any of these three things (flawed design basis, ridiculously large TCB, endless and boundless growth) and modify or remove one of them, the picture looks much better. Solid basis? You can maybe handle the other two if you're careful and on top of things, big hand waive. Very small TCB? Great, you can put significantly more trust in the verifier, freeing you from the need to worry about every line of code. No endless growth? Then you have a target you can monitor and maybe improve on e.g. reduce trends downward over time. But the combination of all three of these things means that the end result is "greater than the sum of the parts" so to speak and it will always be a matter of pushing the boulder up the hill every day, all so it can fall back down again.

That said, eBPF is really useful. I get a ton of value out of it. The verifier does allow you to have greater trust in running things in the kernel. In this case, doing something is quite literally 1,000% better than doing nothing in this if you ask me, at least for most intents and purposes. So making it safer and more robust is worthwhile. But it was pretty easy to see this sort of stuff from a long way out, IMO.

1 comments

The original BPF model was "no backedge jumps, constrained memory model", and its track record is quite good. Say more about why you think "no backedge jumps" --- which isn't the current, more-sophisticated, harder-to-understand verification model --- was obviously weak.