by tptacek 2013 days ago
Most examples of BPF code are written in a mix of Python and C using BCC, the "BPF Compiler Collection", which essentially treats all of LLVM and clang as a library callable from Python code.

I can't get my head around using it that way, and have found it pretty straightforward to just write C programs, compiled with clang `-target bpf`. Until very recently, writing anything interesting this way required you to declare all functions inline, compile into a single ELF .o, and, of course, avoid most loops. But most of the kinds of things you'd write in BPF tend not to be especially loopy (you can factor most algorithmic code out into userland, communicating with BPF using maps).

A big issue for this kind of development is kernel compat; struct layouts can change from release to release, for instance. This isn't a problem for us at Fly, because we just run the same kernel everywhere, but it's a real problem if you're trying to ship a tool for other people's systems. But that's changing with CO-RE; recent kernels can export a simplified symbol table in a BPF-legible format called BTF, and the loader can perform relocations. Facebook has written a bunch of good stuff about this:

https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc...
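To make the relocation idea above concrete, here is a toy model in Python. It is only a sketch: the struct name, field names, and offsets are all made up, and a real loader patches BPF instructions using binary BTF data rather than walking Python dicts. The point is just that the program refers to fields by name and the loader fills in the offsets for whatever kernel it lands on.

```python
# Toy model of CO-RE-style field relocation. Everything here is hypothetical:
# the struct, its fields, and the offsets are invented for illustration.

# Field offsets for a made-up struct on two made-up kernel builds.
BTF_KERNEL_A = {"task": {"pid": 8, "comm": 16}}
BTF_KERNEL_B = {"task": {"pid": 12, "comm": 24}}  # layout shifted between releases

# The "compiled program" records field accesses by name, not by offset.
PROGRAM = [("load", "task", "pid"), ("load", "task", "comm")]


def relocate(program, btf):
    """Resolve each symbolic field access to its offset on the target kernel."""
    resolved = []
    for op, struct_name, field in program:
        try:
            offset = btf[struct_name][field]
        except KeyError:
            # Refuse to load rather than read from a guessed location.
            raise LookupError(f"{struct_name}.{field} not found on this kernel") from None
        resolved.append((op, offset))
    return resolved


if __name__ == "__main__":
    print(relocate(PROGRAM, BTF_KERNEL_A))
    print(relocate(PROGRAM, BTF_KERNEL_B))
```

The same `PROGRAM` resolves to offsets 8 and 16 against the first table and 12 and 24 against the second, which is the whole trick: one compiled object, with layout differences absorbed at load time.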

6 comments

There's also https://github.com/alessandrod/bpf-linker to make compiling a bit easier, as it does necessary inlining at link-time.
I think dtrace has the same problem, i.e. it's pretty tightly coupled to the exact functions / trace points in the kernel. A different kernel can break a dtrace script, although I think their code changes a lot less than Linux does.

It seems somewhat unavoidable, if the goal is to introspect the kernel at a very intimate level ...

No, DTrace does not have this problem, though our solution to it is one of the least well known aspects of DTrace: we have a notion of explicit stability that allows for stable scripts to be built on top of very low level implementation details that themselves might change. See the chapter on "Stability" in the Dynamic Tracing Guide[1] for details.

[1] http://dtrace.org/guide/chp-stab.html

> Stability attributes are computed for most D language statements by taking the minimum stability and class of the entities in the statement.
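A rough Python sketch of the rule in that quote, for anyone who wants to see it spelled out. Assumptions: the level names and their ordering are given as I recall them from the Stability chapter, the three-part attribute (name stability, data stability, dependency class) is a simplification, and `Attr` and `combine` are hypothetical names, not anything from DTrace itself.

```python
from enum import IntEnum
from typing import NamedTuple


class Stability(IntEnum):
    # Ordered from least to most stable (assumed ordering, see lead-in).
    INTERNAL = 0
    PRIVATE = 1
    OBSOLETE = 2
    EXTERNAL = 3
    UNSTABLE = 4
    EVOLVING = 5
    STABLE = 6
    STANDARD = 7


class DepClass(IntEnum):
    # Ordered from least to most portable (assumed ordering, see lead-in).
    UNKNOWN = 0
    CPU = 1
    PLATFORM = 2
    GROUP = 3
    ISA = 4
    COMMON = 5


class Attr(NamedTuple):
    name_stab: Stability
    data_stab: Stability
    dep_class: DepClass


def combine(*attrs):
    """A statement is only as stable as the least stable entity it touches."""
    return Attr(
        min(a.name_stab for a in attrs),
        min(a.data_stab for a in attrs),
        min(a.dep_class for a in attrs),
    )


if __name__ == "__main__":
    builtin = Attr(Stability.STABLE, Stability.STABLE, DepClass.COMMON)
    kernel_internal = Attr(Stability.PRIVATE, Stability.PRIVATE, DepClass.UNKNOWN)
    print(combine(builtin, kernel_internal))
```

Mixing a stable builtin with a private kernel detail drags the whole statement down to Private/Private/Unknown, so a script's reported stability tells you how far you can trust it across releases.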

That's a fascinating read and an amazing idea. To your knowledge are there any other software ecosystems that track stability in nearly as formalized a way? Has there been investigation into bringing these ideas into other modern languages? (I don't believe Rust has a concept like this, for instance, though it would even further strengthen the language's concept of correctness if it did!)

Well, we certainly thought it was a big deal! We were really trying to address this issue of writing stable scripts -- allowing for stable, powerful tooling without ossifying the underlying system. I'm really pleased with the work we did there -- but it's unquestionably one of the more esoteric aspects of DTrace.

Something of a funny story that this brings to mind: the taxonomy we have here is actually the interface taxonomy from what was Sun's Platform Software Architecture Review Committee (PSARC), which itself borrowed it from Sun's larger Software Development Framework (SDF). We had to get DTrace reviewed by PSARC, which we weren't necessarily looking forward to -- in part because of big developments like this one. To get past our PSARC review, we adopted several strategies, one of which was to separate out DTrace from its instrumentation providers as separate cases before the committee. When we first presented DTrace to PSARC, committee members wanted to fixate on instrumentation methodology -- and it was very helpful to be able to defer these fixations to later cases (after having let members pontificate and chew up some of the clock, of course). The other technique that we developed (which was devastatingly effective) was to distract the committee with issues that were irrelevant but amenable to debate. When a debate emerged among the committee members (and PSARC being more or less a debating society, this was practically guaranteed), we would effectively feed both sides of the debate -- and in the end, run out the clock on something we didn't care about. All of this worked exceedingly well -- and DTrace itself (one of the largest cases that had ever appeared before PSARC) was approved with essentially no modifications.

Shortly after the DTrace case was approved, we started bringing forward cases on instrumentation providers. With each case, we presented the stability matrix of that particular provider; on the first such case, I remember vividly one committee member asking: "what the hell is this and when do we review it?!" We explained that it was the stability matrix -- as explained at length in the case that they had in fact already approved. They realized in an instant that they had fixated on a dinghy of nomenclature while we had slipped behind them an ocean liner of semantics -- and it was glorious.

> The other technique that we developed (which was devastatingly effective) was to distract the committee with issues that were irrelevant but amenable to debate.

It's not exactly the same but this reminds me of the way Matt Stone described his interactions with the MPAA board in This Film Is Not Yet Rated (https://en.wikipedia.org/wiki/This_Film_Is_Not_Yet_Rated).

i.e. they went into the Team America rating negotiation with aggressive material they were prepared to cut, and probably wanted to cut anyway, and let the committee spend all their time on that.

See also (NSFW):

https://youtu.be/SgyG8y1vg1M?t=151

https://lettersofnote.com/2009/09/30/p-s-this-is-my-favorite...

Thanks for the reference... although it seems clear upon reading it that the problem is mitigated (by tagging/documentation) but still present. Although the tool support looks nice and is probably something that could be borrowed for eBPF.
Off topic, but thanks for being so inspiring ;)
BTF is designed to avoid the problem, by having the live kernel export symbols; supposedly --- I haven't used it --- the toolchain even converts it to a header file, so BPF programs just include a "vmlinux.h" instead of includes pointing into kernel source (which is a nightmare). It's ambitious and I'm surprised it does as much as it does but apparently they're solving this problem.
I can imagine it leading to ossification of kernel internals.

Imagine when someone comes up with a revolutionary new paging method, but it causes everyone's eBPF scripts to fail to load and a bunch of tools to break...

What's worse is that I've run into kernel bugs/panics a few times that made me hesitate to recommend BPF for production systems. Hopefully those become less frequent as the ecosystem matures, but they were pretty scary!
Handling input options and displaying output is a little easier in Python. It also lets you hack the tools quickly and run any changes instantly.
CO-RE is great, but for those who have to run on older kernels an approach is to loop, guessing the offset and running an experiment to see if correct:

https://github.com/weaveworks/tcptracer-bpf/blob/cd53e7c84ba...
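The guess-and-check loop can be simulated in userland Python, as a sketch only. Assumptions: the byte blob stands in for a kernel struct whose layout is unknown, and `make_fake_sock` and `guess_offset` are hypothetical helpers, not functions from the linked project. The real thing sets up a connection with known parameters and has a BPF program read at the guessed offset.

```python
import struct


def make_fake_sock(port_offset, port, size=64):
    """Build an opaque blob standing in for a kernel struct of unknown layout."""
    blob = bytearray(size)
    # Ports are stored in network byte order, hence "!H".
    struct.pack_into("!H", blob, port_offset, port)
    return bytes(blob)


def guess_offset(blob, expected, fmt="!H"):
    """Try each offset in turn until the value read matches the known one."""
    width = struct.calcsize(fmt)
    for offset in range(len(blob) - width + 1):
        (value,) = struct.unpack_from(fmt, blob, offset)
        if value == expected:
            return offset
    return None  # the field was not found in this blob


if __name__ == "__main__":
    # We "know" the port because we set up the experiment ourselves.
    sock = make_fake_sock(port_offset=22, port=0xBEEF)
    print(guess_offset(sock, 0xBEEF))
```

A single known value can match by coincidence at the wrong offset, so an approach like this presumably needs several known values (or repeated experiments) before trusting a guess.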

This was done (by Kinvolk) for the visualisation tool Weave Scope; also picked up by DataDog https://github.com/DataDog/datadog-process-agent/tree/master...

I got a bunch of Numba version-related errors (Python 3.7) when I tried to run the example code on the website, and my thoughts were in the same direction. I was wondering if it is possible to write something like this in, say, Golang instead of Python.
There are Go bindings for BPF and BCC: https://github.com/iovisor/gobpf

I'm not sure of the state of them at this point, but it's the same paradigm GP mentioned.

I think the more idiomatic thing to use in Go is Cilium, which has tooling support for loading and attaching eBPF programs, and also a weird embedding system that calls clang9 directly.

I find the Cilium libraries sort of hit-or-miss†, though they mostly work well. But, again, I just build my BPF programs themselves with Makefiles into .o's, and use Cilium (or, for XDP/TC, iproute2) to load them.

https://twitter.com/tqbf/status/1336825568478834689