Hacker News new | ask | show | jobs
by turtledragonfly 1202 days ago
One thing that can be pretty nice which is kinda neither traditional debugging nor logging is DTrace (or similar). Basically event tracing on steroids. Maybe EBPF is in that vein? I don't have much personal experience with it, but I have heard some stories of good success on busy production systems.

I guess my (limited) experiences with distributed systems are different than yours. The notion of "pausing" the system to step through things interactively was usually untenable. Do you stop the one node that shows the issue, and let the others run, getting into who-knows-what shared state? Do you somehow attempt to stop them all, and hope/pray that they all are in the right state to make your cross-node analysis meaningful? This was mostly on Apache Spark, where parallelism was the name of the game. Maybe for some kind of long-running distributed system like Erlang it's a different story.

1 comments

> Maybe EBPF is in that vein?

Not just that :) It was "inspired by". Well, it's the same idea.

> The notion of "pausing" the system to step through things interactively was usually untenable.

That's not what EBPF would be used for in such a system. You'd write a bit of code that can be loaded into a running program and executed as a particular condition occurs. Like how you can attach some code to evaluate on a breakpoint in many other debuggers.