Hacker News new | ask | show | jobs
by crabbone 1202 days ago
Both interactive and declarative debuggers work better in distributed systems than logging because they can observe events as they happen, and don't need to recreate the order in which they happened from the records which are very hard to make chronologically consistent.

Things like EBPF (which may implement sort of a declarative debugger) are, perhaps the only tool you may hope to use in high volume and high frequency systems.

If I could only choose one technology used for software diagnostics, I'd choose debuggers over logging. Debuggers need more effort to develop them, and they aren't very good (yet), but they have potential. I don't believe that logging can be substantially improved to deal with difficult problems.

1 comments

One thing that can be pretty nice which is kinda neither traditional debugging nor logging is DTrace (or similar). Basically event tracing on steroids. Maybe EBPF is in that vein? I don't have much personal experience with it, but I have heard some stories of good success on busy production systems.

I guess my (limited) experiences with distributed systems are different than yours. The notion of "pausing" the system to step through things interactively was usually untenable. Do you stop the one node that shows the issue, and let the others run, getting into who-knows-what shared state? Do you somehow attempt to stop them all, and hope/pray that they all are in the right state to make your cross-node analysis meaningful? This was mostly on Apache Spark, where parallelism was the name of the game. Maybe for some kind of long-running distributed system like Erlang it's a different story.

> Maybe EBPF is in that vein?

Not just that :) It was "inspired by". Well, it's the same idea.

> The notion of "pausing" the system to step through things interactively was usually untenable.

That's not what EBPF would be used for in such a system. You'd write a bit of code that can be loaded into a running program and executed as a particular condition occurs. Like how you can attach some code to evaluate on a breakpoint in many other debuggers.