|
|
|
|
|
by crabbone
1202 days ago
|
|
Both interactive and declarative debuggers work better in distributed systems than logging because they can observe events as they happen, and don't need to recreate the order in which they happened from the records which are very hard to make chronologically consistent. Things like EBPF (which may implement sort of a declarative debugger) are, perhaps the only tool you may hope to use in high volume and high frequency systems. If I could only choose one technology used for software diagnostics, I'd choose debuggers over logging. Debuggers need more effort to develop them, and they aren't very good (yet), but they have potential. I don't believe that logging can be substantially improved to deal with difficult problems. |
|
I guess my (limited) experiences with distributed systems are different than yours. The notion of "pausing" the system to step through things interactively was usually untenable. Do you stop the one node that shows the issue, and let the others run, getting into who-knows-what shared state? Do you somehow attempt to stop them all, and hope/pray that they all are in the right state to make your cross-node analysis meaningful? This was mostly on Apache Spark, where parallelism was the name of the game. Maybe for some kind of long-running distributed system like Erlang it's a different story.