Hacker News new | ask | show | jobs
by buttproblem 3316 days ago
Techniques like this can be used to make debugging better. There's a lot of work towards this direction but there are some difficulties making it's use widespread.

Some interested related work is Delta Debugging. Here, an automated tool runs a set of tests and uses slicing to narrow down statements in the program causing the bug. The gist of this is to look at the statements which occur on every failing execution and then present these to the end user. More tests causing the error will usually lead to a smaller explanation.

The paper here presented a dynamic slicing technique. The problem with dynamic slicing is runtime overhead. The most naive approach is to log the operands and output of every instruction in the program which is quite expensive. Traces for real world programs will probably be many gigabytes. I don't have any numbers off the top of my head. But you could not deploy the instrumented version of the program for slicing to your users without it degrading performance. Intel recently added some hardware tracing to their processors which might offset this.

The alternative is static slicing where the program is not actually executed. Here, you give the slicer the statement of interest (e.g., a line of code where a seg fault occurred) and it will give you back the statements involved in the computation, same as in dynamic slicing. The problem here is accuracy. It's difficult and expensive to know what a program does before executing it (e.g., what memory locations could be accessed by this location? What is the value of the index used to access this array). So, the slice is often too large, or the slicing algorithm is too complicated and fails to terminate in a reasonable time on a real program.

Of course for hard to diagnose bugs you could use techniques such as this, and maybe it would be beneficial. It's hard to way the cost of wrangling with a slicing tool versus just staring at the code and trying to figure it out yourself. There's many more similar debugging techniques which are also quite interesting (e.g., program synthesis to automatically find and repair bugs, using slicing with test case generation (e.g., fuzzing, symbolic execution) to focus towards particular code regions) but these are probably even more research-y and hard to use in practice.

1 comments

>It's hard to weigh the cost of wrangling with a slicing tool versus just staring at the code and trying to figure it out yourself.

I cannot provide figures, but for me, when debugging within code that is not my own, I find that even something as simple as a call tree is often a big help (it is complementary to a stack trace, as you often want to figure out what was going on before the failure.)

My guess is that more sophisticated and powerful tools for understanding software will do more to improve its development than yet another language.

I'm curious about how you use this during debugging, would you mind elaborating on what language(s) you're using the call-graph from?

I'm wondering because I've been studying building automated program reasoning tools and have found that even just creating the call-graph is difficult: I imagine the end result potentially confusing the user because it is either incomplete or overly pessimistic. For example, the LLVM call-graph (apart from not being in the language directly used by the developer) does not include edges from indirect calls because their targets cannot be resolved (so it is incomplete). Alternatively, in the case where you don't know which function is called you can just add an edge to every function in the program (pessimistically).

The same story goes for the dependency graph.

I myself am a bit pessimistic myself about more powerful software understanding tools probably because I've been working on the problem for too long. It's particularly difficult because the languages end up being too difficult to analyze, and there is a huge engineering effort in creating an analyzer which is tuneable to handle various language features efficiently. But, if you have a particular problem and a particular (subset) of the language it ends up working nicely.

I have done this for C code. I will sometimes print off sections of a call tree and make notes on it to keep track of what I have found about what updates what as I work backwards from the point of failure.

I have also used the 'call hierarchy' and 'find usages' features in IntelliJ to help me navigate around Java code.

I appreciate your point that more dynamic and feature-rich languages are much more difficult (if not impossible) to analyze statically.

Call tree and deps graph is mandatory to me.