by DannyBee 3890 days ago
FWIW: I suspect the problem he's referring to is that the debugging agent layers suck in most languages.

IE given C++ or Objective-C, what the compiler describes to the debugger requires the debugger to know and do a lot to get actual values out.

For C++, it's actually pretty good, except that function calls/etc require understanding the ABI. IE the debug info i get tells me "if i want to get the value of this variable, i evaluate this expression". I know the layouts of how to interpret it, etc. It's rare the expression is too complicated (though it may require piecing together registers and memory, etc, it's just a state machine).
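To make the "it's just a state machine" point concrete, here is a minimal sketch of a stack-machine evaluator for a location expression. The opcodes (`PushConst`, `PushReg`, `Add`, `Deref`) and the `Target` callbacks are hypothetical simplifications for illustration; they are not the real DWARF encodings, and a real evaluator handles many more operations.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified opcodes loosely modeled on DWARF location
// expressions; these are NOT the real DWARF encodings.
enum class Op { PushConst, PushReg, Add, Deref };

struct Insn {
    Op op;
    std::uint64_t operand;  // constant or register number; ignored by Add/Deref
};

// Callbacks the debugger would supply for whatever target it is inspecting.
struct Target {
    std::function<std::uint64_t(unsigned)> read_reg;
    std::function<std::uint64_t(std::uint64_t)> read_mem;
};

// Run the expression on a small stack machine and return the top of stack.
std::uint64_t evaluate_location(const std::vector<Insn>& expr, const Target& t) {
    std::vector<std::uint64_t> stack;
    for (const Insn& i : expr) {
        switch (i.op) {
        case Op::PushConst:
            stack.push_back(i.operand);
            break;
        case Op::PushReg:
            stack.push_back(t.read_reg(static_cast<unsigned>(i.operand)));
            break;
        case Op::Add: {
            assert(stack.size() >= 2);
            std::uint64_t b = stack.back(); stack.pop_back();
            std::uint64_t a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        case Op::Deref: {
            assert(!stack.empty());
            std::uint64_t addr = stack.back(); stack.pop_back();
            stack.push_back(t.read_mem(addr));
            break;
        }
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

An expression like "register 7, plus 8, dereference" becomes `{{Op::PushReg, 7}, {Op::PushConst, 8}, {Op::Add, 0}, {Op::Deref, 0}}`; the debugger only has to provide the two callbacks.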

For Objective C, even things like "instance variables" require the debugger understand a lot. Java has a fairly reasonable agent, etc.

Part of this is that the type systems of the debug info formats (DWARF, etc) are very simple, so even though they theoretically support things like function calls, etc, it's rarely used to provide the functionality necessary, and the debugger is left having to do it itself.


It's not just the ABI: the parsing is very different too. A good C++ REPL in a debugger has to understand not only standard C++ expressions (which is a tremendous task itself), but also:

  * module-differentiated references (since foo.dll!globalThing can be different from bar.dll!globalThing) 
  * scope-differential references (each compilation unit can have its own statics)
  * CPU registers (which make perfect sense as variables in interactive debugging)
  * convenience variables
  * preprocessor macros (which we record in DWARF these days)
  * pretty-printers
  * line number references
  * completion
and tons of other things. Using some kind of "agent" embedded in debugged programs as a necessary part of debugging is unacceptable, since you're frequently debugging core files and minidumps and you can't exactly put a question to a corpse. The debugger needs to understand how to do all of this itself.
"It's not just the ABI: the parsing is very different too. A good C++ REPL in a debugger has to understand not only standard C++ expressions (which is a tremendous task itself), but also: "

All true (I maintained c++ support in GDB for years, so i'm sadly aware of most of these issues), but parsing is a user interface issue (IE "What is the user asking me about"), rather than a "how do i actually access the value the user asked me about" issue. You assume, strongly, that the user wants to use the same expressions that exist in their program. Let's assume this is true for a second: Good solutions already exist (libclang, etc) in most languages to abstract the "what is the user asking about" part; no good solutions exist for a lot of languages to abstract the "how do i access the value of that in this implementation" part.

(This is an "in practice problem". In theory, you could pretty easily extend DWARF to tell me how to call functions in C++, for example).

" Using some kind of "agent" embedded in debugged programs as a necessary part of debugging is unacceptable, since you're frequently debugging core files and minidumps and you can't exactly put a question to a corpse."

First, i'm going to challenge this. It may be true in what you do. However, at least in the development environment in which i function, in C++, debuggers are a tool of last resort (i literally have per-line command logs of what developers where i work do with the debugger).

The number of times they are run on core files is < 5%.

This is >25k developers. Given the vast majority are not debugging core files, ISTM to make more sense to have an architecture targeted at serving these 95% super well, and then handle the 5% of cases differently

(I expect, when you are that screwed, that you may need a different set of tools to be effective anyway, since core files are post-mortem debugging).

Second, you make the strange assumption an agent can't read or work with core files, and needs a live process?

"The debugger needs to understand how to do all of this itself."

You assert this rather than show this.

What stops an agent from having an interface to read from memory (most in fact, do), and the callback lets the debugger give it memory from the core dump or the host?

This is in fact, what already happens in remote debugging of core dumps ....
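A rough sketch of that arrangement, under stated assumptions: the agent holds the language-specific knowledge and only ever touches the target through a read-memory callback, so it does not care whether the bytes come from a core dump or a live process. `ReadMemory`, `CoreSegment`, `core_reader`, `agent_read_field`, and the field offset are all hypothetical names and values made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical callback: copy `len` bytes from target address `addr` into
// `out`; return false if that range is not available.
using ReadMemory = std::function<bool(std::uint64_t addr, void* out, std::size_t len)>;

// A toy "core file": one contiguous segment that was mapped at `base`.
struct CoreSegment {
    std::uint64_t base;
    std::vector<std::uint8_t> bytes;
};

// Reader backed by the core segment (which must outlive the reader).
// A live-process or remote reader would have the same signature and a
// different body.
ReadMemory core_reader(const CoreSegment& seg) {
    return [&seg](std::uint64_t addr, void* out, std::size_t len) {
        if (addr < seg.base) return false;
        std::uint64_t off = addr - seg.base;
        if (off > seg.bytes.size() || len > seg.bytes.size() - off) return false;
        std::memcpy(out, seg.bytes.data() + off, len);
        return true;
    };
}

// The "agent": it knows (by assumption, for this sketch) that the object
// keeps a 32-bit field 4 bytes in. It never reads the target directly.
bool agent_read_field(const ReadMemory& read, std::uint64_t object_addr,
                      std::uint32_t& value) {
    assert(read);  // the debugger must have supplied a reader
    constexpr std::uint64_t kFieldOffset = 4;  // assumed layout
    return read(object_addr + kFieldOffset, &value, sizeof value);
}
```

The design choice is that the debugger decides where memory comes from and the agent decides what the memory means.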

> You assume, strongly, that the user wants to use the same expressions that exist in their program.

Maybe I wasn't clear --- I think we agree on this point. Debugging parsing is different from (and in some ways, harder than) regular compiler parsing because users want to use familiar syntax that is different from regular program code. I can write print/x $pc+4 --- no C compiler interprets "$pc" to mean the program counter.
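As a toy illustration of the `$pc+4` case, here is a sketch of an evaluator that accepts sums of `$name` terms and decimal literals, resolving `$name` through a register-lookup callback instead of treating it as a program identifier. `RegisterLookup` and `eval_debugger_expr` are hypothetical names, and this is nowhere near a real debugger expression parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical hook: map a register name (without the '$') to its value.
using RegisterLookup =
    std::function<std::optional<std::uint64_t>(const std::string&)>;

// Evaluate a sum of terms, each either "$name" or a decimal literal.
// Returns nullopt on a syntax error or an unknown register.
std::optional<std::uint64_t> eval_debugger_expr(const std::string& text,
                                                const RegisterLookup& regs) {
    assert(regs);  // a lookup hook is required
    std::uint64_t total = 0;
    std::size_t pos = 0;
    bool expect_term = true;
    while (pos < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[pos]);
        if (std::isspace(c)) { ++pos; continue; }
        if (!expect_term) {            // between terms only '+' is allowed
            if (c != '+') return std::nullopt;
            ++pos;
            expect_term = true;
            continue;
        }
        if (c == '$') {                // debugger-only syntax: a register
            std::size_t start = ++pos;
            while (pos < text.size() &&
                   std::isalnum(static_cast<unsigned char>(text[pos]))) ++pos;
            if (pos == start) return std::nullopt;
            auto v = regs(text.substr(start, pos - start));
            if (!v) return std::nullopt;
            total += *v;
        } else if (std::isdigit(c)) {  // decimal literal
            std::uint64_t n = 0;
            while (pos < text.size() &&
                   std::isdigit(static_cast<unsigned char>(text[pos]))) {
                n = n * 10 + static_cast<std::uint64_t>(text[pos] - '0');
                ++pos;
            }
            total += n;
        } else {
            return std::nullopt;
        }
        expect_term = false;
    }
    if (expect_term) return std::nullopt;  // empty input or trailing '+'
    return total;
}
```

With a lookup that maps `pc` to the current program counter, `eval_debugger_expr("$pc+4", regs)` yields that value plus 4.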

While it's true that it's a UI issue, this classification doesn't make the problem any easier.

> you could pretty easily extend DWARF to tell me how to call functions in C++

What extensions would you add? We already have stack-layout information, and the debugger implicitly knows the platform ABI.

> C++, debuggers are a tool of last resort

I've seen this phenomenon too, and it's upsetting: debuggers can be much more efficient. I've put a lot of work into making end-to-end debugging seamless, but I still see developers using traditional in-code tracing.

I want to try making Mozilla's rr available in an equally easy-to-consume package and see whether the ability to do reverse debugging begins to sway people.

> Second, you make the strange assumption an agent can't read or work with core files, and needs a live process?

I think we mean different things by "agent". I was talking about a remote stub that lives in the process to be debugged. If you instead move that logic to a pluggable component that the debugger merely hosts and that it uses as a general abstraction of how to debug targets of various sorts, debugging core files is feasible. (But in that case, how is it different conceptually from struct target_ops, which we already have?)

"What extensions would you add? We already have stack-layout information, and the debugger implicitly knows the platform ABI."

But it does not know the C++ ABI.

Here is the minimum amount of random crap GDB currently has to understand, on its own, about the GNU v3 C++ ABI: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=b...

(There's more, it's just not all in this file :P)

In an ideal world, the debugger should need to know none of this. It should be part of the debug info.

If you use a compiler's front end to parse the debugger's expressions, you can use a modification of the front end's constant expression evaluator, wherein it retrieves values and executes functions by asking the debugger.

(The Delphi IDE integrates its compiler with the debugger in this fashion.)
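A minimal sketch of that idea, assuming a tiny integer-only expression tree: the evaluator folds the tree like a constant expression evaluator would, except that variables and function calls are answered by asking the debugger through hooks. `DebuggerHooks`, `Expr`, `fold`, and the helper constructors are hypothetical and are not taken from any real compiler front end.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface the debugger exposes to the front end's evaluator.
struct DebuggerHooks {
    std::function<std::int64_t(const std::string&)> read_variable;
    std::function<std::int64_t(const std::string&,
                               const std::vector<std::int64_t>&)> call_function;
};

// Tiny expression tree standing in for the front end's AST.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
    enum class Kind { Literal, Variable, Add, Call };
    Kind kind = Kind::Literal;
    std::int64_t value = 0;     // Literal
    std::string name;           // Variable, Call
    std::vector<ExprPtr> args;  // Add (exactly two), Call (any number)
};

ExprPtr lit(std::int64_t v) {
    auto e = std::make_shared<Expr>();
    e->value = v;
    return e;
}
ExprPtr var(std::string n) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Variable;
    e->name = std::move(n);
    return e;
}
ExprPtr add(ExprPtr a, ExprPtr b) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Add;
    e->args = {std::move(a), std::move(b)};
    return e;
}
ExprPtr call(std::string n, std::vector<ExprPtr> a) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Call;
    e->name = std::move(n);
    e->args = std::move(a);
    return e;
}

// Fold the tree; anything that lives in the debuggee goes through the hooks.
std::int64_t fold(const Expr& e, const DebuggerHooks& dbg) {
    switch (e.kind) {
    case Expr::Kind::Literal:
        return e.value;
    case Expr::Kind::Variable:
        return dbg.read_variable(e.name);
    case Expr::Kind::Add:
        assert(e.args.size() == 2);
        return fold(*e.args[0], dbg) + fold(*e.args[1], dbg);
    case Expr::Kind::Call: {
        std::vector<std::int64_t> argv;
        for (const ExprPtr& a : e.args) argv.push_back(fold(*a, dbg));
        return dbg.call_function(e.name, argv);
    }
    }
    return 0;  // not reached for valid kinds
}
```

Here the evaluator itself knows nothing about layouts or calling conventions; how `read_variable` and `call_function` get their answers is entirely the debugger's business.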