Hacker News
by jcranmer 1203 days ago
That sort of argument is somewhat defensible in a context like the kernel, where when things go haywire, you can't really expect there to be enough sanity to have a debugger work. But very little code runs in such a context, and it turns out that a well-written debugger has incredible features.

Also, Linus was writing this 22 and a half years ago, when the capabilities of debuggers were... far, far less. Time-travelling debuggers is really a game changer, just having the ability to travel back in time to figure out who set the value that causes the code to crash. Hot reload is also a wonderful thing (unfortunately, the fragmentation of tooling in Linux makes getting this working properly very difficult).

2 comments

Lots of (most?) real-time and embedded code runs in a state where suspending/resuming really doesn't work in any useful fashion, so it's logging, and minimal logging at that, to figure out what's going wrong in situ.

That said, much benefit is gained by writing the complicated bits in such a way that they can be tested/debugged/examined independently on a host system.
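A minimal sketch of that idea, assuming the "complicated bit" is a button debouncer. Python is used only for brevity (the same split applies in C), and every name here (`Debouncer`, `run`, the sample list) is hypothetical. The logic never touches hardware, so on the target you would feed it real pin reads, and on the host you feed it a canned list and can step through it with an ordinary debugger.

```python
class Debouncer:
    """Pure logic with no hardware access, so it behaves the same on target and host."""

    def __init__(self, threshold):
        self.threshold = threshold  # consecutive differing samples needed to flip
        self.state = False          # current debounced value
        self._count = 0             # how many samples in a row disagreed with state

    def update(self, raw):
        """Feed one raw boolean sample and return the debounced state."""
        if raw == self.state:
            self._count = 0         # nothing changed, or a glitch just ended
        else:
            self._count += 1
            if self._count >= self.threshold:
                self.state = raw    # the change persisted long enough to accept
                self._count = 0
        return self.state


def run(debouncer, samples):
    """Host-side driver: replaces the (hypothetical) pin-read loop on the target."""
    return [debouncer.update(s) for s in samples]


# A one-sample glitch is ignored; three stable samples flip the state.
samples = [False, True, False, True, True, True, False]
outputs = run(Debouncer(threshold=3), samples)
```

With these samples, `outputs` stays `False` for the first five readings and turns `True` on the sixth.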

Hmmm, I have 20+ years of embedded software programming experience and can tell you that the reason embedded software is often not easily debugged using a debugger is the crappiness of the debugger solution (debug probe, its firmware and its ecosystem). Also, high-end embedded ICs often contain a serious number of silicon bugs. The fact that it needs to run in real time mostly does not get in the way of the debugging process. In other words, embedded processors and their ecosystems tend to be sub-par in terms of developer friendliness. Addendum: SHARC ADSP anomaly list https://www.analog.com/media/en/dsp-documentation/integrated...
I'm with you there!

Be happy if you can flash an LED... happier if you have two speeds... and Nirvana if it is a multi-color LED.

Intense jealousy of your colleagues who have TWO LEDs on their boards!

> Time-travelling debuggers is really a game changer

Core dumps have existed forever. They give you a stack trace and register values at the time of crash. Even better, you don't need a debugger running at the time of crash and you can dig into dumps sent from nontechnical users.

Sure, Bret Victor's demo was cool. But time travel debugging is so completely oversold at this point that I can't take anyone seriously who mentions it.

I've debugged core dumps before. It's not been a particularly pleasant experience--good luck trying to do something like `call V->dump()` (dump out an easy-to-understand representation of a complex value to stdout... oh, there's no live process anymore, so you can't use that functionality!)

The most useful aspect of time-travel debugging for me, personally, has been when the test case that causes a crash is refusing to be reduced, and the function that crashes does so on like the 453rd time it's called. Jumping straight to the crash, then reverse-continuing to a breakpoint cuts out so much debugging time (especially because it saves you if you accidentally continue past the breakpoint one too many times; otherwise, you'd have to start the entire, tedious process from the beginning).

Something else that time-travel debugging helps a lot with is that in an awful lot of cases, what you have in the crash dump is a broken state, with no way to identify how that state came to be. Like "why the hell does this variable have this value?". Sure, you can do the whole digging work to find what can possibly change that variable, and try different scenarios to hit them in a debugger at the moment the problem might appear, in a new session that is not even guaranteed to produce the same crash. But with record-and-replay type things, you just set a watchpoint on the value, continue backwards, and there you go, you find where the value comes from.

Now, imagine you did do that manually and spent a lot of time finding that location. In many cases, that only gets you one step closer to the root cause, and you have to repeat the operation multiple times. Yes, you _can_ do that without record-and-replay, but do you really want to? Do you want to spend hours doing something that could take you minutes?

(And that's not even mentioning even worse cases, where the value you're tracking goes between processes via IPC)

Have you ever used time-travel debugging? Core dumps give you a backtrace (assuming the stack is sane) which gives some clue of how I got here, but not _why_ I got here. E.g. an assertion failure: I wrote that assertion because I believed this thing would always be true. Now I find it is not true. From a corefile I usually can't see why it is not true.

With a time travel recording I can put a watchpoint (aka data breakpoint) on the state that is supposedly impossible, and reverse-continue back to see exactly where it got set. (And repeat as required.) It really is very powerful.
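To make the "watchpoint plus reverse-continue, repeat as required" loop concrete, here is a toy model in Python. It is only an illustration of the idea, not how any real record-and-replay debugger is implemented: the "recording" is just an ordered log of writes, and all names (`Recording`, `reverse_watch`, the locations and variables) are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Write:
    step: int       # position in the recorded execution
    location: str   # hypothetical label for the code that performed the write
    variable: str
    value: object


class Recording:
    """Toy stand-in for a recorded execution: an ordered log of writes."""

    def __init__(self):
        self.writes = []

    def record(self, location, variable, value):
        self.writes.append(Write(len(self.writes), location, variable, value))

    def reverse_watch(self, variable, from_step=None):
        """Walk backwards from from_step (default: end of the recording) and
        return the most recent earlier write to variable, or None."""
        start = len(self.writes) if from_step is None else from_step
        for w in reversed(self.writes[:start]):
            if w.variable == variable:
                return w
        return None


rec = Recording()
rec.record("init", "count", 0)
rec.record("parse_header", "count", 3)
rec.record("init", "limit", 10)
rec.record("handle_reset", "count", -1)  # the write that breaks the invariant
rec.record("render", "limit", 10)

# "Crash" at the end of the recording: who last wrote count?
culprit = rec.reverse_watch("count")
# Repeat from that point to find the write before it.
earlier = rec.reverse_watch("count", from_step=culprit.step)
```

Here `culprit.location` is `"handle_reset"` and `earlier.location` is `"parse_header"`. A core dump corresponds to seeing only the final values, with none of the log.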

Admittedly there are situations in which it's not practical to get a recording, but when you can... almost any bug becomes trivial.