Hacker News new | ask | show | jobs
by deckard1 1202 days ago
> Time-travelling debuggers is really a game changer

Core dumps have existed forever. They give you a stack trace and register values at the time of crash. Even better, you don't need a debugger running at the time of crash and you can dig into dumps sent from nontechnical users.

Sure, Bret Victor's demo was cool. But time travel debugging is so completely oversold at this point that I can't take anyone seriously that mentions it.

2 comments

I've debugged core dumps before. It's not been a particularly pleasant experience--good luck trying to do something like `call V->dump()` (dump out an easy-to-understand representation of a complex value to stdout... oh that doesn't exist anymore, can't use that functionality!)

The most useful aspect of time-travel debugging for me, personally, has been when the test case that causes a crash is refusing to be reduced, and the function that crashes does so on like the 453rd time it's called. Jumping straight to the crash, then reverse-continuing to a break point cuts out so much time of debugging (especially because it saves you if you accidentally continue the breakpoint one too many times; otherwise, you'd have to start the entire, tedious process from the beginning).

Something else that time-travel debugging helps a lot with, is that in an awful lot of cases, what you have in the crash dump is a broken state, with no way to identify how that state happened to be. Like "why the hell does this variable have this value?". Sure, you can do the whole digging work to find what can possibly change that variable, and try different scenarios to hit them in a debugger at the moment the problem might appear in a new session that is not even guaranteed to produce the same crash. But with record-and-replay type things, you just set a watchpoint on the value, continue backwards, and there you go, you find where the value comes from.

Now, imagine you did do that manually and spent a lot of time finding that location. In many cases, that only gets you one step closer to the root cause, and you have to repeat the operation multiple times. Yes, you _can_ do that without record-and-replay, but do you really want to? Do you want to spend hours doing something that could take you minutes?

(And that's not even mentioning even worse cases, where the value you're tracking goes between processes via IPC)

Have you ever used time-travel debugging? Core dumps give you a backtrace (assuming stack is sane) which gives some clue of how I got here, but not _why_ I got here. e.g. assertion failure. I wrote that assertion because I believed this thing would always be true. Now I find it is not true. From a corefile I can't usually see why it not true.

With a time travel recording I can put a watchpoint (aka data breakpoint) on the state that is supposedly impossible, and reverse-continue back to see exactly where it got set. (And repeat as required.) It really is very powerful.

Admittedly there are situations in which it's not practical to get a recording, but when you can... almost any bug becomes trivial.