by nijave 1203 days ago
>can step through the code backward and rewind the program state to any point in history

This seems impossible for anything that modifies external state.

Say 1. Open database connection 2. Commit data 3. Close database connection

How would you rewind right before #2 if you've already completed step #3? You'd need the socket/connection you already closed

Unless that means something more like "keep a running record of program state over time"

9 comments

It can work but it depends on the system. One field in which it's very useful is game development. I'm working on a language that has time travel debugging as a feature, so you can rewind a game back to a previous state. I've found it useful when there's a momentary bug that would be hard to recreate. With time traveling, you pause, rewind, inspect state, fix the bug in situ, then resume from that point to check that the behavior is correct.
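
To make the pause / rewind / fix / resume loop concrete, here is a minimal sketch in Python (not Mech, and not how Mech implements it). It assumes the whole game state is small enough to copy every frame; `Ball`, `Game` and `rewind_to` are made-up names for illustration only.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Ball:
    y: float = 10.0
    vy: float = 0.0


@dataclass
class Game:
    ball: Ball = field(default_factory=Ball)
    gravity: float = -1.0
    history: list = field(default_factory=list)  # one snapshot per frame

    def step(self):
        # Save the state *before* advancing so any frame can be restored.
        self.history.append(copy.deepcopy(self.ball))
        self.ball.vy += self.gravity
        self.ball.y += self.ball.vy
        if self.ball.y < 0:  # bounce off the floor
            self.ball.y = -self.ball.y
            self.ball.vy = -self.ball.vy

    def rewind_to(self, frame):
        # Restore the snapshot taken at the start of `frame`, drop later ones.
        self.ball = copy.deepcopy(self.history[frame])
        del self.history[frame:]


game = Game()
for _ in range(8):
    game.step()
game.rewind_to(3)    # back to the start of frame 3
game.gravity = -0.5  # tweak the suspected bug in place
game.step()          # resume from the restored state
```

This only works because everything the game depends on lives inside `Game`. Anything outside it (a server, a file) would not be rewound.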

Here's an example: http://docs.mech-lang.org/#/examples/bouncing-balls.mec

If you want these kinds of features in other systems, they'll have to be architected to support them. For example, the external database will have to be rewound as well. If it doesn't support that feature, then cool language-level debugging features won't be as useful.

Have a look at https://rr-project.org/ (features and motivation are on that page). (Also mentioned in the article.)

It works on recorded state. It's not about executing a program again; it's about root-causing a failure by going back in time after the failure happened. You can do things like start with a crash, retroactively set a breakpoint, and run in reverse until you hit it.

It's for real world applications. It was written specifically to debug Firefox, and has since been used for other applications of similar size.

It's basically GDB with extra commands, so very easy to use and learn if you know GDB. Highly recommend.

Time travel debugging generally works by record-and-replay, and there has been a lot of research into what you need to record to get deterministic replay. Recording the results of system calls is a necessary step [1] to get deterministic replay, and that lets you replay even things that rely on modifying external state, like database connections.

[1] Necessary, but not sufficient. Multithreaded applications require a lot more care, and rr relies on very accurate hardware counters to get multithreaded executions working correctly (and not all hardware supports these hardware counters!).
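
A toy illustration of the idea, applied to the database example from the question. This is a sketch only, not how rr works internally: rr records at the system call level, while this records at the function call level. `Tape` and `FakeDatabase` are hypothetical names.

```python
class Tape:
    """Records results of external calls, then replays them from the log."""

    def __init__(self):
        self.log = []          # results of external calls, in order
        self.replaying = False
        self.cursor = 0

    def call(self, fn, *args):
        if self.replaying:
            result = self.log[self.cursor]  # the outside world is not touched
            self.cursor += 1
            return result
        result = fn(*args)
        self.log.append(result)
        return result

    def start_replay(self):
        self.replaying = True
        self.cursor = 0


class FakeDatabase:
    """Stand-in for external state that cannot be rewound."""

    def __init__(self):
        self.rows = []
        self.closed = False

    def commit(self, row):
        if self.closed:
            raise RuntimeError("connection closed")
        self.rows.append(row)
        return len(self.rows)  # pretend this is a row id

    def close(self):
        self.closed = True


def program(tape, db):
    row_id = tape.call(db.commit, "hello")  # step 2: commit
    tape.call(db.close)                     # step 3: close
    return row_id * 10


tape, db = Tape(), FakeDatabase()
recorded = program(tape, db)   # real run: the commit happens once
tape.start_replay()
replayed = program(tape, db)   # replay: db is closed, but nothing notices
```

During the replay the connection is already closed and no second row is written, yet `program` sees the same return values and computes the same result.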

Let's say out loud the part that too often goes unstated:

You can rewind & replay a fixed execution of the program.

All the external interactions are recorded, and the replay can only literally replay what already happened.

You cannot change a variable or edit the code and branch off into a different execution path that talks to the outside world differently; the rest of the world did not get rewound with the program.
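
A small sketch of why that is, assuming a recorder that logs each external call together with its arguments. `CheckedTape`, `ReplayDivergence` and `fetch_price` are invented for illustration and are not from any real debugger.

```python
class ReplayDivergence(Exception):
    """The replayed program asked for something that was never recorded."""


class CheckedTape:
    def __init__(self):
        self.log = []          # (name, args, result) for each external call
        self.replaying = False
        self.cursor = 0

    def call(self, name, fn, *args):
        if not self.replaying:
            result = fn(*args)
            self.log.append((name, args, result))
            return result
        if self.cursor >= len(self.log):
            raise ReplayDivergence(f"no recording left for {name}{args}")
        rec_name, rec_args, result = self.log[self.cursor]
        if (rec_name, rec_args) != (name, args):
            raise ReplayDivergence(
                f"recorded {rec_name}{rec_args}, program asked for {name}{args}"
            )
        self.cursor += 1
        return result


def fetch_price(item):
    # Hypothetical external service.
    return {"apple": 3, "pear": 5}[item]


def program(tape, item):
    price = tape.call("fetch_price", fetch_price, item)
    return price * 2


tape = CheckedTape()
program(tape, "apple")      # record the original execution
tape.replaying = True
program(tape, "apple")      # same path: served from the log
tape.cursor = 0
try:
    program(tape, "pear")   # changed input: nothing recorded for this call
except ReplayDivergence as err:
    print("diverged:", err)
```

Once the program asks the outside world a question it did not ask the first time, the recording has no answer to give.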

> Say 1. Open database connection 2. Commit data 3. Close database connection
>
> How would you rewind right before #2 if you've already completed step #3? You'd need the socket/connection you already closed

The key thing is that you're really rewinding the world as visible to the program.

The program doesn't actually know what the network transaction with the database was, it just knows what syscalls it made and what results they returned. If, whenever it gets to talking with the database, you provide the same result as last time then it can't tell the database is gone.

This is self-supporting: if you do this consistently with external sources of information then the program will never go down any new code paths, guaranteeing that you'll always have a ready answer recorded when it needs it.

Yeah it's not perfect, but it often works well enough. Just another benefit of isolating code into stateless stuff.

One use case I have is debugging a bug that's hard to trigger. When it finally happens and my breakpoint is hit, I can edit the code, hot swap it live, drop the current frame, and then make it call my updated function again as if the original call never happened.

My guess is that you could attempt something like this by recording system calls as your program executes and then replaying them in whichever order you need.

You would need a lot of storage space for some programs, but in simple cases that might still be useful.

So, in your example, playing the program back wouldn't really try to read from a closed socket; it would just hit the debugger's database of stored system calls at a particular point and retrieve the stored response from that call.

This could get weird though if the program modifies itself as it executes. Not sure what to do in such a case, but maybe there's a way to deal with it in special cases, not in general...

http://undo.io and http://rr-project.org both support self-modifying code.

I am a co-founder of undo.io; many of our customers do this.

It's not as bad as it first sounds because the replay of the program will modify itself deterministically. (Though as always with this stuff, there are some gotchas.)

Only the program state gets rewound, not anything external to the process (or whatever is hosting the program). Just like hitting a breakpoint only pauses the program, not the external world.

Yes, that's what it means. Ideally you'd record the state of memory and CPU after each instruction. In practice you can take snapshots at regular intervals and hook system functions to record their inputs and outputs. If a call has to be replayed the debugger intercepts it and gives the debuggee a recorded result.

Exactly. Except there are sources of non-determinism other than syscalls. Namely asynchronous signals, thread ordering, shared memory and non-deterministic instructions. They can all be dealt with though.
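
The "snapshots at regular intervals" approach mentioned above can be sketched roughly like this. It is a toy model that assumes the program is a deterministic step function driven only by recorded inputs, and it ignores the other sources of non-determinism just listed. `Recording` and `state_at` are hypothetical names.

```python
import copy


class Recording:
    def __init__(self, initial_state, inputs, step_fn, interval=4):
        self.inputs = inputs      # recorded external inputs, one per step
        self.step_fn = step_fn    # deterministic: (state, input) -> new state
        self.interval = interval
        self.checkpoints = {0: copy.deepcopy(initial_state)}
        state = copy.deepcopy(initial_state)
        for i, value in enumerate(inputs, start=1):
            state = step_fn(state, value)
            if i % interval == 0:  # snapshot only every `interval` steps
                self.checkpoints[i] = copy.deepcopy(state)

    def state_at(self, step):
        """State after `step` steps (0 <= step <= len(inputs))."""
        # Restore the nearest earlier snapshot, then re-run forward from it.
        base = (step // self.interval) * self.interval
        state = copy.deepcopy(self.checkpoints[base])
        for value in self.inputs[base:step]:
            state = self.step_fn(state, value)
        return state


def add(state, value):
    return {"total": state["total"] + value, "steps": state["steps"] + 1}


rec = Recording({"total": 0, "steps": 0}, [5, 1, 4, 2, 8, 7, 3, 6, 9, 0], add)
before_crash = rec.state_at(7)  # "rewind" to just after step 7
```

Stepping backward from step n is then just `state_at(n - 1)`: restore a snapshot and replay forward, which trades storage for some re-execution time.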