Show HN: Retrace – reverse debugging for production CPython applications


Show HN: Retrace – reverse debugging for production CPython applications (github.com)

14 points by L15p3r 39 days ago

Nathan here, one of the people who built Retrace. Happy to answer technical questions. Retrace records a CPython application's interactions with the nondeterministic outside world (network, DB, filesystem, time, randomness, subprocesses) and lets you replay that execution locally and deterministically. The goal is to take a production failure and open the same execution in VS Code, with the ability to step forwards and backwards through the replay.

The core idea is to record boundary crossings rather than tracing every Python line in production. External calls are recorded as calls/results/errors, and replay stubs return the recorded results so the original application code runs again deterministically.
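To make the record/replay idea concrete, here is a minimal sketch of the general technique. It is not Retrace's actual API: the `Boundary` class, its `call` method, and the `handler` function are all hypothetical names invented for illustration. In record mode each external call is logged as a result or an error; in replay mode a stub returns the logged value instead of touching the outside world, and a mismatch in call order is reported as a divergence.

```python
import random
import time


class Boundary:
    """Hypothetical record/replay wrapper (illustrative, not Retrace's API)."""

    def __init__(self, mode, log=None):
        self.mode = mode  # "record" or "replay"
        self.log = log if log is not None else []
        self._cursor = 0  # position of the next log entry to replay

    def call(self, name, fn, *args):
        if self.mode == "record":
            # Perform the real external call and log its outcome.
            try:
                result = fn(*args)
            except Exception as exc:
                self.log.append((name, "error", exc))
                raise
            self.log.append((name, "result", result))
            return result

        # Replay: never call fn, only hand back what was recorded.
        if self._cursor >= len(self.log):
            raise RuntimeError(f"replay diverged: unexpected call to {name}")
        recorded_name, kind, value = self.log[self._cursor]
        self._cursor += 1
        if recorded_name != name:
            raise RuntimeError(
                f"replay diverged: expected {recorded_name}, got {name}"
            )
        if kind == "error":
            raise value
        return value


def handler(boundary):
    # Deterministic application code; only the boundary calls vary per run.
    now = boundary.call("time.time", time.time)
    roll = boundary.call("random.randint", random.randint, 1, 6)
    return f"{roll}@{int(now)}"


recorder = Boundary("record")
original = handler(recorder)

replayer = Boundary("replay", log=recorder.log)
replayed = handler(replayer)
print(original == replayed)
```

The application code in `handler` is identical in both runs; only the boundary decides whether to hit the real clock and RNG or return recorded values. A real system would intercept library calls transparently rather than require an explicit wrapper like this.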

The preview today covers Python 3.11/3.12 on macOS and Linux, with Flask, Django, requests, psycopg2, and threading/forking covered. There is a compatibility table in the README.

This is a preview, not a finished product. Things we know are missing: async support is partial, FastAPI is not in the table yet, Windows is not supported, and free-threaded 3.13 is detected and refused.

Happy to go deep on:

- how we get determinism on real Python stacks (threads, async, third-party libraries, C extensions)
- recording overhead, what it depends on, and what we have actually benchmarked vs claimed
- what works and what does not yet
- how this differs from rr, Replay.io, pdb time-travel forks, and APM tools

Blog post (longer write-up): https://retracesoftware.com/blog/introducing-retrace/

3 comments

michaelsalim 39 days ago

Congrats to the team for the launch! I helped build a part of this in the past.

The repo is complex but at its core, this is software to record execution without the performance & storage penalty that would usually come with recording all of production.

To do that, they need to make sure that they record anything that is not deterministic, while leaving code that is deterministic to be executed at replay time.

To be honest, I think this is a really hard problem, almost impossible I'd say. There are just so many things that can make the same execution produce different results. But from what I last saw, the team is steadily squashing the edge cases. I think they've now gotten it to be quite stable.

If everything goes well, this is very exciting and I think can revolutionise how we debug production code as an industry. I unfortunately don't run Python code so I can't meaningfully test this. Here's hoping it takes off and one day it'll be ported to the languages I use!


thedavidprice 39 days ago

I (we) have seen these pieces before: replay, time-travel debugging, tracing, APM, event sourcing, etc. What I haven’t seen is the whole thing work as described in a production setting. I.e. capture enough of a production run that you can replay it locally, get the same inputs/results, and then trace a bad value back to where it came from.

Has anyone seen prior tech that got this whole combination viably working? Or is there true (potential) novelty here in the combination of production replay + value provenance + usable workflow?


alzamos 39 days ago

Do I understand correctly that this would enable me to do retroactive logging/perf-instrumentation?


L15p3r 38 days ago

Hi, thanks for your interest in Retrace. Currently, the way Retrace works means that if you change the code, the replay will likely diverge. This extends to logging statements. The ability to adapt code to allow for what-if reasoning is something we're looking at, but it is not currently supported. But I would ask: why would you want to? Retrace allows full forward/backward debugging of execution runs. With this capability, what would adding log statements to the replay give you? In terms of performance instrumentation, with no breakpoints the replay will run at full speed with no IO wait. You could profile the replay and get a really accurate idea of where the CPU bottlenecks are. Hope this helps answer your questions.
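As a rough illustration of the profiling point, the sketch below uses only the standard library's `cProfile` and `pstats`. It assumes the replay can be driven from a Python callable; `replay_entrypoint` is a hypothetical stand-in for that, not a Retrace function, and here it just does some CPU-bound work.

```python
import cProfile
import io
import pstats


def replay_entrypoint():
    # Hypothetical stand-in for a replayed execution: CPU work, no I/O wait.
    return sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
result = profiler.runcall(replay_entrypoint)

# Render the five most expensive entries by cumulative time into a string.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Because recorded I/O results come back immediately during replay, the time in a report like this would be dominated by CPU work rather than waiting on the network or database.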
