Undo (where I'm CTO) has existed for longer than RR and its real benefit is that it scales to use cases where RR (for one reason or another) isn't a fit.
Technically:
* Doesn't need hardware performance counters - runs on more CPUs and on cloud systems (where performance counters are often blocked).
* Can attach and detach at any time - means you get to record just a subset of program execution that's interesting.
* You can ship our recording tech with your application and control it by API, so you can grab crash recordings on customer systems.
* Supports programs that share memory with non-recorded processes.
* Supports direct device access (e.g. DPDK).
* Accelerated debugging features - searching within recordings using parallel processing, and accelerated conditional breakpoints that are a few thousand times faster than in native GDB.
* We provide a stable, patched fork of GDB that we're occasionally told is more stable than the default.
For many people's use cases none of these really matter - they should use RR if they're not already.
But if you need any of these things then Undo can give you time travel debugging. In practice, it's usually big software organisations that we deal with because they have development pain and the extreme requirements we can match.
Undo has cool features like Live Recording that we don't have in rr.
They don't need access to the hardware PMU which is a big advantage in some situations.
They can handle accesses to shared memory in cases where rr can't.
https://undo.io/resources/undo-vs-rr/ is a good resource.
AFAIK it records multithreaded applications on multiple threads and CPUs, whereas rr records them on a single OS thread. Not sure about replay. I've never used Undo though, so I'm not sure how much better it is.
rr does support multithreaded and multi-process applications, via, like Undo[1], allowing only a single thread to run at a time. (edit note - that's only about multithreading; Undo might have parallel multi-process recording)
WinDbg's time travel debug is really cool and more people should know about it. I'm always a little sad that it's not (so far!) officially integrated in something like VS Code.
Before it was released publicly I believe Microsoft had been using it internally to share recordings on bug reports against massive pieces of software like Office. So it's a serious piece of tech.
I used it (iDNA) on the Windows team starting around 2006 or so and we were able to resolve bugs in minutes that had been open for years. It was absolute magic.
It does, but it is really sad by comparison with rr and UndoDB. You could use it to record a few function calls or perhaps if you’re lucky a whole frame of your game but not a whole program.
Time travel debugging on embedded ARM has been available for over 20 years via trace probes [1].
The category namer of time-travel debugging, TimeMachine (hence "time-travel debugging" in contrast to other attempted names such as reversible, bidirectional, record-replay, etc.), was available in 2003 and supports/supported the ARM7 [2]. Note that this is not the ARMv7 architecture; it is the ARM7 chip [3], in use from 1993-2001.
From what I know, the ARM7 was one of the first ARM designs implementing the Embedded Trace Macrocell (ETM) which could output the instruction and data trace data used to support trace probe-based time travel debugging.
What's limiting us is that Undo does need a Linux kernel - so traditional embedded programming wouldn't be a fit. Embedded Linux could work and we do support ARM64.
I've thought a bit about how you might support time travel on bare metal embedded - but there are sometimes hardware-assisted solutions there already (Lauterbach's Trace32 was one we came across).
Undo co-founder here. rr is indeed awesome. If it works for your use-case, you should use it!
Undo is mostly used by companies whose world is complex enough that rr doesn't work for them, and they understand how powerful time travel debugging is.
There has now been a LOT of engineering invested by a lot of very smart people into Undo, so it does also have a lot of polish and nice features.
But honestly, if rr is working for you, that's great. I'm just glad you're not doing printf debugging the whole time :)
I was in talks with them recently because I kept running into limitations with rr. The main advantages for my use case were that undo doesn't have the same dependency on hardware timers, which means the ARM support is much better, you can run it in a VM (e.g. a cloud machine) and you can do replays on different systems.
- If your program is very light on syscalls (i.e. basically entirely in-memory computation), rr can go to a basically 1.0x slowdown. In particular this means you can run benchmarks in it at full capacity, provided that I/O is outside of the repeated part (e.g. if the bench is sometimes noticeably slower, you can replay and see if some important loads/stores crossed a cacheline/page). You can even "perf record" / "perf stat" a replay if you want to! (None of this is too useful, but it's fun! Gathering repeated stats over the same execution for more resolution might be useful with proper tooling though.)
- rr does have an in-memory buffer of recording data.
- rr recordings should be portable within the architecture, as long as the replay hardware has the extensions the recorder did (or if replayer-unsupported features are disabled at record-time).
I regularly deal with 3 different architectures. I can go and spin up a cloud instance every time I want to run rr (and in fact that's the solution I've been working with), but it's just annoying enough to justify spending a couple hours in sales calls.
Well, if you have a Google L5 making ~365k [1] then it would need to make them ~2.2% more productive overall to be worth it when just considering direct pay. If we consider a Google L3 at ~187k then it would need to make them ~4.2% more productive overall.
This, of course, ignores employee benefits and overhead which usually amount to ~100% extra costs over direct pay. So that is now ~1.1% and ~2.1%, respectively.
And that ignores the fact that you need to pay people less than they produce to be profitable which probably drops us down to ~0.5% and ~1.0%, respectively.
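For anyone who wants to redo the arithmetic above with their own numbers, here is a minimal sketch. It assumes the tool costs about $8K per engineer per year (the figure quoted elsewhere in this thread), that overhead doubles the cost of an engineer, and that an engineer produces roughly twice what they cost in total; the last factor is only a guess at what "pay people less than they produce" means. The function and parameter names are made up for illustration. It lands close to the percentages quoted above.

```python
def break_even_gain(tool_cost, salary, overhead_multiplier=1.0, value_multiplier=1.0):
    """Fractional productivity gain at which the tool pays for itself.

    overhead_multiplier: total employee cost as a multiple of direct pay.
    value_multiplier: value produced as a multiple of total cost (assumed).
    """
    return tool_cost / (salary * overhead_multiplier * value_multiplier)


# Assumption: yearly per-seat cost, taken from the $8K figure in this thread.
TOOL_COST = 8_000

for level, salary in (("L5", 365_000), ("L3", 187_000)):
    pay_only = break_even_gain(TOOL_COST, salary)
    with_overhead = break_even_gain(TOOL_COST, salary, overhead_multiplier=2.0)
    with_margin = break_even_gain(
        TOOL_COST, salary, overhead_multiplier=2.0, value_multiplier=2.0
    )
    print(f"{level}: {pay_only:.1%} direct pay, "
          f"{with_overhead:.1%} with overhead, "
          f"{with_margin:.1%} with margin")
```

Swapping in a different salary or tool cost shows how quickly the required gain shrinks as total cost per engineer goes up.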
The major failing of such "just a 1% / cup of coffee" arguments is that there is an infinite number of things you could pay for with the same potential productivity promise, without any hard data on whether those promises are true. So just the fact that you can use a calculator and divide to get to a low % doesn't help you much, if at all.
No-one is going to spend $8K out of pocket to A/B test this on themselves. Of all the things you could be doing to improve your productivity, this is some high hanging fruit.
If you have a US employer who is unwilling to spend 8 k$ on software engineering productivity then they are penny wise, pound foolish. It literally costs 10x that for a single junior engineer. And, as I pointed out, the net productivity improvement you need to see to justify that expense is minuscule.
If your employer really is skeptical, then they can run an A/B test over a small group of engineers to prove out changes in productivity. But not even being willing to run that test when it is so cheap is just management incompetence.
Engineers are ridiculously expensive. In electrical engineering, where the engineers are generally less well-paid than in software, employers routinely spend multiple hundreds of thousands of dollars per engineer per year in tooling. Not being willing to spend 8 k$ on a test of well known technology and attempting to identify mere single digit percentage improvements is just stupid.
Not everyone is Google. Some people work for themselves, or have very small teams, or live in a developing country, and don't have lots of spare cash lying around.
Please try to understand that the world is not as simple and black and white as you'd like.