UndoDB – The interactive time travel debugger for Linux C/C++ for debugging

	UndoDB – The interactive time travel debugger for Linux C/C++ for debugging (undo.io)
	109 points by droideqa 390 days ago

7 comments

ranger_danger 390 days ago

FOSS alternative: https://rr-project.org/

ognarb 390 days ago

What's the difference with RR?

mark_undoio 389 days ago

Undo (where I'm CTO) has existed for longer than RR and its real benefit is that it scales to use cases where RR (for one reason or another) isn't a fit.

Technically:

* Doesn't need hardware performance counters - runs on more CPUs and on cloud systems (where performance counters are often blocked).

* Can attach and detach at any time - means you get to record just a subset of program execution that's interesting.

* You can ship our recording tech with your application and control it by API, so you can grab crash recordings on customer systems.

* Supports programs that share memory with non-recorded processes.

* Supports direct device access (e.g. DPDK).

* Accelerated debugging features - searching within recordings using parallel processing, and accelerated conditional breakpoints that are a few thousand times faster than native GDB.

* We provide a stable, patched fork of GDB that we're occasionally told is more stable than the default.

For many people's use cases none of these really matter - they should use RR if they're not already.

But if you need any of these things then Undo can give you time travel debugging. In practice, it's usually big software organisations that we deal with because they have development pain and the extreme requirements we can match.

db48x 389 days ago

> * You can ship our recording tech with your application and control it by API, so you can grab crash recordings on customer systems.

That’s actually pretty neat.

roca 390 days ago

[rr developer here]

Undo has cool features like Live Recording that we don't have in rr. They don't need access to the hardware PMU which is a big advantage in some situations. They can handle accesses to shared memory in cases where rr can't. https://undo.io/resources/undo-vs-rr/ is a good resource.

sidkshatriya 389 days ago

If you don't have access to the hardware PMU then you can try https://github.com/sidkshatriya/rr.soft (which is a modification of the rr debugger).

It may not be commercial quality but it's open source and free :)

[I built rr.soft]

hugograffiti 388 days ago

Undo also supports Java and Scala: https://docs.undo.io/java/index.html

leni536 390 days ago

AFAIK Undo records multithreaded applications on multiple threads and CPUs, while rr records them on a single OS thread. Not sure about replay. I've never used Undo though, so I'm not sure how much better it is.

dzaima 390 days ago

rr does support multithreaded and multi-process applications, via, like Undo[1], allowing only a single thread to run at a time. (edit note - that's only about multithreading; Undo might have parallel multi-process recording)

[1]: https://undo.io/resources/undo-performance-benchmarks/ - "Undo serializes their execution"
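
To illustrate why running one thread at a time makes recording tractable, here is a toy sketch in Python. It is not how rr or Undo are implemented; `worker` and `run` are hypothetical names, and generators stand in for threads. The only point is that once execution is serialized, logging the order of scheduling decisions is enough to reproduce the same interleaving on replay.

```python
import random

def worker(name, steps, shared):
    """A toy 'thread': appends to shared state and yields at each preemption point."""
    for i in range(steps):
        shared.append(f"{name}{i}")
        yield

def run(schedule=None, seed=None):
    """Run two workers one at a time.

    With schedule=None the order is chosen at random and recorded.
    With a schedule, the recorded order is replayed exactly.
    """
    shared = []
    threads = {"A": worker("A", 3, shared), "B": worker("B", 3, shared)}
    rng = random.Random(seed)
    recorded = []
    replay = iter(schedule) if schedule is not None else None
    while threads:
        # Only one toy thread runs per iteration, so the choice made here
        # is the only source of nondeterminism that needs to be logged.
        name = next(replay) if replay is not None else rng.choice(sorted(threads))
        recorded.append(name)
        try:
            next(threads[name])
        except StopIteration:
            del threads[name]  # this worker has finished
    return shared, recorded

trace, schedule = run()                    # "record": random interleaving
replayed, _ = run(schedule=schedule)       # "replay": same interleaving
print(trace == replayed)
```

A real recorder also has to log syscall results, signals and other inputs; the schedule is just the part that serialization makes cheap to capture.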

leni536 390 days ago

I stand corrected, not sure where I heard this then.

dzaima 390 days ago

https://undo.io/resources/undo-vs-rr/ does note parallel recording for multi-process (not multi-threaded), so perhaps that.

delta_p_delta_x 390 days ago

Free Windows equivalent: WinDbg Time Travel Debugging (https://learn.microsoft.com/en-gb/windows-hardware/drivers/d...).

mark_undoio 389 days ago

WinDbg's time travel debug is really cool and more people should know about it. I'm always a little sad that it's not (so far!) officially integrated in something like VS Code.

Before it was released publicly I believe Microsoft had been using it internally to share recordings on bug reports against massive pieces of software like Office. So it's a serious piece of tech.

senderista 385 days ago

I used it (iDNA) on the Windows team starting around 2006 or so and we were able to resolve bugs in minutes that had been open for years. It was absolute magic.

minamoto625 389 days ago

gdb should already have a similar feature?

mark_undoio 389 days ago

GDB does:

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Proce...

But it's limited. It's really cool that it's integrated by default but it doesn't scale to big applications / workloads.

RR and Undo both use GDB as a user interface, though, so any skills you have there will carry over.
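
For anyone who hasn't tried the built-in feature, here is a rough sketch of what a session looks like, written as a small Python helper that assembles a `gdb --batch` command line. The GDB commands themselves (`record`, `watch`, `reverse-continue`) are standard; the helper name `gdb_reverse_session`, the program path `./a.out` and the watched variable `counter` are made up for illustration, and how well this works depends on your program and platform.

```python
import shlex

def gdb_reverse_session(program, watch_expr):
    """Build a gdb command line that records execution and then runs backwards."""
    commands = [
        "break main",
        "run",
        "record",               # start GDB's built-in process recording
        "continue",             # run forward until the program stops (e.g. a crash)
        f"watch {watch_expr}",  # watch the value that looks wrong
        "reverse-continue",     # run backwards to where it was last changed
        "backtrace",
    ]
    argv = ["gdb", "--batch"]
    for cmd in commands:
        argv += ["-ex", cmd]
    argv += ["--args", program]
    return argv

# Print a shell-ready command line for a hypothetical program and variable.
print(shlex.join(gdb_reverse_session("./a.out", "counter")))
```

The same `reverse-continue`, `reverse-step` and `reverse-next` commands are what you type when replaying under RR or Undo, which is why the skills carry over.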

ho_schi 389 days ago

A lot of people don't know about and don't use GDB's reverse debugging. It is an awesome, hidden feature that more developers should know about :)

All those "Oh wait, I missed it..." debugging sessions and those "What exactly changed over there?" questions are answerable.

db48x 389 days ago

It does, but it is really sad by comparison with rr and UndoDB. You could use it to record a few function calls or perhaps if you’re lucky a whole frame of your game but not a whole program.

drawnwren 389 days ago

If you could get this working on embedded ARM CPUs, I think you'd be surprised how many customers there would be.

Veserv 389 days ago

Time travel debugging on embedded ARM has been available for over 20 years via trace probes [1].

The category namer of time-travel debugging, TimeMachine, (hence time-travel debugging in contrast to other attempted names such as reversible, bidirectional, record-replay, etc.) was available in 2003 and supports/supported the ARM7 [2]. Note, that is not ARMv7 architecture, that is the ARM7 chip [3] in use from 1993-2001.

From what I know, the ARM7 was one of the first ARM designs implementing the Embedded Trace Macrocell (ETM) which could output the instruction and data trace data used to support trace probe-based time travel debugging.

[1] https://jakob.engbloms.se/archives/1564

[2] https://www.ghs.com/products/probe.html

[3] https://en.m.wikipedia.org/wiki/ARM7

mark_undoio 389 days ago

What's limiting us is that Undo does need a Linux kernel - so traditional embedded programming wouldn't be a fit. Embedded Linux could work and we do support ARM64.

I've thought a bit about how you might support time travel on bare metal embedded - but actually there are sometimes hardware-assisted solutions there (Lauterbach's Trace32 was one we came across).

schaefer 390 days ago

Let me save you a click:

Pricing & Licensing

A UDB floating license costs $7,900 per year.

db48x 390 days ago

If you’re going to spend money, then you would be better off using rr and paying for Pernosco. Pernosco is amazing.

rurban 389 days ago

Thanks for the Pernosco tip. It really looks amazing and you can try it for free as a GitHub user.

You’re welcome.

rr is awesome and is free and open and all that. How much better could this possibly be?

gregthelaw 389 days ago

Undo co-founder here. rr is indeed awesome. If it works for your use-case, you should use it!

Undo is mostly used by companies whose world is complex enough that rr doesn't work for them, and they understand how powerful time travel debugging is.

There has now been a LOT of engineering invested by a lot of very smart people into Undo, so it does also have a lot of polish and nice features.

But honestly, if rr is working for you, that's great. I'm just glad you're not doing printf debugging the whole time :)

AlotOfReading 390 days ago

They have a comparison page: https://undo.io/resources/undo-vs-rr/

I was in talks with them recently because I kept running into limitations with rr. The main advantages for my use case were that undo doesn't have the same dependency on hardware timers, which means the ARM support is much better, you can run it in a VM (e.g. a cloud machine) and you can do replays on different systems.

dzaima 390 days ago

A couple minor notes:

- If your program is very light on syscalls (i.e. basically entirely in-memory computation), rr can get down to basically a 1.0x slowdown. In particular this means you can run benchmarks in it at full capacity, provided that I/O is outside of the repeated part (e.g. if sometimes the bench is noticeably slower, you can replay and see if some important loads/stores crossed a cacheline/page). You can even "perf record" / "perf stat" a replay if you want to! (None of this is too useful, but it's fun! Gathering repeated stats over the same execution for more resolution might be useful with proper tooling though.)

- rr does have an in-memory buffer of recording data.

- rr recordings should be portable within the architecture, as long as the replay hardware has the extensions the recorder did (or if replayer-unsupported features are disabled at record-time).

AlotOfReading 390 days ago

I regularly deal with 3 different architectures. I can go and spin up a cloud instance every time I want to run rr (and in fact that's the solution I've been working with), but it's just annoying enough to justify spending a couple hours in sales calls.

Veserv 390 days ago

Well, if you have a Google L5 making ~365k [1] then it would need to make them ~2.2% more productive overall to be worth it when just considering direct pay. If we consider a Google L3 at ~187k then it would need to make them ~4.2% more productive overall.

This, of course, ignores employee benefits and overhead which usually amount to ~100% extra costs over direct pay. So that is now ~1.1% and ~2.1%, respectively.

And that ignores the fact that you need to pay people less than they produce to be profitable which probably drops us down to ~0.5% and ~1.0%, respectively.

[1] https://www.levels.fyi/companies/google/salaries/software-en...

edit: Incorrectly linked to product designer instead of software engineer levels.
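
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same back-of-envelope estimate: the license price is the $7,900 quoted upthread, the ~100% overhead is the figure from this comment, and the `value_multiple` of 2 (an engineer producing twice their loaded cost) is an assumed number chosen to roughly match the last step. The function name `break_even` is made up.

```python
LICENSE_COST = 7_900  # USD per year, the UDB floating license price quoted upthread

def break_even(salary, overhead=0.0, value_multiple=1.0):
    """Productivity gain (as a fraction) at which the license pays for itself.

    overhead: benefits and other costs as a fraction of direct pay.
    value_multiple: assumed ratio of value produced to loaded cost.
    """
    loaded_cost = salary * (1 + overhead)
    return LICENSE_COST / (loaded_cost * value_multiple)

for level, salary in [("L5", 365_000), ("L3", 187_000)]:
    direct = break_even(salary)
    loaded = break_even(salary, overhead=1.0)
    produced = break_even(salary, overhead=1.0, value_multiple=2.0)
    print(f"{level}: {direct:.1%} direct, {loaded:.1%} loaded, {produced:.1%} vs. value produced")
```

Changing `overhead` or `value_multiple` shows how sensitive the conclusion is to those two guesses.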

dima55 390 days ago

OK... Most of us don't know what a "google l5" is, so I guess we can safely ignore this. Heh.

eviks 384 days ago

The major failure of such "just 1% / a cup of coffee" arguments is that there is an infinite number of things you could pay for with the same potential productivity promise, without any hard data on whether those promises are true. So just the fact that you can use a calculator and divide to get to a low % doesn't help you much, if at all.

esafak 390 days ago

No-one is going to spend $8K out of pocket to A/B test this on themselves. Of all the things you could be doing to improve your productivity, this is some high hanging fruit.

Veserv 390 days ago

If you have a US employer who is unwilling to spend 8 k$ on software engineering productivity then they are penny wise, pound foolish. It literally costs 10x that for a single junior engineer. And, as I pointed out, the net productivity improvement you need to see to justify that expense is minuscule.

If your employer really is skeptical, then they can run an A/B test over a small group of engineers to prove out changes in productivity. But not even being willing to run that test when it is so cheap is just management incompetence.

Engineers are ridiculously expensive. In electrical engineering, where the engineers are generally less well-paid than in software, employers routinely spend multiple hundreds of thousands of dollars per engineer per year in tooling. Not being willing to spend 8 k$ on a test of well known technology and attempting to identify mere single digit percentage improvements is just stupid.

ranger_danger 390 days ago

Not everyone is Google. Some people work for themselves, or have very small teams, or live in a developing country, and don't have lots of spare cash lying around.

Please try to understand that the world is not as simple and black and white as you'd like.

rixtox 390 days ago

Another more powerful option: Intel SDE / PinPlay
