by roca 2786 days ago
You should read https://arxiv.org/abs/1705.05937 so you don't need to speculate. rr absolutely does guarantee that threads are scheduled the same way during replay as during recording; otherwise it wouldn't work at all on applications like Firefox, which use a lot of threads.

Also, rr definitely is very useful for debugging race conditions. For example, Mozilla developers have debugged lots of race conditions using it. One thing that really helps is rr's "chaos mode", which randomizes thread scheduling in an intelligent way to discover possible races. See https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... and https://robert.ocallahan.org/2016/02/deeper-into-chaos.html and https://robert.ocallahan.org/2018/05/rr-chaos-mode-improveme....
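
To make the two claims above concrete (a recorded schedule replays identically, and varying the schedule is what exposes a race), here is a minimal toy sketch in Python. It is not rr's implementation or chaos mode's actual algorithm; the names `run`, `all_schedules`, and `random_schedule` are invented for illustration, and the "program" is just two simulated threads that each increment a shared counter as a separate load and store.

```python
import random
from itertools import combinations

def run(schedule):
    """Replay a toy program in which two threads each perform counter += 1
    as a separate load and store. `schedule` lists which thread runs next."""
    counter = 0
    loaded = {0: None, 1: None}  # value each thread last read
    step = {0: 0, 1: 0}          # 0 = load pending, 1 = store pending
    for tid in schedule:
        if step[tid] == 0:
            loaded[tid] = counter       # load the shared counter
        else:
            counter = loaded[tid] + 1   # store the (possibly stale) value + 1
        step[tid] += 1
    return counter

def all_schedules():
    """Yield every interleaving of two threads with two steps each."""
    for slots in combinations(range(4), 2):
        yield tuple(0 if i in slots else 1 for i in range(4))

def random_schedule(rng):
    """Pick one interleaving at random (a crude stand-in for randomized scheduling)."""
    order = [0, 0, 1, 1]
    rng.shuffle(order)
    return tuple(order)

results = {s: run(s) for s in all_schedules()}
lost_updates = [s for s, total in results.items() if total == 1]

# "Record" one randomly chosen schedule, then "replay" it: same schedule, same result.
recorded = random_schedule(random.Random(0))
assert run(recorded) == run(recorded)
```

In this toy model, 4 of the 6 possible schedules lose an update, but a scheduler that only switches threads after both steps complete would never hit any of them. That is the gap randomized scheduling is meant to close, and once a bad schedule has been captured it can be replayed as often as needed.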

1 comment

Very cool stuff! And yes, I took a look at the paper, as I noted in my edit. But I think there are still two classes of race conditions outside of its scope: ones that require simultaneous execution (where you can get surprising interleavings) and lock-free algorithms where correct use of the memory model is paramount. In my personal experience, these are the hardest problems to debug.

Even those are probably not 100% outside of its scope. I forget the details of chaos mode, but that kind of induced thread-switching can cause just the kind of interleaving you seem to be talking about.

What rr cannot capture is a very small subclass of race conditions involving things like cache line misses. I think that's what you're alluding to by "correct use of the memory model is paramount", but it's a subclass even of those. Yes, those are hugely difficult to diagnose and it would be fantastic if tools like rr or UndoDB could capture them. But there's a vast swathe of also very difficult race conditions that this recording tech can and does help with today.
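
One way to picture the memory-model subclass discussed here is the classic store-buffering litmus test: thread 0 does `x = 1; r0 = y` and thread 1 does `y = 1; r1 = x`. The Python sketch below is a toy model under stated assumptions, not a description of rr, UndoDB, or any real CPU. It assumes each thread's store may sit in a private buffer until a later "flush", and the names `SC_ORDER`, `DELAYED_ORDER`, `run`, and `outcomes` are hypothetical. It enumerates every interleaving to compare the results reachable when stores become visible immediately against those reachable when a store can be delayed past the following load.

```python
from itertools import combinations, product

SC_ORDER = ("store", "flush", "load")       # store reaches memory before the load
DELAYED_ORDER = ("store", "load", "flush")  # store waits in the buffer past the load

def run(schedule, orders):
    """Execute both threads' events in the order given by `schedule`."""
    mem = {"x": 0, "y": 0}
    mine, other = {0: "x", 1: "y"}, {0: "y", 1: "x"}
    buffered = {0: None, 1: None}  # pending store value per thread
    reg = {0: None, 1: None}       # r0 and r1
    pc = {0: 0, 1: 0}
    for tid in schedule:
        op = orders[tid][pc[tid]]
        pc[tid] += 1
        if op == "store":
            buffered[tid] = 1                # write lands in the private buffer
        elif op == "flush":
            mem[mine[tid]] = buffered[tid]   # write becomes globally visible
        else:
            reg[tid] = mem[other[tid]]       # read the other thread's variable
    return reg[0], reg[1]

def outcomes(allowed_orders):
    """Collect every (r0, r1) reachable over all interleavings and event orders."""
    seen = set()
    for orders in product(allowed_orders, repeat=2):
        for slots in combinations(range(6), 3):
            schedule = [0 if i in slots else 1 for i in range(6)]
            seen.add(run(schedule, orders))
    return seen

sc = outcomes([SC_ORDER])
relaxed = outcomes([SC_ORDER, DELAYED_ORDER])
```

In this model no interleaving with immediately visible stores produces `(0, 0)`, while the delayed-store variant does. So a bug that depends on `(0, 0)` cannot be reached by rescheduling alone, however the thread switches are chosen, which is one reading of why this class is harder for a recording tool than ordinary interleaving races.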