by bhackett 1744 days ago
(Replay employee)

1. Rather than having to restore state to the point at the previous step, we can step backwards by replaying a separate process to the point before the step, and looking at the state there (this post talks about how that works: https://medium.com/replay-io/inspecting-runtimes-caeca007a4b...). Because everything is deterministic it doesn't matter if we step around 10 times and use 10 different processes to look at the state at those points.

2. We record the calls made by the browser, though it is the calls into the system libraries rather than the syscalls themselves (the syscall interfaces aren't stable/documented on macOS or Windows).

3. Maintaining ordering like this isn't normally necessary for ensuring that behavior is the same when replaying. In the case of memory locations, the access made by thread 2 to location B will behave the same regardless of accesses made by thread 1 to location A, because the values stored in locations A and B are independent from one another.
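To make point 2 concrete, here is a minimal sketch of recording library calls and replaying their results. It is not Replay's actual recorder: `Recorder`, `call`, and `program` are hypothetical names, and `time.time` and `random.randint` stand in for calls into system libraries.

```python
import random
import time


class Recorder:
    """Records the results of wrapped calls, or replays them in order."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if self.replaying else []
        self.pos = 0

    def call(self, name, fn, *args):
        if self.replaying:
            # Replay: never run the real call, return the recorded result.
            recorded_name, result = self.log[self.pos]
            if recorded_name != name:
                raise RuntimeError(
                    f"replay diverged: expected {recorded_name}, got {name}"
                )
            self.pos += 1
            return result
        # Record: run the real call and remember what it returned.
        result = fn(*args)
        self.log.append((name, result))
        return result


def program(rec):
    """Stand-in for the recorded program: deterministic apart from wrapped calls."""
    now = rec.call("time.time", time.time)
    roll = rec.call("random.randint", random.randint, 1, 6)
    return (now, roll)


recording = Recorder()
original = program(recording)

# Any number of separate replays observe exactly the same values.
replayed_once = program(Recorder(log=recording.log))
replayed_twice = program(Recorder(log=recording.log))
```

Because every source of non-determinism goes through `call`, each replay sees the same inputs as the original run, which is why it doesn't matter how many separate processes are used to look at the state.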


Thanks for the explanation! Do you ever run into performance issues with replaying from the start on each backward step, or is this not really an issue in practice? I imagine for most websites and short replays it's probably fine, but for something like a game with a physics engine it sounds like it would be too expensive and you'd need snapshots or something. I guess that's a super small percentage of the market though.

For question 3 on the ordering, I was imagining the following kind of scenario: one thread calls a system library function to read a cursor position and another calls a system library function to write a cursor position. So even though they're separate functions, they interact with the same state. Do you require users to manually call into the recorder library to give the recorder runtime extra info in this kind of scenario? Sorry if this is a dumb question, I haven't really done any programming at this level.

We definitely need to avoid replaying from the start every time we want to inspect the state at some point. This is kind of an internal detail, but we can avoid having to replay parts of the recording over and over again by using fork() to create new processes at points within the recording.
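A rough sketch of the fork() idea, assuming a Unix system and a toy deterministic program. The names `step`, `run`, and `inspect_from_checkpoint` are made up for illustration and this is not how Replay's internals are actually structured. The parent replays to a checkpoint once; each inspection forks a child that replays only the remaining steps and reports the state back over a pipe.

```python
import os


def step(state):
    """One deterministic step of the hypothetical replayed program."""
    return (state * 31 + 7) % 1000003


def run(state, start, end):
    """Replay steps start..end-1, beginning from `state`."""
    for _ in range(start, end):
        state = step(state)
    return state


def inspect_from_checkpoint(state, checkpoint, target):
    """Fork at `checkpoint`; the child replays on to `target` and reports its state."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's memory at the checkpoint.
        os.close(read_fd)
        result = run(state, checkpoint, target)
        os.write(write_fd, str(result).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: stays at the checkpoint and can be forked again later.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


CHECKPOINT = 10_000
at_checkpoint = run(1, 0, CHECKPOINT)  # the expensive prefix is replayed once

# Look at step CHECKPOINT + 10, then "step backwards" to CHECKPOINT + 9.
after = inspect_from_checkpoint(at_checkpoint, CHECKPOINT, CHECKPOINT + 10)
before = inspect_from_checkpoint(at_checkpoint, CHECKPOINT, CHECKPOINT + 9)
```

In this sketch the backward step costs 9 replayed steps in a fresh child rather than 10,009 from the start of the recording.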

Ordering constraints between different library functions do crop up from time to time. In cases like this the recorder library uses ordered locks internally (basically emulating the synchronization which the system library has to do) to ensure that the calls execute in the expected order when replaying.
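As an illustration only, an ordered lock could look something like the sketch below. `OrderedLock`, `run_threads`, and the shared `cursor` dict are hypothetical stand-ins, not the recorder library's real API. While recording, the lock logs which thread acquired it and in what order; while replaying, it makes each thread wait until its recorded turn comes up.

```python
import threading


class OrderedLock:
    """Logs acquisition order when recording; enforces that order when replaying."""

    def __init__(self, replay_order=None):
        self._cond = threading.Condition()
        self._held = False
        self._replay_order = None if replay_order is None else list(replay_order)
        self._next = 0    # index of the current or next expected acquisition
        self.order = []   # thread ids, in the order they acquired the lock

    def _is_turn(self, thread_id):
        if self._replay_order is None:
            return True   # recording: any thread may go next
        if self._next >= len(self._replay_order):
            raise RuntimeError("replay diverged: more acquisitions than recorded")
        return self._replay_order[self._next] == thread_id

    def acquire(self, thread_id):
        with self._cond:
            while self._held or not self._is_turn(thread_id):
                self._cond.wait()
            self._held = True
            self.order.append(thread_id)

    def release(self):
        with self._cond:
            self._held = False
            self._next += 1
            self._cond.notify_all()


def run_threads(lock, seen):
    cursor = {"pos": 0}   # shared state behind two different "library functions"

    def writer():
        for i in range(1, 4):
            lock.acquire("writer")
            cursor["pos"] = i * 10       # stand-in for a write-cursor call
            lock.release()

    def reader():
        for _ in range(3):
            lock.acquire("reader")
            seen.append(cursor["pos"])   # stand-in for a read-cursor call
            lock.release()

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


recorded_lock, recorded_seen = OrderedLock(), []
run_threads(recorded_lock, recorded_seen)

replay_lock, replay_seen = OrderedLock(replay_order=recorded_lock.order), []
run_threads(replay_lock, replay_seen)
```

Whatever interleaving the scheduler happened to pick while recording, the replay run acquires the lock in the same sequence, so the reader sees the same cursor values both times.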

Oh that's cool, using fork() to create checkpoints. Thank you again for taking the time to explain!