by SeanAnderson 841 days ago
What's the performance of this like? It seems really appealing. I would love to be able to use it to debug https://github.com/MeoMix/symbiants because I use RNG heavily to add variance to the world and that, combined with indeterminate execution order of systems, can really leave me scratching my head sometimes.

However, I'm building using a tilemap that's 144x144. So I've got ~21000 entities to log. It seems impractical to snapshot the world every tick, but maybe if it were able to snapshot deltas or something?


Revy already works with snapshot deltas (see other comments scattered around this section for more details, but basically we only sync components that changed during the previous frame -- Rerun stitches everything back together at runtime)... but at 21k entities, I'm afraid you'll be facing much bigger issues on the Rerun-side of things :D
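To make the delta idea concrete, here is a minimal sketch in plain Rust. It uses no Bevy, Revy or Rerun APIs, and `EntityId`, `Value`, `diff` and `reconstruct` are hypothetical names. The producer sends only the values that changed since the previous frame, and the viewer stitches the full state back together by replaying deltas. Removals aren't handled, and the real data model is far richer than one integer per entity.

```rust
use std::collections::HashMap;

/// Hypothetical entity id and a single component value (e.g. a tile's food level).
type EntityId = u32;
type Value = i32;

/// One frame's delta: only the (entity, value) pairs that changed since the previous frame.
type Delta = Vec<(EntityId, Value)>;

/// Producer side: compare this frame's state with the last synced state and keep only changes.
fn diff(prev: &HashMap<EntityId, Value>, curr: &HashMap<EntityId, Value>) -> Delta {
    let mut out: Delta = curr
        .iter()
        .filter(|&(id, v)| prev.get(id) != Some(v))
        .map(|(id, v)| (*id, *v))
        .collect();
    out.sort_unstable(); // deterministic order, purely for readability
    out
}

/// Viewer side: rebuild the full state at `frame` by replaying every delta up to and
/// including it; the latest value written for an entity wins.
fn reconstruct(deltas: &[Delta], frame: usize) -> HashMap<EntityId, Value> {
    let mut state = HashMap::new();
    for delta in deltas.iter().take(frame + 1) {
        for &(id, v) in delta {
            state.insert(id, v);
        }
    }
    state
}
```

Replaying from frame 0 gets slow for long recordings; one could keep periodic full snapshots and replay only from the nearest one, but that is outside this sketch.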

Rerun was originally designed for few (i.e. dozens up to hundreds) massive entities (e.g. it's common for a single entity to have a few million 3D points and color values attached to it).

While we're slowly working towards improving the many-entities use-case, the correct thing to do in this case would probably be for Revy to identify that all these entities are really just different instances of the same batch (either automatically, or by exposing a marker component or something).

So, say, you'd set a marker component on all your tiles, Revy would then snapshot them as a single batch of 144^2 instances, and then in Rerun you'd see a single entity `/tiles` which would be a batch of 144^2 instances (each with their own set of components, that's fine!). From Rerun's point-of-view, this would be similar to a point cloud, and at 21k instances you'd easily be running at your monitor refresh rate with a lot of margin.
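A rough sketch of the batching idea, assuming a hypothetical `is_tile` flag that stands in for the marker component. All marked entities are gathered into one column-oriented batch, so the viewer would see a single entity holding 144^2 instances instead of ~21k separate entities. `EntityRecord`, `TileBatch`, `batch_tiles` and `make_world` are illustrative names, not Revy's or Rerun's actual API.

```rust
/// Hypothetical per-entity record as it might look before batching.
struct EntityRecord {
    is_tile: bool,      // stands in for a hypothetical marker component
    position: [u32; 2], // tile coordinates
    color: [u8; 3],     // per-instance RGB
}

/// One logical entity (e.g. `/tiles`) holding N instances, stored column-wise
/// like a point cloud: instance i is (positions[i], colors[i]).
#[derive(Default)]
struct TileBatch {
    positions: Vec<[u32; 2]>,
    colors: Vec<[u8; 3]>,
}

/// Collect every marked entity into a single batch instead of logging each one separately.
fn batch_tiles(entities: &[EntityRecord]) -> TileBatch {
    let mut batch = TileBatch::default();
    for e in entities.iter().filter(|e| e.is_tile) {
        batch.positions.push(e.position);
        batch.colors.push(e.color);
    }
    batch
}

/// Build a side x side grid of tiles plus one unmarked entity (e.g. an ant).
fn make_world(side: u32) -> Vec<EntityRecord> {
    let mut world: Vec<EntityRecord> = (0..side * side)
        .map(|i| EntityRecord {
            is_tile: true,
            position: [i % side, i / side],
            color: [0, 128, 0],
        })
        .collect();
    world.push(EntityRecord { is_tile: false, position: [0, 0], color: [255, 0, 0] });
    world
}
```

With `make_world(144)`, `batch_tiles` yields one batch of 20,736 instances, and the unmarked entity stays out of it.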

But by all means, try it! Not the web version though, you're definitely going to need multithreading :D

Nice project btw; I'll keep an eye on it and probably use it as a benchmark for the many-entities use-case!

Thanks for the response! :) Great to hear deltas work. Yeah, sounds like it's the sort of thing that would need to run natively until multithreading is supported on the web.

Is a generic time travel debugging solution too much overhead? A good multithreaded time travel debugger (one not based on deterministic replay) should only incur ~100% overhead in the memory-bandwidth-bound case. If you are not saturating your memory bus without instrumentation, then the overhead should be proportionally less.

Nah, that'd probably work, too. I think the key here is multithreading. I do most of my development in a WASM context where Bevy doesn't support multithreading yet. I switch to native debugging when I want breakpoints (or in this case, when I'd want multithreading).

It's not the greatest workflow to default to WASM, but it makes it easier to treat web as a first-class development target. Still not sure that's worthwhile overall, but giving it a shot for now.

Wait, which is the hard one you wish you had time travel debugging on, the single threaded WASM context or the multithreaded native context?

The multithreaded native context is the one that is harder in principle, but it should only incur ~100% overhead for any program, including ones not using Bevy. Though I do not know about the general availability of these products in your field.

A single-threaded context is vastly simpler and can be done with similar overhead without platform support, or with ~1-10% overhead with platform support. Though I do not know if anybody has implemented efficient WASM support, or if anybody with an efficient multithreading implementation has ported it to WASM.

Likely the only available ones are the inefficient 1,000% overhead or the hilariously bad 100,000% overhead ones like the default gdb implementation. To be fair, these implementations are much easier to write. Even ~100% overhead in the single-threaded case is more common amongst extant solutions since getting down to ~10% requires some serious optimization. Still should be perfectly adequate for development work.

Sounds like you know a lot more about this area than I :)

I would like an efficient way of time travelling in a single threaded context.

As you describe it, it makes sense that supporting multithreading would make the problem space much more challenging to navigate. I wasn't thinking about that, but it's clear once you point it out. I was just considering the overhead of maintaining the undo state without being able to delegate it to a separate thread.
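For the single-threaded case, one common way to keep undo state is a write log: before each write, record the value being overwritten, then step backwards by replaying the log in reverse. This is a minimal sketch of that general technique, not how rr, Undo or any particular debugger works; `UndoWorld` and its methods are hypothetical. It also shows why the bookkeeping competes with the main work when it can't be moved to another thread: every write now costs an extra read and an extra push.

```rust
/// A minimal single-threaded undo log: every write records the value it overwrote,
/// so earlier ticks can be restored by replaying the log backwards.
struct UndoWorld {
    values: Vec<i32>,        // hypothetical flat component storage
    log: Vec<(usize, i32)>,  // (index, previous value) for every write
    tick_starts: Vec<usize>, // log length at the start of each tick
}

impl UndoWorld {
    fn new(values: Vec<i32>) -> Self {
        UndoWorld { values, log: Vec::new(), tick_starts: Vec::new() }
    }

    /// Mark the start of a tick so it can later be undone as one unit.
    fn begin_tick(&mut self) {
        self.tick_starts.push(self.log.len());
    }

    /// Instrumented write: read the old value, log it, then overwrite.
    fn write(&mut self, index: usize, value: i32) {
        self.log.push((index, self.values[index]));
        self.values[index] = value;
    }

    /// Undo the most recent tick; returns false if there is nothing to undo.
    fn step_back(&mut self) -> bool {
        let Some(start) = self.tick_starts.pop() else { return false };
        while self.log.len() > start {
            let (index, old) = self.log.pop().unwrap();
            self.values[index] = old;
        }
        true
    }
}
```

Undoing in reverse order matters: if one tick writes the same slot twice, restoring the newest log entry first leaves the oldest value in place at the end.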

As OP mentions, they use change detection to calculate/store deltas, but Bevy's ECS change detection isn't very performant. You still have to iterate over all matching components and check each one's change tick to learn whether it changed, rather than being able to filter on a `Changed` archetype. It kind of makes sense, though, because adding/removing `Changed` marker components on tons of entities every tick would also be expensive. Either way, change detection feels like a sore spot when working with tons of entities in ECS. I'm not super confident there's a way around that without manually maintaining some data structures outside of the ECS paradigm, but I was thinking that if I could at least run the change detection on a separate thread, it might be tolerable.
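Here is a plain-Rust sketch of that trade-off, using a hypothetical `Column` type rather than Bevy's implementation (Bevy's real ticks also wrap around, which this ignores). A per-entity "last changed" tick forces every query to scan the whole column, while a dirty list maintained on every write, outside the ECS, can be drained in time proportional to the number of changes.

```rust
/// Hypothetical column of one component type with a per-entity "last changed" tick
/// (roughly the idea behind tick-based change detection) plus a side dirty list.
struct Column {
    values: Vec<i32>,
    changed_tick: Vec<u32>,
    dirty: Vec<usize>,   // indices written since the last drain, kept outside the "ECS"
    is_dirty: Vec<bool>, // dedup so an entity written twice is listed once
}

impl Column {
    fn new(len: usize) -> Self {
        Column {
            values: vec![0; len],
            changed_tick: vec![0; len],
            dirty: Vec::new(),
            is_dirty: vec![false; len],
        }
    }

    /// Every write updates both the tick and the dirty list.
    fn set(&mut self, i: usize, v: i32, tick: u32) {
        self.values[i] = v;
        self.changed_tick[i] = tick;
        if !self.is_dirty[i] {
            self.is_dirty[i] = true;
            self.dirty.push(i);
        }
    }

    /// Scan approach: O(total entities) per query, even if only a few changed.
    fn changed_since_scan(&self, last_run: u32) -> Vec<usize> {
        (0..self.values.len())
            .filter(|&i| self.changed_tick[i] > last_run)
            .collect()
    }

    /// Dirty-list approach: O(changed entities), paid for with bookkeeping on every write.
    fn drain_dirty(&mut self) -> Vec<usize> {
        for &i in &self.dirty {
            self.is_dirty[i] = false;
        }
        std::mem::take(&mut self.dirty)
    }
}
```

For a 144x144 tilemap where only a handful of tiles change per tick, the scan touches all 20,736 ticks while the drain touches only the changed indices.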

If you are okay with single-threaded Linux native as a debug platform (i.e. you have a build that you reproduce bugs on), then you can probably use rr. undo.io has also been in the field for a long time, and I hear they can handle multithreaded Linux native in some capacity as well. One of the people from Undo frequently pops into time travel debugging threads when they appear, so they could give you more info if they drop by.

If you are on Windows, Microsoft has some form of time travel debugging, but I am pretty sure they use an instrumented emulator, an approach with a 10-20x slowdown. I do not know of anything else on Windows.

The only efficient multithreaded time travel debugging I am aware of is all in the embedded field, so it is unlikely to be applicable. Most of the “multithreading” solutions otherwise available work by serializing your execution to a single thread, so they do not really count. Maybe there is something else out there, but I am not really sure.

rr and Undo are about the same here: they support multiple threads but run all threads on a single core.