I just don't know what a 'ROM game command' is. Seems more likely it would be controller inputs to one person's emulator? Or perhaps everyone has an emulator and the inputs are simply shared.
It's probably really simple: normally, if you load a ROM and start a multiplayer game, you control every player's inputs yourself. This just goes one step further and lets other people control the other players' inputs.
Rollback networking is essentially event sourcing. Game states are immutable, and new game states are derived from adding inputs (events).
You keep the last dozen game states around in memory, and if you receive an input from the past, you rewind to the last game state prior, add it to your input stream, and fast forward to the present.
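To make the rewind and fast-forward concrete, here's a minimal sketch in Python. Everything in it is made up for illustration (the `State` fields, `step`, `RollbackSession`, the 12-frame window); a real game would have far more state and would also prune old inputs.

```python
from dataclasses import dataclass

MAX_ROLLBACK = 12  # assumed window size, in frames


@dataclass(frozen=True)
class State:
    """Immutable game state (hypothetical: just one x position per player)."""
    frame: int
    positions: tuple


def step(state, inputs):
    """Pure function: next state = previous state + that frame's inputs."""
    moved = tuple(p + inputs.get(i, 0) for i, p in enumerate(state.positions))
    return State(state.frame + 1, moved)


class RollbackSession:
    def __init__(self, num_players):
        self.states = {0: State(0, (0,) * num_players)}  # frame -> state at start of frame
        self.inputs = {}  # frame -> {player: dx}
        self.current = 0

    def _resimulate(self, from_frame):
        # Fast forward: rebuild every state from from_frame up to the present.
        for f in range(from_frame, self.current):
            self.states[f + 1] = step(self.states[f], self.inputs.get(f, {}))

    def tick(self, player, dx):
        """Apply a local input immediately and advance one frame."""
        self.inputs.setdefault(self.current, {})[player] = dx
        self.current += 1
        self._resimulate(self.current - 1)
        # Forget the state that just fell out of the rollback window.
        self.states.pop(self.current - MAX_ROLLBACK - 1, None)

    def receive(self, frame, player, dx):
        """A remote input for a past frame arrived: rewind, add it, replay."""
        if frame < self.current - MAX_ROLLBACK:
            raise ValueError("input is older than the rollback window")
        self.inputs.setdefault(frame, {})[player] = dx
        self._resimulate(frame)

    @property
    def state(self):
        return self.states[self.current]


session = RollbackSession(num_players=2)
for _ in range(3):
    session.tick(player=0, dx=1)          # local player moves right each frame
before = session.state.positions          # remote player predicted as idle
session.receive(frame=1, player=1, dx=5)  # remote input for frame 1 shows up late
after = session.state.positions           # same frame, corrected history
```

The late input changes the present state without the frame counter moving, which is the whole trick: history gets rewritten and the game carries on from the corrected "now".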
It has the same base advantages and drawbacks as RTS-style lockstep networking - the core logic is written as though the game is single player, and because only inputs go over the wire, complexity can be scaled arbitrarily without bloating bandwidth requirements.
But in addition, you get the benefit of zero input latency (play a multiplayer RTS game and send a unit around - it won't move for 200ms or so), and the drawback of an absolute clusterfuck of time-rewind debugging madness if your supposedly immutable data ever gets mutated by accident.
The reason you do rollback with something like this is that it gives you zero input latency, and you can retrofit it onto an emulator without changing any game code, just by using memcpy() on the game state.
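A rough illustration of why this works without touching game code, in Python rather than C. `ToyEmulator` and its four-byte RAM layout are invented for the example; copying the whole buffer with `bytes()` stands in for memcpy() on the real emulator state.

```python
class ToyEmulator:
    """Stand-in for an emulator core: all game state lives in one flat buffer."""

    def __init__(self):
        # Hypothetical layout: [frame counter, player 1 x, player 2 x, unused]
        self.ram = bytearray(4)

    def run_frame(self, p1, p2):
        # Mutates RAM in place, the way real game code would.
        self.ram[0] = (self.ram[0] + 1) % 256
        self.ram[1] = (self.ram[1] + p1) % 256
        self.ram[2] = (self.ram[2] + p2) % 256


def save_state(emu):
    # Whole-buffer copy: the Python analogue of memcpy() out of the emulator.
    return bytes(emu.ram)


def load_state(emu, snapshot):
    # Copy the snapshot back over live RAM, in place.
    emu.ram[:] = snapshot


emu = ToyEmulator()
snapshot = save_state(emu)
emu.run_frame(p1=1, p2=0)       # player 2's input not here yet, guess "idle"
predicted = bytes(emu.ram)

load_state(emu, snapshot)       # real input arrived: rewind...
emu.run_frame(p1=1, p2=2)       # ...and replay the frame with the real input
corrected = bytes(emu.ram)
```

The game inside the emulator never knows it was rewound; the netcode only needs to be able to copy the state out and back in.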
Source: I've developed about a dozen titles using rollback networking.
I find this hard to reconcile with the player's view of the game - so if an input arrives out of order, the engine can essentially just reapply the adjusted stream of events to correct itself? From a data modelling perspective that seems fine.
However, in those situations what does the player see in game? IIRC rollback was popularised in fighting games like Street Fighter - so does the player see one "universe", only for that branch to suddenly rewind and replay into an alternate universe where some tiny action did or didn't happen?
That's exactly what happens. If you are writing the game yourself, you can do interpolation to fix things up gradually.
You can also delay significant events such as death until the rollback threshold has been passed, so you don't run into knife-edge situations where, e.g., it looks like you died and your character starts to ragdoll, but then you snap back when it turns out you killed the enemy instead.
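One way to sketch that deferral, assuming each simulated frame reports the big events it produced. `DeferredEvents`, the event names, and the 8-frame threshold are all hypothetical.

```python
MAX_ROLLBACK = 8  # assumed rollback threshold, in frames


class DeferredEvents:
    """Holds significant events until their frame can no longer be rolled back."""

    def __init__(self):
        self.pending = {}  # frame -> list of event names

    def set_frame_events(self, frame, events):
        # Called each time a frame is simulated or re-simulated; the newest
        # result replaces whatever an earlier prediction reported.
        if events:
            self.pending[frame] = list(events)
        else:
            self.pending.pop(frame, None)

    def commit(self, current_frame):
        # Only frames strictly older than the rollback window are final.
        cutoff = current_frame - MAX_ROLLBACK
        ready = []
        for frame in sorted(f for f in self.pending if f < cutoff):
            ready.extend(self.pending.pop(frame))
        return ready


events = DeferredEvents()
events.set_frame_events(10, ["player_died"])  # mispredicted outcome
too_early = events.commit(current_frame=12)   # still inside the window
events.set_frame_events(10, ["enemy_died"])   # rollback corrected frame 10
final = events.commit(current_frame=19)       # frame 10 is now out of reach
```

The ragdoll only starts once `commit` hands the event back, so the mispredicted death is never shown.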
The key to it not being too disruptive is keeping the maximum rollback threshold fairly low. If you add inputs and your ping is greater than the threshold, they get delayed to a later frame, and your inputs start to feel sluggish (the server would enforce the delay, but you'd also add it client side).
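As a back-of-the-envelope sketch of that trade-off: anything beyond what the rollback window can absorb becomes input delay. The 60 fps tick rate, the 6-frame threshold, and the function name are assumptions for illustration, and real netcode would tune this rather than use a bare formula.

```python
import math

FPS = 60                 # assumed simulation rate
MAX_ROLLBACK_FRAMES = 6  # assumed maximum rollback threshold


def input_delay_frames(ping_ms):
    """Frames of delay added to a new input so it never lands outside the window."""
    ping_frames = math.ceil(ping_ms * FPS / 1000)
    return max(0, ping_frames - MAX_ROLLBACK_FRAMES)


low_ping = input_delay_frames(50)    # 3 frames of ping, fully hidden by rollback
high_ping = input_delay_frames(150)  # 9 frames of ping, 3 of them show up as delay
```

With these numbers, anything up to 100ms of ping feels instant, and past that every extra frame of latency is a frame of sluggishness.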