
Yes, the hack they used was for the exploration part: providing a state summary so they could explicitly decide whether a state was new or not, and, in the initial Go-Explore, essentially letting the agent teleport back to any previously reached state and begin exploring from there.
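To make those two hacks concrete, here is a rough sketch of the exploration loop on a toy problem. It is not the Go-Explore code: `ToyChain`, `cell_of`, and `go_explore` are hypothetical names I made up, the environment is a trivial one-dimensional walk, and cell selection is uniform random rather than whatever weighting the paper used. The state summary is `cell_of`, and the teleport is `env.set_state`.

```python
import random


class ToyChain:
    """Hypothetical deterministic environment: walk right along a line to `goal`."""

    def __init__(self, goal=30):
        self.goal = goal
        self.pos = 0

    def get_state(self):
        return self.pos

    def set_state(self, state):
        # The "teleport": restore a previously saved simulator state.
        self.pos = state

    def step(self, action):
        # action is -1 (left) or +1 (right); position is clamped at 0.
        self.pos = max(0, self.pos + action)
        return self.pos, self.pos >= self.goal


def cell_of(state):
    # The "state summary" (an assumption for this toy): bucket positions
    # so that nearby states count as the same cell.
    return state // 5


def go_explore(env, iterations=20000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = env.get_state()
    # archive maps cell -> (saved state, actions that reach it from the start)
    archive = {cell_of(start): (start, [])}
    for _ in range(iterations):
        # Pick an archived cell and jump straight back to it.
        state, actions = rng.choice(list(archive.values()))
        env.set_state(state)
        actions = list(actions)
        # Explore randomly from there.
        for _ in range(explore_steps):
            action = rng.choice([-1, 1])
            obs, done = env.step(action)
            actions.append(action)
            if done:
                return actions  # a full trajectory from the start to the goal
            cell = cell_of(obs)
            # Keep new cells, or a shorter route to a known cell.
            if cell not in archive or len(actions) < len(archive[cell][1]):
                archive[cell] = (env.get_state(), list(actions))
    return None


demo = go_explore(ToyChain(goal=30))
```

The returned `demo` is just a list of actions that reaches the goal when replayed from the start, i.e. exactly the kind of trajectory you would then hand to an imitation learner.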

However, once the exploring was done, they could train an agent on the trajectories of the exploring agent to solve Montezuma's Revenge (MR) with no problem. That's why I say that MR is an exploration problem, and that training on demonstrations from a player which has already solved MR would obviously work: because it does. So it wouldn't show anything interesting about Gato, because Gato would only be solving the part of MR that everyone agrees is basically trivially easy.