I was wondering how feasible it would be to make a Minecraft agent that had a running feed of the past few seconds, continued it off w/ SORA, fed the continuation into a (relatively) simple policy translator that just pulled out what the video showed as player inputs, and the inputted that.
Presumably, this would work for non-minecraft applications, but Minecraft has a really standardized interface layer
It’s interesting to think about what the internal model of a NN of a game might be like, and whether it could be exposed with an API that someone could program against.
Presumably, this would work for non-minecraft applications, but Minecraft has a really standardized interface layer