Yes, you can decode the imagined scenarios into videos and look at them. It's quite helpful during development to see what the model gets right or wrong. See Fig. 3 in the paper: https://www.nature.com/articles/s41586-025-08744-2
So, prediction of future images from a series of images. That makes a lot of sense.
Here's the "full sized" image set.[1] The world model works on low-res images. That makes sense: ask for too much detail and the detail will be invented, which is not helpful.
[1] https://media.springernature.com/full/springer-static/image/...