World models will be how general-purpose robots finally work. They are essentially learned simulators of the world, and they will replace traditional robotics simulators, which are not flexible enough to train general robotics policies. Robot control policies will be trained and evaluated in learned simulators, and the policies themselves will also be world models, so they can predict the consequences of their own actions and thus plan. Simulated data will scale much better than expensive real-world robot data, allowing robot policies to reach LLM-level dataset sizes and, subsequently, LLM-level performance.
By enabling general-purpose robotics, world models will be one of the most useful inventions of all time.
The world model is useful for planning: it can "anticipate" the consequences of actions. In robotics, this can drive a kind of tree search over candidate action sequences to decide on the best action.
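A minimal sketch of that idea, assuming a toy 1-D state and a hand-written stand-in for the learned dynamics. The names `world_model`, `reward`, `plan`, `ACTIONS` and `GOAL` are all hypothetical; a real system would swap in a learned network and a smarter search:

```python
# Hypothetical toy setup: the state is a 1-D position and actions shift it.
ACTIONS = (-1, 0, 1)
GOAL = 3

def world_model(state: int, action: int) -> int:
    """Predict the next state (stand-in for a learned dynamics model)."""
    return state + action

def reward(state: int) -> int:
    """Higher is better: negative distance to the goal."""
    return -abs(GOAL - state)

def plan(state: int, depth: int) -> tuple[int, int]:
    """Exhaustive depth-limited tree search over imagined rollouts.

    Returns (best cumulative reward, best first action).
    """
    best_value, best_action = None, ACTIONS[0]
    for action in ACTIONS:
        next_state = world_model(state, action)  # imagined, not executed
        value = reward(next_state)
        if depth > 1:
            value += plan(next_state, depth - 1)[0]
        if best_value is None or value > best_value:
            best_value, best_action = value, action
    return best_value, best_action
```

Calling `plan(0, 3)` searches all 27 imagined three-step futures and picks the first action of the best one. Nothing is executed in the real world until the search is done, which is the whole point of having the model.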
Right now there is (AFAIK) no world model product booking any meaningful revenue. So there's a decent chance WMs turn out to have no long-term utility at all.
However, there are a few promising markets, assuming WMs continue to get better and cheaper:
1. Robotics training / evaluation: modern end-to-end (sensors-to-control) robot policies require simulators that are almost indistinguishable from reality. If your sim is distinguishable from reality, the evaluation metrics you get from sim don't mean anything and the policies you train in sim don't work. World models will likely be the highest-fidelity robotics simulators, since WMs are data-driven and get arbitrarily more realistic given more data/compute. This is why so many robotics companies have WM projects [1] [2] [3] [4].
2. Video frontends for agents: in the same way that today's frontier labs are building realtime voice interfaces [5] which behave like a phone call, realtime video interfaces will behave like a video call. Early forms of this don't feel compelling IMO [6] [7], but once the models can instantly blend between rendering the agent itself, drawing diagrams/visualizations, rendering video, etc. I can see it surpassing pure voice mode.
3. Entertainment: zero-shot world generation (i.e. holodeck, Genie 3; paste in an image/video/text prompt and get a world) will be a fun toy but I'm not convinced it has any long-term value. I'm more optimistic about proper narrative experiences where each scene/level is a small, carefully crafted world (behaving like a normal film scene if you don't touch the controls, and an Uncharted/TLoU-style narrative game if you do), such that the sequence of scenes builds up a larger story.
Games. Build campaigns in hours instead of months. Make it possible for users to create their own campaigns, move the action to different game worlds - 'gimme Mario Kart in the ${favourite_game} world', etc.
Yeah, but is this really that great? Are these models going to remember the town you wandered through in yesterday's session and now want to return to?
Imagine playing Red Dead Redemption 2: you try to ride your horse from Saint Denis to Valentine, and Valentine no longer exists, or is a completely different town half a mile from where it originally was.
If I had to use the models as they exist right now I'd use them in a procedural Myst-like where I incorporate the temporal inconsistency into the setting. The player's actions and state would affect the prompts used for conditioning the video generation. It would probably be weird and buggy but could be fun.
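Roughly what I mean by state-driven conditioning, as a sketch only. `PlayerState`, its `stability` field and `build_prompt` are made-up names, and the resulting string would be passed to whatever video model you actually have:

```python
from dataclasses import dataclass, field

@dataclass
class PlayerState:
    location: str
    inventory: list[str] = field(default_factory=list)
    stability: float = 1.0  # hypothetical stat: lower means a less consistent world

def build_prompt(state: PlayerState, last_action: str) -> str:
    """Turn game state into a text prompt for conditioning the next clip."""
    parts = [f"first-person view of {state.location}"]
    if state.inventory:
        parts.append("holding " + ", ".join(state.inventory))
    parts.append(f"the player just chose to {last_action}")
    if state.stability < 0.5:
        # Lean into the temporal inconsistency instead of fighting it.
        parts.append("architecture subtly shifting, details not matching memory")
    return "; ".join(parts)
```

The game logic stays ordinary code; only the visuals come from the model, so the drift between generations becomes part of the fiction rather than a bug.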
You could also use these models to generate assets for a game during development, whether that's simple cutscenes or assets produced through Gaussian splatting or some other process.
If these models and others can be run cost-effectively on a cloud service, or even locally at some point, then you could do some interesting things by combining them with 3D mesh generation, img2img, vid2vid, etc. Just think about even simple games like Papers, Please and the whole genre it spawned, built on short episodes where you have to make a guess based on what you see. There's a lot of potential for creating new mechanics around generative imagery.
I don't think this model specifically has any direct real world usage. It's more for "creating a biome", for writers, inspiration etc. Perhaps even for some rudimentary visualization - more like an extended sketch than anything directly ready to use.
Same prompt, same seed, and yes you can ensure you get the same output, but also imagine using it as a game designer and recording the output. Imagine level editors where you prompt to fill in details, walk through it, decide which parts you don't like, and prompt for a replacement of those parts.
Yes, a lot of models don’t state this explicitly, but they can be made deterministic. The sampling itself is still random, but the same prompt with the same generation seed will always result in the same output.
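A toy illustration of the seed point, with a seeded RNG standing in for the model's sampler. `generate` is a hypothetical stand-in, not any real model's API, and real GPU inference may additionally need deterministic kernels to be bit-exact:

```python
import hashlib
import random

def generate(prompt: str, seed: int, n_tokens: int = 8) -> list[int]:
    """Stand-in for a generative model: samples 'tokens' from a seeded RNG."""
    # Derive the RNG state from prompt + seed with a stable hash
    # (the built-in hash() is salted per process, so it would not reproduce).
    digest = hashlib.sha256(f"{prompt}|{seed}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.randrange(256) for _ in range(n_tokens)]

same_a = generate("misty harbor town at dawn", seed=42)
same_b = generate("misty harbor town at dawn", seed=42)
other = generate("misty harbor town at dawn", seed=43)
```

Here `same_a` and `same_b` are identical while `other` differs, so a designer who likes a result only needs to store the prompt and seed (or just record the output) to get it back.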
It is inevitable that learned simulators will replace hand-coded simulators, as it is a straightforward application of the Bitter Lesson: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
For examples of what I'm talking about in current research, check:
Dreamer 4: https://danijar.com/project/dreamer4/
DreamDojo: https://arxiv.org/abs/2602.06949
Tesla's world model: https://www.youtube.com/watch?v=LFh9GAzHg1c
Waymo's world model: https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-f...