by nl
173 days ago
They are called "world models" because their internal representation is of a "world" rather than of a video frame or image. The model needs to "understand" geometry and physics to output a video. The fact that it makes errors doesn't mean this isn't significant. If a machine learning model understands how physical objects interact with each other, that is very useful.

I'm unconvinced. The tiger and girl video is the clearest example. Nothing about it seems to represent a world.