HN Mirror

	by nl 173 days ago
	The reason they are called "world models" is that the internal representation of what they display represents a "world" instead of a video frame or image. The model needs to "understand" geometry and physics to output a video. Just because there are errors in this doesn't mean it isn't significant. If a machine learning model understands how physical objects interact with each other, that is very useful.

godelski 173 days ago

  > what they display represents a "world" instead of a video frame or image.

Do they?

I'm unconvinced. The tiger and girl video is the clearest example. Nothing about it seems to represent a world.

PunchyHamster 173 days ago

I think the reason is "those words look nice on promo material". It is absolutely built to trigger hype among the clueless.

slashdave 173 days ago

> The model needs to "understand" geometry and physics to output a video.

No it doesn't. It merely needs to mimic.

IAmGraydon 173 days ago

Correct. The fact that AI is a black box means we can easily imagine anything we want happening within that box. Or perhaps, to put it more accurately: AI companies can convince investors of amazing magic happening within that box. With LLMs, we anthropomorphize and imagine it’s thinking. With video models, they’re now trying to convince us that it understands the world. None of these things are true. It’s all an illusion.

slashdave 173 days ago

It's worse than that. It's not a black box. We know how the architecture is constructed. We can read the weights.

in-silico 172 days ago

Here's a recent paper showing that models trained to generate videos develop strong geometric representations and understanding:

https://arxiv.org/abs/2512.19949