


	by sigmoid10 359 days ago
	Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt for out-of-distribution video content (i.e. stuff that you are unlikely to find on YouTube). It's extremely good at creating photorealistic surfaces and lighting. It even has some reasonably solid understanding of fluid dynamics for simulating water. But for complex human behaviour (in particular certain motions) it simply lacks the training data, although that's not really a fault of the model, and I'm pretty sure there will be a way to overcome this as well. Maybe some kind of physics-based simulation as supplementary training data.


mym1990 359 days ago

What is the basis for it having a reasonable understanding of fluid dynamics? Why don’t you think it’s just regurgitating some water scenes derived from its training data, rather than generating actual fluid dynamics?


sigmoid10 358 days ago

Because it can actually extrapolate to unseen cases while maintaining realism.


mym1990 358 days ago

Ah yes, the classic “because it can” argument. I’ll take that to mean you don’t know what you’re talking about.


sigmoid10 352 days ago

It seems you are confusing this with a personal opinion. This is not my opinion. This is merely the consensus of current research.

See here for example:

[1] https://arxiv.org/pdf/2410.18072

[2] https://arxiv.org/pdf/2411.02914v1

[3] https://openai.com/index/video-generation-models-as-world-si...

But even if you knew nothing about this topic, consider that a model simply couldn't store the amount of video data it would need in order to regurgitate it. That alone should give you a big clue as to what is happening.
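
To put rough numbers on the storage argument, here is a back-of-the-envelope sketch. Every figure in it is a made-up placeholder (parameter count, bytes per parameter, hours of video, bitrate); none of them are published numbers for Veo3 or any other model. The function names are likewise just for illustration. The point is only the order of magnitude of the gap.

```python
# Back-of-the-envelope comparison: size of model weights vs. size of a
# compressed video training set. All inputs are hypothetical placeholders,
# NOT actual figures for Veo3 or any real model.

def weights_bytes(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Storage needed for the weights (2 bytes per parameter assumes 16-bit floats)."""
    return num_params * bytes_per_param


def video_bytes(hours: float, bitrate_bits_per_s: float) -> float:
    """Storage needed for `hours` of video at a given compressed bitrate."""
    seconds = hours * 3600
    return seconds * bitrate_bits_per_s / 8  # bits -> bytes


# Assumed: a 10-billion-parameter model stored in 16-bit precision.
model = weights_bytes(10e9)

# Assumed: one million hours of video compressed to 1 Mbit/s.
data = video_bytes(1e6, 1e6)

ratio = data / model

print(f"model weights: {model / 1e9:.0f} GB")
print(f"training video: {data / 1e12:.0f} TB")
print(f"video is {ratio:,.0f}x larger than the weights")
```

With these placeholder inputs the video is four orders of magnitude larger than the weights, so verbatim storage of the training set is off the table. Plug in your own estimates; the gap stays large for any plausible choice.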
