by sigmoid10 359 days ago
Veo3's world model is still pretty limited. That becomes obvious very fast once you prompt for out-of-distribution video content (i.e. stuff that you are unlikely to find on YouTube). It's extremely good at creating photorealistic surfaces and lighting. It even has some reasonably solid understanding of fluid dynamics for simulating water. But for complex human behaviour (in particular certain motions) it simply lacks the training data. That's not really a fault of the model, though, and I'm pretty sure there will be a way to overcome this as well, maybe some kind of physics-based simulation as supplementary training data.

What is the basis for it having a reasonable understanding of fluid dynamics? Why don’t you think it’s just regurgitating some water scenes derived from its training data, rather than generating actual fluid dynamics?
Because it can actually extrapolate to unseen cases while maintaining realism.
Ah yes, the classic “because it can” argument. I’ll take that to mean you don’t know what you’re talking about.
It seems you are confusing this with a personal opinion. This is not my opinion. This is merely the consensus of current research.

See here for example:

[1] https://arxiv.org/pdf/2410.18072

[2] https://arxiv.org/pdf/2411.02914v1

[3] https://openai.com/index/video-generation-models-as-world-si...

But even if you knew nothing about this topic, one observation should give you a big clue as to what is happening: you simply couldn't store the necessary amount of video data in a model for it to just regurgitate it.
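
To make the storage argument concrete, here is a back-of-envelope sketch in Python. Every number in it (parameter count, hours of training video, bitrate) is a made-up placeholder, not a published figure for Veo3 or its training set; only the order-of-magnitude gap matters.

```python
# Back-of-envelope comparison of model size vs. training video size.
# All constants are illustrative assumptions, NOT real figures for any model.

def model_size_bytes(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Storage for the weights (2 bytes per parameter, i.e. 16-bit floats)."""
    return num_params * bytes_per_param


def video_size_bytes(hours: float, bitrate_mbps: float) -> float:
    """Storage for compressed video at a given average bitrate (megabits/s)."""
    seconds = hours * 3600
    return seconds * bitrate_mbps * 1e6 / 8  # bits -> bytes


ASSUMED_PARAMS = 30e9        # hypothetical parameter count
ASSUMED_HOURS = 10e6         # hypothetical hours of training video
ASSUMED_BITRATE_MBPS = 2.0   # hypothetical average bitrate of that video

weights = model_size_bytes(ASSUMED_PARAMS)
corpus = video_size_bytes(ASSUMED_HOURS, ASSUMED_BITRATE_MBPS)
ratio = corpus / weights

print(f"weights: {weights / 1e9:.0f} GB")
print(f"corpus:  {corpus / 1e15:.0f} PB")
print(f"corpus is about {ratio:,.0f}x larger than the weights")
```

With these placeholder values the already-compressed video is several orders of magnitude larger than the weights, so verbatim storage is off the table. Plug in your own estimates; the gap stays large for any plausible choice.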