| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by fxtentacle 259 days ago

LLMs and Diffusion solve a completely different problem than world models.

If you want to predict future text, you use an LLM. If you want to predict future frames in a video, you go with Diffusion. But what both of them lack is object permanence. If a car isn't visible in the input frame, it won't be visible in the output. But in the real world, there are A LOT of things that are invisible (image) or not mentioned but only implied (text) that still strongly affect the future. Every kid knows that when you roll a marble behind your hand, it'll come out on the other side. But LLMs and Diffusion models routinely fail to predict that, as for them the object disappears when it stops being visible.

Based on what I heard from others, world models are considered the missing ingredient for useful robots and self-driving cars. If that's halfway accurate, it would make sense to pour A LOT of money into world models, because they will unlock high-value products.

5 comments

tinco 259 days ago

Sure, if you only consider the model they have no object permanence. However you can just put your model in a loop, and feed the previous frame into the next frame. This is what LLM agent engineers do with their context histories, and it's probably also what the diffusion engineers do with their video models.

Messing with the logic in the loop and combining models has an enormous potential, but it's more engineering than researching, and it's just not the sort of work that LeCun is interested in. I think the conflict lies there, that Facebook is an engineering company, and a possible future of AI lies in AI engineering rather than AI research.

link

Workaccount2 259 days ago

>But what both of them lack is object permanence.

This is something that was true last year, but hanging on by a thread this year. Genie shows this off really well, but it's also in the video models as well.[1]

[1]https://storage.googleapis.com/gdm-deepmind-com-prod-public/...

link

yogrish 259 days ago

I think World models is way to go for Super Intelligence. One of teh patent i saw already going in this direction for Autonomous mobility is https://patents.google.com/patent/EP4379577A1 where synthetic data generation (visualization) is missing step in terms of our human intelligence.

link

makestuff 259 days ago

This is the first time I have heard of world models. Based on my brief reading it does look like this is the idea model for autonomous driving. I wonder if the self driving companies are already using this architecture or something close to it.

link

PxldLtd 259 days ago

I thoroughly disagree, I believe world models will be critical in some aspect for text generation too. A predictive world model you can help to validate your token prediction. Take a look at the Code World Model for example.

link

ml-anon 259 days ago

lol what is this? We already have world models based on diffusion and ar algorithms.

link