HN Mirror

by pixl97 1112 days ago

Ok, so from your other comment, I think this is where our definition of intelligence is breaking down...

Biological agents have a consistent world model based on their capabilities because an inconsistent model would lead to lack of reproduction or death. We could call this environmental intelligence.

Meanwhile, we have LLMs that appear to have what I would consider 'micro' world models for some things, but not a large, consistent world model. I'm guessing this is due to a few things: for example, they are not culled for bad world models, and they are grounded only in text, since we've not really explored multi-modal grounding in models very far.

I guess what's going to be interesting is to see how multi-modal and embodied models do as they are trained in the environment and create a more consistent world model.

1 comment

nathan_compton 1112 days ago

I believe that the best way to understand these large language models is that they have models of patterns of text. To the extent that patterns of text are congruent with patterns in the world, they appear to function well, but I think, in the end, they are statistical models of text, not of the world, and that substantially limits their capabilities.

I do think multi-modal models will be interesting, but text is a very special sort of thing. It is widely available, semantically rich, and informationally pretty dense. I'm not sure there is such a nice set of properties for other modes. Consider that we have already almost reached training data exhaustion with text and it is, by far, the most voluminous/dense training mode there is.