by pixl97
1112 days ago
Ok, so from your other comment, I think this is where our definition of intelligence is breaking down... Biological agents have a consistent world model based on their capabilities because an inconsistent model would lead to lack of reproduction or death. We could call this environmental intelligence. Meanwhile we have LLMs that appear to have what I would consider 'micro' world models for some things, but not a large consistent world model. I'm guessing this is due to a few things: one is that they are not culled for bad world models, and another is that they are only grounded in text, and we've not really explored multi-modal grounding in models very far. I guess what's going to be interesting is to see how multi-modal and embodied models do as they are trained in the environment and create a more consistent world model.
|
I do think multi-modal models will be interesting, but text is a very special sort of thing. It is widely available, semantically rich, and informationally pretty dense. I'm not sure any other mode has such a nice set of properties. Consider that we have already almost reached training data exhaustion with text, and it is, by far, the most voluminous/dense training mode there is.