|
Not a robotics guy, but to the extent that the same fundamentals hold, I think it's a degrees of freedom question. Given the (relatively) low conditional entropy of natural language, there aren't actually that many degrees of (true) freedom. On the other hand, in the real world, there are massively more degrees of freedom both in general (3 dimensions, up to 6 degrees of freedom per joint, M joints, continuous vs. discrete space, etc.) and also given the path dependence of actions, the non-standardized nature of actuators, kinematics, etc. All in, you get crushed by the curse of dimensionality. Given N degrees of true freedom, you need O(exp(N)) data points to achieve the same performance. Folks do a bunch of clever things to address that dimensionality explosion, but I think the overly reductionist point still stands: although the real world is theoretically verifiable (and theoretically could produce infinite data), in practice we currently have exponentially less real-world data for an exponentially harder problem. Real roboticists should chime in... |
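To put rough numbers on the O(exp(N)) claim, here is a back-of-the-envelope sketch. It assumes the crudest possible setup: a uniform grid over the unit cube with one sample per cell. The helper name `samples_needed` and the choice of 10 bins per axis are mine, purely for illustration; real methods do much better than a naive grid, but the growth rate is the point.

```python
def samples_needed(n_dims: int, bins_per_dim: int = 10) -> int:
    """Cells in a uniform grid over [0, 1]^n_dims (illustrative assumption:
    one data point per cell, same resolution on every axis)."""
    return bins_per_dim ** n_dims


if __name__ == "__main__":
    # Each extra degree of freedom multiplies the requirement by bins_per_dim.
    for n in (1, 2, 3, 6, 12):
        print(f"N={n:>2}: {samples_needed(n):,} samples")
```

Going from 3 to 6 dimensions at this resolution takes you from a thousand cells to a million, which is the gap between "collectable" and "not happening on real hardware" for a lot of setups.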
Even the existence of most relationships in the physical world can only be inferred, never mind dimensionality. The correlations are often weak unless you are able to work with data sets that far exceed the entire corpus of all human text, and sometimes not even then. Language has relatively unambiguous structure that simply isn't the norm in real space-time data models. In some cases we can't unambiguously resolve causality and temporal ordering in the physical world. Human brains aren't fussed by this.
There is a powerful litmus test for things "AI" can do. Theoretically, indexing and learning are equivalent problems. There are many practical data models for which no scalable indexing algorithm exists in the literature. These overlap almost perfectly with the data models that current AI tech is demonstrably incapable of learning. A company with novel AI tech that can learn a hard data model can demonstrate a zero-knowledge proof of capability by qualitatively improving indexing performance on said data models at scale.
Synthetic "world models" so thoroughly nerf the computer science problem that they won't translate to anything real.