Hacker News
by visarga 2428 days ago
It's not just about grounding language in vision, but also about embodiment, a first-person perspective, and the ability to interact with the environment. Humans have had the benefit of slowly evolving in a complex environment that is too expensive to recreate for artificial agents. We can only create very limited sims compared to the real world.