| HN Mirror



	by mallowdram 267 days ago
	There are no world models in there; it's trained on arbitrary images/sequences. There are no world models in us either: we learn only from specifics in topological space, stitched together in sharp wave ripples. Everything comes from detached memories working through optic flow. That's not a world model, it's not even a model. It's an analog. This whole world model thing is another branding phase after language models failed to deliver. After world models it will be neuro-symbolic, then RL will sweep in like a final boss fight, and then... it still won't work. Notice anything about these names? They're walking pneumonia paradoxes.

1 comments

bonoboTP 267 days ago

The point is that video generation is not the goal in itself. Just like classifying photos as cat vs dog wasn't the goal in 2013. I know that Sora 2 is not a world model.

But what's coming is vision-language-action models and planning, and spatial AI (SLAM with semantics, 3D reconstruction with interactability and affordance detection). Video diffusion models, photo-to-gaussian-splats, video-to-3D (e.g. from Hunyuan), the whole DUSt3R/VGGT line of work, V-JEPA 2, etc. Or if you want product names: Gemini Robotics 1.5, Genie 3, etc. The field is progressing incredibly fast. Humanoid robots are progressing fast. Robotic hands with haptic sensors are more dexterous than ever. It's starting to work. We are only seeing the first glimpses, of course.


mallowdram 266 days ago

It's largely irrelevant in terms of intelligence. What you're describing is throwing out 2-D topological integrations (what we do to achieve ultra-fast optic-flow reaction times in motion) and vicarious trial and error, and brute-force imposing a machine wax fruit of motion dexterity. It's simply not analog to events the way we experience them. It's been cooked up in cog-sci as imitation, but it's not even that. The more we understand the brain's architecture and processes, the less relevant this gets, as it's not for legitimate long-term bioware. There are no world models; the idea is oxymoronic, as the topological bypasses this in scale invariance. It's all a dead end, this binary, since eventually analog will rule this with minimal energy and software, and use an entirely different kind of software. Think of any industry that arrived too early: AI is irrelevant, the first step was reinventing software. It took the least efficient compute principle and drove it to irrelevance using machine vision as an endgame. The lack of redundancies is the tell.


debesyla 267 days ago

I wonder what this fascination with human-shaped robots is, when spider-shaped robots could be more dexterous and productive.

(Unless it's sci-fi and porn that are mainly pushing for human-shaped robots.)


bonoboTP 267 days ago

The built environment fits the human form factor well. Imitation learning and intuitive teleoperation are also easier. But it won't be the only form factor. The quadruped form (like Spot) is also popular, as are drones, etc.
