|
|
|
|
|
by bonoboTP
259 days ago
|
|
The point is that video generation is not the goal in itself. Just like classifying photos as cat vs dog wasn't the goal in 2013. I know that Sora 2 is not a world model. But what's coming is: Vision-language-action models and planning, spatial AI (SLAM with semantics and 3D reconstruction with interactability and affordance detection). Video diffusion models, photo-to-gaussian-splats, video-to-3D (e.g. from Hunyuan), the whole DUSt3R/VGGT line of works, V-JEPA 2 etc. Or if you want product names, Gemini Robotics 1.5, Genie 3, etc. The field is progressing incredibly fast. Humanoid robots are progressing fast. Robotic hands with haptic sensors are more dexterous than ever. It's starting to work. We are only seeing the first glimpses of course. |
|