| HN Mirror



	by bonoboTP 260 days ago
	This is not the final target. It's video generation now, but that's just a stepping stone. The real thing is that learning a generator is also learning a prior over videos, and hence over how the world works. The real application of this will be world models, vision-language-action models, spatial AI and robotics. Basically a kind of learned simulator in which to plan and imagine possible futures, possible actions and affordances etc. Video models could become a spatial reasoning platform too. A recent paper by DeepMind (using Veo 3) showed that video models can perform many high-level vision tasks out of the box. Don't think it's going to end here at some slop feed.

4 comments

gyomu 260 days ago

> This is not the final target

The final target of these "world models" on a 20-year horizon is entirely unmanned factories taking over the economy, and swarms of drones and robots fighting wars and policing citizens.

This is why hundreds of billions are poured into these things; cute Ghibli-style videos and vacuum robots wouldn't be worth this much money otherwise.

oceanplexian 260 days ago

What’s so romantic about working in factories? Automation and robotics will accelerate the economy the same way information technology did, and humans will work on better problems than performing repeated tasks on an assembly line or flipping burgers.

There are arguably more jobs today as a result of computers than there were before they were invented. So why is the assumption that AI will magically delete all jobs while discounting the fact that it will create careers we haven’t even thought of?

gyomu 260 days ago

> humans will work on better problems than performing repeated tasks on an assembly line or flipping burgers.

Haha. The current wave of “careers we couldn’t think of” that tech companies have created includes Uber/Doordash/Amazon delivery drivers, data labelers for training AIs, moderators who keep horrific content from spreading on social networks, etc., with way weaker social benefits & protections than the blue-collar jobs of old they replaced.

So yeah, I have a hard time buying this fantasy of everyone doing some magical fulfilling work while AI does all the ugly work, especially when every executive out there is plainly stating that their ideal outcome is replacing 90% of their workforce with AI.

With the way things are headed, AI will take over large economic niches, and humans will fill in at the edges doing the grimy things AI can’t do, with ever diminishing social mobility and safety nets while AI company executives become trillionaires.

vel0city 259 days ago

I actually see robot food delivery services around me, so it might not even be long before those Doordash jobs get replaced by automation. Now I see neighbors starting to get drone deliveries from time to time. Starship used to deliver to the datacenter I used before (it was technically on a college campus but unaffiliated), and I had a coupon for free ice cream delivered through Wing the other day.

https://www.starship.xyz/

https://wing.com/

FrancisMoodie 260 days ago

> So why is the assumption that AI will magically delete all jobs while discounting the fact that it will create careers we haven’t even thought of?

I think that in a vacuum you could reasonably believe that this might be the case, but I feel like it isn't just about the technology these days, it's about the hunger C-suites and tech companies have for replacing their workforce with AI and/or automation. It's quite clear that layoffs and mass adoption of AI/automation raise shareholder value, so there is no incentive to create new jobs.

Will there be an organic shift away from Tech/IT/Computers into new fields? There might, but I think it's a bit naive to think that this will be proportionate to the careers AI will make redundant when there is such a big focus on eliminating as many jobs as possible in favor of AI.

ipaddr 260 days ago

The hope is that we have no employment and we move into a different form of society where AI takes care of us and allows us to focus on more spiritual, meaningful things.

For now AI is deleting many of the jobs the computer created.

The reality is we will more likely end up in a society where wealth/power at the very top will grow and the masses will be controlled by AI.

listenallyall 259 days ago

More than controlled, enslaved: 24/7 location monitoring (but also no need to ever go anywhere, as everything will be delivered), "perfect" nutrition (fed via IV or tasteless shakes), the only "intelligent" conversation taking place between you and an AI agent (even if you are initially resistant, AI will successfully convince you to drop ties to relatives and friends, that is if you are ever allowed to make friends), all news delivered via AI-curated channels but meaningless anyway, since AI can create fake video of any leader or important person committing crimes, lying, etc. Also, all evidence of YOU committing a crime, or just embarrassing stuff like having a sex drive, will be recorded and used as blackmail. A "job" to keep you occupied much of the day, but your output is never actually needed and is discarded by your AI agent "boss".

How is it not entirely obvious to everyone that this is the future? Could be 20, 50, 100 years, but it's coming for sure.

mallowdram 260 days ago

There are no world models in there; it's trained on arbitrary images/sequences. There are no world models in us; we learn from only specifics in topological space, stitched together in sharp wave ripples. Everything is from detached memories working through optic flow. That's not a world model, it's not even a model. It's an analog. This whole world model thing is another branding phase after language models failed to deliver. After world models it will be neuro-symbolic, then RL will sweep in like a final boss fight, and then... it still won't work. Notice anything about these names? They're walking pneumonia paradoxes.

bonoboTP 259 days ago

The point is that video generation is not the goal in itself. Just like classifying photos as cat vs dog wasn't the goal in 2013. I know that Sora 2 is not a world model.

But what's coming is: vision-language-action models and planning, spatial AI (SLAM with semantics and 3D reconstruction with interactability and affordance detection). Video diffusion models, photo-to-gaussian-splats, video-to-3D (e.g. from Hunyuan), the whole DUSt3R/VGGT line of work, V-JEPA 2 etc. Or if you want product names, Gemini Robotics 1.5, Genie 3, etc. The field is progressing incredibly fast. Humanoid robots are progressing fast. Robotic hands with haptic sensors are more dexterous than ever. It's starting to work. We are only seeing the first glimpses of course.

mallowdram 259 days ago

It's largely irrelevant in terms of intelligence. What you're describing is throwing out 2-D topological integrations (what we do to achieve optic flow ultra fast reaction times in motion), vicarious trial and error, and brute force imposing a machine wax fruit of motion dexterity. It's simply not analog to events the way we experience, it's been cooked up in cog-sci as imitation, but it's not even that. The more we understand the brain's architecture and process, the less relevant this gets, as it's not for legitimate long-term bio ware. There are no world models, the idea is oxymoronic as the topological bypasses this in scale invariance. It's all a dead end this binary, since eventually, analog will rule this with minimal energy and software and use an entirely different software. Think of any arriving too early industry, AI is irrelevant, the first step was reinventing software. It took the least efficient compute principle and drove it to irrelevance using machine vision as an endgame. The lack of redundancies is the tell.

debesyla 259 days ago

I wonder what this fascination with human-shaped robots is, if spider-shaped robots could be more dexterous and productive.

(Unless it's sci-fi and porn that are mainly pushing for human-shaped robots.)

bonoboTP 259 days ago

The built environment fits the human form factor well. Imitation learning and intuitive teleoperation are also easier. But it won't be the only form factor. The quadruped form (like Spot) is also popular, as well as drones etc.

afavour 260 days ago

Sure. But why do I, as a user, want to download Vibes today?

mscbuck 259 days ago

I think generally I agree with you that this is a stepping stone towards bigger/potentially more important things... but that doesn't change the fact that they've packaged it to consumers as something that seems like it has, at best, close to zero utility and, at worst, incredible downsides. I'm not sure why releasing this to consumers helps achieve those goals.

bonoboTP 259 days ago

Ad money to recoup the huge investments into datacenters that will do the training of the better models that do the things I mentioned. Meta is working hard on AR, glasses (Project Aria), egocentric modeling and spatial AI. At some point they may also pull out the Metaverse idea again; they are still working on avatars too, it's just not so popularly hyped currently.