by maeln 254 days ago
> Current batch of deep learning models are fundamentally a technology for labor automation. This is immensely useful in itself, without the need to do AGI. The Sora2 capabilities are absolutely wild (see a great example here of what non-professional users are already able to create with it: https://www.youtube.com/watch?v=HXp8_w3XzgU )

> So only looking at video capabilities, or at coding capabilities, it's already ready to automate and upend industries worth trillions in the long run.

Can Sora2 change the framing of a picture without changing the overall scene? Can it change the color temperature of a specific light source? Can it generate 8K HDR footage suitable for reframing and color grading? Can it generate a minute-long video without losing coherence? Actually, can it generate even a few seconds without having to re-loop from the last frame and produce the obnoxious cuts that the video you linked has? Can it reshoot the exact same scene with just one element altered?

All the video models right now are only good at making short, low-res, barely post-processable video: the kind of stuff you see on social media. And considering the metrics on AI-generated video on social media right now, for the most part, nobody wants to watch it. They might replace the bottom of the barrel of social media posting (hello cute puppy videos), but there is absolutely nothing indicating that they might automate or upend any real industry (be used in the pipeline? Yeah, maybe, why not. Automate? I won't hold my breath).

As for the argument about their future capabilities, well... for 50+ years now, fusion has been 20 years away.

Btw, the same argument can be made for LLMs and image-gen tech for any creative purpose. People severely underestimate just how much editing, rework, intent, and pre-production is involved in any major creative endeavor. Most models are just severely ill-suited for that work. They can be useful for some stuff (specifically, for editing images, AI-driven image fill does work decently, for example), but overall, as of right now, they are mostly good at making low-quality content. Which is fine I guess, there is a market for it, but it was already a market that was not keen on spending money.

2 comments

This is very surface level criticism.

Qwen image and nano banana can both do that with images, there’s zero reason to think we can’t train video models for masking.

This feels a lot like critiquing stable diffusion over hands and text, which the new SOTA models all handle well.

One of the easiest iterations on these models is to add more training cases to the benchmarks. That’s a timeline of months, not comparable to forecasting progress over 20 years like fusion.

> This is very surface level criticism.

Is it, now? I don't think being able to accurately and predictably make changes to a shot, a draft, or a design is surface level in production.

> Qwen image and nano banana can both do that with images, there’s zero reason to think we can’t train video models for masking.

Tell them to tilt the camera roughly 15 degrees to the left without changing anything else in the scene, and tell me if it works.

> This feels a lot like critiquing stable diffusion over hands and text, which the new SOTA models all handle well.

"Well" is doing a lot of heavy lifting there.

> One of the easiest iterations on these models is to add more training cases to the benchmarks. That’s a timeline of months, not comparable to forecasting progress over 20 years like fusion.

And what if the model itself is the limiting factor? The entire tech? Do we have any proof that the current technologies will ever be able to handle the cases I mentioned?

Also, one thing I didn't mention in the first post: assume the tech does get to the point where it can be used to automate a lot of production. If throwing a few million at a GPU cluster is enough to "generate" a relatively high-quality movie or series, the barrier to entry will be incredibly low. Costs will be driven down, the volume of production will be very high, and overall it might not be a trillion-dollar industry anymore.

> They might replace the bottom of the barrel of social media posting (hello cute puppy videos)

Lay off. Only respite I get from this hell world is cute Rottweiler videos