Hacker News new | ask | show | jobs
by robotresearcher 562 days ago
> Long term, you'll never have a coherent movie produced by stringing together a series of textual snippets because, again, that's just impossible.

Why snippets? Submit a whole script the way a writer delivers a movie to a director. The (automated) director/DP/editor could maintain internal visual coherence, while the script drives the story coherence.

5 comments

This almost certainly won’t work. Feel free to feed any of the hundreds of existing film scripts and test how coherent the models can be. My guess is not at all
The clips on the Sora site today would have been utterly astonishing ten years ago. Long term progress can be surprising.
> The clips on the Sora site today would have been utterly astonishing ten years ago.

Yeah, and Apollo 11 would have been utterly astonishing a decade before it occurred. And, yet, if you tried to project out from it to what further frontiers manned spaceflight would reach in the following decades, you’d…probably grossly overestimate what actually occurred.

> Long term progress can be surprising.

Sure, it can be surprising for optimists as well as naysayers; as a good rule of thumb, every curve that looks exponential in an early phase ends up being at best logistic.

In the long run we are all dead. Saying that technology will be better in the future is almost eye-roll worthy. The real task is predicting what future technology will be, and when it will arrive.

Ask anyone with a chronic illness about the future and they'll tell you we're about 5 years off a cure. They've been saying that for decades. Who knows where the future advancements will be.

This will almost certainly be in theaters within 5 years, probably first as a small experimental project (think blair witch).
The Blair Witch Project was a (surprise) creative masterpiece. It worked with very limited technology to create a very clever plot which was paired with an amazing marketing. The combination of which the world hadn’t seen before. It took some creative geniuses to peace the Blair Witch Project together.

Generative AI will never produce an experience like that. I know never is a long time, but I’m still gonna call it. You simply can’t produce such a fresh idea by gathering a bunch of data and interpolating.

Maybe someday enough AI will be good enough to create shorter or longer videos with some dialog and even a coherent story (though I doubt it), but it won‘t be fresh or creative. And we humans will at best enjoy it for its stupidity or sloppiness. Not for its cleverness or artistry.

Why does the idea need to be generated by AI? Let people generate the ideas, the AI will help execute. I think soon (3-5 years) a determined person with no video skills will be able to put together a compelling movie (maybe a short). And that is massive. AI doesn’t have to do everything. Like all tech, it’s a productivity tool.
> Why does the idea need to be generated by AI?

This is the at-first-fun-but-now-frustrating infinite goal move. "AI (a stand in for literally anything) will do (anything) soon." -> "It won't do (thing), it's too complex." -> "Who said AI will do (thing)?"

AI will self-drive cars in San Francisco
I'm suspicious of most claims of AI growth, but I think screenwriting is an area where there's real potential. There are many screenplays out there, many movie plots are very similar to each other, and human raters could help with training. And it's worth noting that the top four highest grossing movies right now are all sequels or film adaptations. It's not a huge leap to imagine an LLM in the future that's been trained on movie writing being able to create a movie script when given the Wicked musical. https://www.imdb.com/chart/boxoffice/
The 2023 Writers Guild of America strike was in part to prevent screenplays being written entirely by generative AI.

So no I don’t think this will happen either. Authors may use use AI them selves as one tool in their tool box as they write their script, but we will not see entire production screen plays being written by generative AI set for theatrical release. The industry will simply not allow that to happen. At most you can have AI write a screen play for your own amusement, not for publication.

I'm thinking more of a Gibsonian 'Garage Kubrick'. A solitary auteur (or small team) that produces the film alone perhaps without even touching a camera, generating all the footage using AI (in the novel the auteur creates all the footage through photo/found-footage manipulation, or at least thats all we see in text). The script will probably be human written, I'm not talking about an AI producing a film from scratch, rather a film being produced using AI to create all the visuals and audio.
That is a far more reasonable prediction but I don’t even see this future. This kind of “film making” will at best be something generated for the amusement of the creator (think, give me a specific episode of Star Trek where Picard ...) or as prototypes or concepts of yet to be filmed with actual actors. And it certainly won’t be in theaters, not in 5 years, or ever.

Generative AI will not be able to approach the artistry of your average actor (not even a bad actor), it won’t be able match the lighting or the score to the mood (unless you carefully craft that in your prompt). It won‘t get creative with the camera angles (again unless you specifically prompt for a specific angle) or the cuts. And it probably won’t stay consistent with any of these, or otherwise break the consistency at the right moments, like an artist could.

If you manage to prompt the generative AI to create a full feature film with excellent acting, the correct lighting given the mood, a consistent tone with editing to match, etc. you have probably spent much more time and money into crafting the prompt than would otherwise have gone into simply hiring the crew to create your movie. The AI movie will certainly contain slop and be visibly so bad it guaranteed will not be in theaters.

Now if you hired that crew to make the movie instead, that crew might use AI as a tool to enhance their artistry, but you still need your specialized artists to use that tool correctly. That movie might make it to the theaters.

blair witch project looked like shit, 'the cinematography doesn't approach a true director of photography', the actors were shit... etc. Given the right script and concept it can be amazing and the imperfection of AI can become part of the aesthetic.
It's a tool. The cleverness and artistry comes from the humans, not from the tools they use.

The AI isn't creating the fresh ideas. People are.

So what you are saying is some aspects of movie making will use AI as parts of their jobs. That is very realistic and probably already happening.

Saying that large video models will be in theaters sounds like a completely different and much more ambitious prediction. I interpreted it as if large video models will produce whole movies on their own from a script of prompts. That there will be a single film maker with only a large video model and some prompts to make the movie. Such films will never be in the theater, unless by some grifter, and than it is certain to be a flop.

You should watch how movies are made sometime. How a script is developed. How changes to it are made. How storyboards are created. How actors are screened for roles. How locations are scouted, booked, and changed. How the gazillion of different departments end up affecting how a movie looks, is produced, made, and in which direction it goes (the wardrobe alone, and its availability and deadlines will have a huge impact on the movie).

What does "EXT. NIGHT" mean in a script? Is it cloudy? Rainy? Well lit? What are camera locations? Is the scene important for the context of the movie? What are characters wearing? What are they looking at?

What do actors actually do? How do they actually behave?

Here are a few examples of script vs. screen.

Here's a well described script of Whiplash. Tell me the one hundred million things happening on screen that are not in the script: https://www.youtube.com/watch?v=kunUvYIJtHM

Or here's Joker interrogation from The Dark Night Rises. Same million different things, including actors (or the director) ignoring instructions in the script: https://www.youtube.com/watch?v=rqQdEh0hUsc

Here's A Few Good Men: https://www.youtube.com/watch?v=6hv7U7XhDdI&list=PLxtbRuSKCC...

and so on

---

Edit. Here's Annie Atkins on visual design in movies, including Grand Budapest Hotel: https://www.youtube.com/watch?v=SzGvEYSzHf4. And here's a small article summarizing some of it: https://www.itsnicethat.com/articles/annie-atkins-grand-buda...

Good luck finding any of these details in any of the scripts. See minute 14:16 where she goes through the script

Edit 2: do watch The Kerning chapter at 22:35 to see what it actually takes to create something :)

I can't upvote this enough. This topic in the media space has generated a huge amount of naive speculation that amounts to "how hard could it be to do <thing i know nothing about>?"
> "how hard could it be to do <thing i know nothing about>?"

This is most Hacker News comments summarized lmao. It's kinda my favorite thing of this place: just open any thread and you immediately see so many people rushing to say ''well just do X or Y'' or ''actually it's X or Y and not Z like the experts claim''. Love it.

In this case, it’s movies and TV, which most people enjoy. So there’s a superficial accessibility to the problem which encourages this attitude.

Of course, HN being the place that it is, the same type of comments are made about quantum entanglement and solar panel efficiency.

I agree with you.

At the same time I am curious in the "that person has too many fingers" sense at what a system trained on tens of thousands of movies plus scripts plus subtitles plus metadata etc. would generate.

I thought about it for a bit and I would want to watch a computer generated Sharknado 7 or Hallmark Christmas movie.

Of course normally other people contribute to a movie after the writer. My comment mentioned three of the important roles. This whole thread is about tech that automates away those roles. That's the whole point.
I think you've misunderstood the objection.

Lets pick something concrete. It's a medieval script, it opens with two knights fighting. OK so later in the script we learn their characters, historic counterparts etc. So your LLM can match nefarious villain to some kind of embedding, and doubtless has trained on countless images of a knight.

But the result is not naively going to understand the level of reality the script is going for - how closely to stick to historic parallels, how much to go fantastical with the depiction. The way we light and shoot the fight and how it coheres with the themes of the scene, the way we're supposed to understand the characters in the context of the scene and the overall story, the references the scene may be making to the genre or even specific other films etc.

This is just barely scraping the surface of the beginnings of thinking about mise en scene, blocking, framing etc. You can't skip these parts - and they're just as much of a challenge as temporal coherence, or performance generation or any of the other hard 'technical issues' that these models have shown no capacity to solve. They're decisions that have to be made to make a film coherent at all - not yet good or tasteful or creative or whatever.

Put another way - you'd need AGI to comprehend a script at the level of depth required to do the job of any HOD on any film. Such a thing is doubtless possible, but it's not going to be shortcut naively the way generation an image is - because it requires understanding in context, precisely what LLMs lack.

> but the result is not naively going to understand the level of reality the script is going for…

We can already get detailed style guidance into picture generation. Declaring you want Picasso cubist, Warner brothers cartoon, or hyper realistic works today. So does lighting instructions, color palettes, on and on.

These future models will not be large language models, they will be multi-modal. Large movie models if you like. They will have tons of context about how scenes within movies cohere, just as LLMs do within documents today.

So, we went from "just hand off movie script to automated director/DP/editor" we're now rapidly approaching:

- you have to provide correct detailed instructions on lighting

- you have to provide correct detailed instructions on props

- you have to provide correct detailed instructions on clothing

- you have to provide correct detailed instructions on camera position and movement

- you have to provide correct detailed instructions on blocking

- you have to provide correct detailed instructions on editing

- you have to provide correct detailed instructions on music

- you have to provide correct detailed instructions on sound effects

- you have to provide correct detailed instructions on...

- ...

- repeat that for literally every single scene in the movie (up to 200 in extreme cases)

There's a reason I provided a few links for you to look at. I highly recommend the talk by Annie Atkins. Watch it, then open any movie script, and try to find any of the things she is talking about there (you can find actual movie scripts here: https://imsdb.com)

There's two reasons to be hopeful about it though: AI/LLMs are very good at filling in all those little details so humans can cherry pick the parts that they like. I think that's where the real value is in for the masses - once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like. Sort of like SegmentAnything and masking in inpainting but for the rest of the scene assembly. The other reason is that the models can probably be architected to figure out environmental/character/light/etc embeddings and use those to build up other coherent scenes, like we use language embeddings for semantic similarity.

That's how I've been using the image generators - lots of experimentation and throwing out the stuff that doesn't work. Then once I've got enough good generated images collected out of the tons of garbage, I fine tune a model and create a workflow that more consistently gives me those styles.

Now the models and UX to do this at a cinematic quality are probably 5-10 years away for video (and the studios are probably the only ones with the data to do it), but I'm relatively bullish on AI in cinema. I don't think AI will be doing everything end to end, but it might be a shortcut for people who can write a script and figure out the UX to execute the rest of the creative process by trial and error.

That’s the same thing with digital art, even with the most effortless one (matte painting), there’s a plethora of decisions to make and techniques to use to have a coherent result. There’s a reason people go to school or trained themselves for years to get the needed expertise. If it was just data, someone would have written a guide that others would mindlessly follow.
Not sure why you jumped there. I was thinking more like ‘make it look like Bladerunner if Kurosawa directed it, with a score like Zimmer.’

You’re really failing to let go of the idea that you need to prescribe every little thing. Like Midjourney today, you’ll be able to give general guidance.

Now, I don’t expect we’ll get the best movies this way. But paint by numbers stuff like many movies already are? A Hallmark Channel weepy? I bet we will.

This is such an incredibly confident comment. I'm in awe.
Shane Carruth (Primer) released interesting scripts for "A Topiary" and "The Modern Ocean" which now have no hope of being filmed. I hope AI can bring them to life someday. If we get tools like ControlNet for video, maybe Carruth could even "direct" them himself.
This exists already actually. Kling AI 1.5. Saw the demo on twitter two days ago, which shows a photo-to-video transformation on an image of three women standing on a beach, and the video transformation simulates the camera rotating, with the women moving naturally. Just involves a segment-anything style selection of the women, and drawing a basic movement vector.

https://x.com/minchoi/status/1862975323433795726

Controlnet for video is just controlnet but ran frame by frame resulting in AI Rotoscoping.
brilliant take from Ben Affleck on ai in movies..

"movies will be one of the last things to be replaced by ai"

https://www.youtube.com/watch?v=ypURoMU3P3U

including this quote: "being a craftsman is knowing how to work, art is knowing when to stop"

It is absolutely true that LLMs do not know when to stop.
An adequate prompter (human at the prompt) knows when to stop.
That's what I describe at the end, albeit quickly in lingo, where the internal coherence is maintained in internal embeddings that are never related to English at all. A top-level AI could orchestrate component AIs through embedded vectors, but you'll never do it with a human trying to type out descriptions.