Hacker News
by snek_case 1489 days ago
It's hard to say because even a model like GPT-3 is limited in its ability to generate a textual story that remains coherent over time. When you're talking about generating video, you need to have lots of story and visual details remaining coherent over a very long time horizon. Generally speaking, I think this is an area where deep neural networks are fairly weak and symbolic AI shines. It's much easier to program a symbolic AI that generates a story that remains coherent over time. Though you might argue that the story probably wouldn't be very interesting. There's probably something to be done with a hybrid model that uses symbolic AI to enforce coherency constraints, and a deep network model that fills in details and generates visuals.
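A minimal sketch of that hybrid idea, in Python. Everything here is an assumption for illustration. `StoryState` stands in for the symbolic AI, and `propose` is a hypothetical random stand-in for the deep network, so no real model is called. The symbolic layer rejects any candidate event that contradicts what the story has already committed to. For example, a character who has died cannot act again.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    actor: str
    action: str                   # "moves", "speaks" or "dies"
    target: Optional[str] = None  # destination, only used by "moves"


@dataclass
class StoryState:
    """Symbolic layer: facts the story has already committed to."""
    alive: dict = field(default_factory=dict)     # character -> bool
    location: dict = field(default_factory=dict)  # character -> place

    def allows(self, e: Event) -> bool:
        # Hard coherency constraint: only known, living characters may act.
        return self.alive.get(e.actor, False)

    def apply(self, e: Event) -> None:
        # Commit the accepted event to the symbolic world state.
        if e.action == "dies":
            self.alive[e.actor] = False
        elif e.action == "moves":
            self.location[e.actor] = e.target


def propose(rng: random.Random, cast: list, places: list) -> Event:
    """Hypothetical stand-in for the neural model: proposes events that look
    fine locally but has no memory of what already happened."""
    action = rng.choice(["moves", "speaks", "dies"])
    target = rng.choice(places) if action == "moves" else None
    return Event(rng.choice(cast), action, target)


def generate(n_events: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    cast = ["Alice", "Bob", "Carol"]
    places = ["castle", "forest", "harbor"]
    state = StoryState(alive={c: True for c in cast})
    story = []
    for _ in range(n_events):
        # Resample until the symbolic layer accepts a candidate (bounded retries).
        for _ in range(100):
            event = propose(rng, cast, places)
            if state.allows(event):
                state.apply(event)
                story.append(event)
                break
        else:
            break  # no candidate accepted, e.g. everyone is dead
    return story


if __name__ == "__main__":
    for e in generate(8, seed=1):
        print(e.actor, e.action, e.target or "")
```

The design choice is that the "creative" part is free to propose anything. Coherence is enforced only by the cheap, explicit rules in `allows`, which could be extended with more facts (locations, relationships) as needed.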

So yeah, I think we'd need new, much more sophisticated architectures. We'd also need a lot more compute, like 10x, 100x or maybe even 1000x more, to generate high-resolution video. Actually, the problem is probably not the amount of compute you need for inference, but the amount of compute you'd need to train a model with hundreds of billions of parameters or however much is needed to make that happen.
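A rough back-of-the-envelope sketch of why training, not inference, dominates. It assumes two widely used rules of thumb: about 6·N·D FLOPs to train a model with N parameters on D tokens, and about 2·N FLOPs per generated token at inference. These are approximations, not exact costs. The GPT-3-scale figures (175B parameters, 300B training tokens) are only there for scale.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb: ~6 FLOPs per parameter per training token
    (forward pass plus backward pass)."""
    return 6 * n_params * n_tokens


def inference_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb: ~2 FLOPs per parameter per generated token (forward only)."""
    return 2 * n_params * n_tokens


N = 175e9  # parameters, GPT-3 scale
D = 300e9  # training tokens, GPT-3 scale

train = training_flops(N, D)
infer = inference_flops(N, 1_000)  # one 1,000-token generation
print(f"training:  {train:.2e} FLOPs")
print(f"inference: {infer:.2e} FLOPs")
print(f"ratio:     {train / infer:.1e}")

# Scaling both parameters and data by 10x multiplies training cost by 100x.
print(f"10x N, 10x D: {training_flops(10 * N, 10 * D) / train:.0f}x")
```

Because training cost grows with the product of N and D, growing both model size and data by 10x each already gives the 100x figure mentioned above.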


I suspect the solution to keeping a long story coherent is using the model at different levels of abstraction. A human writer doesn’t sit down and write a complete novel in one sitting. They go through a process of planning, character development, world building and so on. When they write a scene, they’re not holding every detail about the rest of the book in their mind, they’re narrowing down to the details that matter for that scene.

So instead of asking the AI to write a novel in one go, why not guide it through a similar process? At each step, pass in information from previous steps as context, focusing on just the details it needs at that step. Have it generate a summary, then a setting, then characters in that setting, then break the plot into chapters, and then scenes, and so on…
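A minimal sketch of that top-down process in Python. `generate` is a hypothetical prompt-in, text-out function, and any language model call could be plugged in. The prompts and the exact levels are illustrative assumptions, not a tested recipe. The point is that each scene-level call only receives its own chapter outline, the cast, and the setting, rather than the whole book so far.

```python
from typing import Callable, List

# Hypothetical text generator: prompt in, text out. Any model call could go here.
Generator = Callable[[str], str]


def write_story(premise: str, generate: Generator,
                n_chapters: int = 3, scenes_per_chapter: int = 2) -> List[str]:
    """Top-down generation: each step sees only the context it needs."""
    summary = generate(f"Summarize a story about: {premise}")
    setting = generate(f"Describe the setting.\nSummary: {summary}")
    characters = generate(
        f"Create characters for this setting.\nSummary: {summary}\nSetting: {setting}")

    scenes = []
    previous = "(none, this is the first chapter)"
    for c in range(1, n_chapters + 1):
        # Chapter outlines see the summary, the cast and the previous chapter only.
        chapter = generate(
            f"Outline chapter {c} of {n_chapters}.\n"
            f"Summary: {summary}\nCharacters: {characters}\nPrevious chapter: {previous}")
        for s in range(1, scenes_per_chapter + 1):
            # A scene sees its own chapter outline, not the rest of the book.
            scenes.append(generate(
                f"Write scene {s} of chapter {c}.\n"
                f"Chapter outline: {chapter}\nCharacters: {characters}\nSetting: {setting}"))
        previous = chapter
    return scenes


if __name__ == "__main__":
    # Dummy generator that just echoes the first line of each prompt.
    scenes = write_story("a heist on the moon", lambda p: p.splitlines()[0])
    print(len(scenes))  # 6: 3 chapters x 2 scenes
    print(scenes[0])    # Write scene 1 of chapter 1.
```

Passing the previous chapter's outline forward is one cheap way to keep continuity between chapters without growing every prompt with the full text.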

Yup, this makes a lot of sense to me. I could even see models broken down by director or film. So many combinations could be used; the possibilities are endless. A Tarantino model, in the style of Pulp Fiction, might be a good one.

Having a story remain coherent over time is not a prerequisite for Hollywood blockbusters.

Reminds me of that South Park episode in which Cartman disguises himself as a robot to prank Butters, but movie executives mistake him for an actual robot and make him think up movie ideas.

"Adam Sandler is like, in love with some girl, but then it turns out that the girl is actually a Golden Retriever. Or something."

Incoherence is not all bad, and choosing the subject carefully can be enough to let the intrinsic weirdness of AI generation shine.

E.g., this Batman short: https://m.youtube.com/watch?v=fn4ArRmzHhQ (AI story, human drawing) delivers a spot-on Joker.

Those "AI writes" videos are more of a meme than actual AI. GPT-3 writes perfectly grammatical sentences, but its stories don't make much long-term sense, which is the opposite of what you see in the video's story.

These videos are mostly "human writes funny, says it's AI".

Right. The Transformers comes to mind.

> It's hard to say because even a model like GPT-3 is limited in its ability to generate a textual story that remains coherent over time.

Just like most dreams. Can still be entertaining.

In a limited capacity. Any dream I'm even a little aware of becomes extremely boring, frustrating, and claustrophobic. Even the good ones become nightmares without anything changing. That might just be me, though. Maybe GPT could cook up dreams that don't suck.

Oh, but dreams, even the wildest ones, are coherent, especially over time. That coherence just isn't usually obvious to the ego mind.