Well, that was unexpectedly awesome. It's worth reading even for people doing enterprise AI work, just to understand the workflow that went into improving the output quality. I've been dreaming about this possibility since roughly the release of GPT-2; it's amazing to see someone actually build it.

The current status quo is very dissatisfying: sci-fi is really only made by a handful of huge US networks that insist on filling stories with useless but pervasive, offensive and ham-fisted attempts at social engineering. Beyond being bad in its own right, this often breaks the script. For example, you can guess who will turn out to be a good or bad character just from their race and gender, which makes it hard for the writers to genuinely surprise you.

That said, I don't think having AI write the scripts from scratch is the right way to go here. The dialogue for the first episode still smells of RLHF, with characters being far too complimentary to each other and having bizarre verbal tics. And is it needed? The world is full of people with smart stories who want to tell them, but we're in an era when reading is in decline.

So the most interesting part of this is all the tooling that comes after that point: the rug smoothing, the AI-generated voice acting and especially the game engine based renderer that can generate videos given simple instructions. The blog posts sort of glide over that part, I guess due to the author's background in game engine development, but it seems the most useful part actually.

The key here will be connecting people with different skills in an open-source or more YouTube-like system that lets people remix each other's show kits (bibles, 3D objects, scene lots, etc.), so that someone who develops a great world can accept fan episodes written with that show kit and then share in their monetization. Something like that would make storytelling far more decentralized and help it somehow get "back to reality".
> That said, I don't think having AI write the scripts from scratch is the right way to go here. The dialogue for the first episode still smells of RLHF, with characters being far too complimentary to each other and having bizarre verbal tics. And is it needed? The world is full of people with smart stories who want to tell them, but we're in an era when reading is in decline.
I'm not sure that it's right to say that the scripts are written "from scratch" -- the "Bible" for the series is hand-written. From Part 2 of the blog post:
> Episode generation is autonomous, but the show bible is human-made. The prompts and code that control the LLM are human-made, too. Each episode’s output is closely reviewed by humans. Because models often change, and each new episode tends to reveal bugs/weaknesses in the system, prompts get tweaked by humans, too. This is less and less necessary as more episodes are produced.
If the hierarchy goes Bible (series) -> Synopsis (episode summary) -> Script (scene details), then the author is hand-writing #1, and you're suggesting that humans hand-write #3.
> So the most interesting part of this is all the tooling that comes after that point: the rug smoothing, the AI-generated voice acting and especially the game engine based renderer that can generate videos given simple instructions. The blog posts sort of glide over that part, I guess due to the author's background in game engine development, but it seems the most useful part actually.
The visualizer / generator certainly is the most novel and useful part of this. I had the same struggles and hang-ups with the overly complimentary dialogue in E1 as you did, and it smells strongly of GPT-4. That said, I agree with the author -- this feels like the first "self-hosting" version of this entire pipeline. Steve Newcomb wrote an article on the idea of taking the lessons learned from CI/CD pipelines and applying them to movie development:
https://stevenewcomb.substack.com/p/a-whole-new-way-to-creat...
Now that the OnScreen system is "self-hosting" (maybe not the right analogy) and produces the entire movie when you click "build", it's possible to hand-tune things as needed to realize a vision, at whatever level of detail and abstraction the author wants, whether that's at the "Bible" level or on a much finer-grained note.