I skimmed the paper and didn't see an answer to this: how much of the video did the AI actually generate? How much of it was touched up by humans, and how much of it was actually drawn/animated solely by humans?
An Engadget story demonstrates their simulation tool and discusses their "Simulation" venture a little bit more.
It does look like the video is animated by AI using pre-generated scenes and characters.
https://www.engadget.com/the-simulation-ai-put-me-in-a-south...
Based on my understanding after skimming through the paper (with assistance from Claude), the AI did not directly generate any full video content for a South Park episode. It seems like this was how the AI was used:
- Custom diffusion models were trained on South Park character and background image datasets. These models could then generate new South Park-style characters and backgrounds.
- GPT-4 was used to generate dialogue for scenes, based on prompts about the overall episode premise and plot points.
- An "AI camera system" was mentioned for scene setup, but details were not provided on how much of the camera work it handled.
- Voice cloning was used to generate audio clips of the dialogue.
Note: this is just a skim of the paper, so it's entirely possible Claude and I have missed something.
Judging by the rendered samples on the page, I'm gonna assume only the assets (backgrounds and characters) were AI generated. Animations, text, and camera movement were all done manually.