Hacker News
by ShamelessC 1382 days ago
The article shows a model that does this.

It's only a few frames, but they are entirely generated from text - no seed image or interpolation required.

2 comments

What is being referred to as "interpolation" here? As an outsider, I have to ask: isn't Stable Diffusion "interpolating" text into images/frames/video in a literal (maybe not technical) sense?
It's to be interpreted in the quasi-mathematical sense where you have images for frame A and frame B representing your data points. To interpolate between those frames, a flow of plausible images simulating the transition from A to B is generated.
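To make the "data points A and B" idea concrete, here is a minimal sketch of the most naive form of interpolation: a linear cross-fade in pixel space. The function name `lerp_frames` and the tiny 2x2 grayscale "frames" are made up for illustration. Actual frame-interpolation models generate plausible in-between images rather than blending pixels, so treat this only as the mathematical baseline.

```python
def lerp_frames(frame_a, frame_b, steps):
    """Return `steps` in-between frames blending linearly from frame_a to frame_b.

    Frames are nested lists of floats (rows of pixel intensities) with the
    same shape. The endpoints A and B themselves are not included.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # blend weight, strictly between 0 and 1
        frames.append([
            [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)
        ])
    return frames


# Hypothetical example: fade from an all-black frame to an all-white frame.
frame_a = [[0.0, 0.0], [0.0, 0.0]]
frame_b = [[1.0, 1.0], [1.0, 1.0]]
in_between = lerp_frames(frame_a, frame_b, steps=3)
```

Everything this produces lies on a straight line between A and B, which is why it can only ever depict a single smooth transition.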
Interpolation here means that one smooth motion transition is all that is depicted. An entire episode of television requires things like cuts between scenes and possibly discontinuities like flashbacks or scenes that take place days, months, or even decades later. Characters should still look the same throughout, even though they might be wearing different clothing, grow a beard, or get really old while keeping similar facial features and the same skin color. If one ages, they should all age about the same, unless it's a story with time travel or humanoid immortal characters that don't age.

I'm sure these types of capabilities will come at some point, but no current model can do it. It requires more than just projecting motion into a scene.