by makomk
852 days ago
As I understand it, diffusion-based video generation models simply are not causal in this way. They work by modifying earlier frames in the video to be consistent with later frames just as much as they modify later frames to be consistent with earlier ones. That's why Yann LeCun can argue that they do not need to be able to generate plausible continuations of a real video, only to generate some arbitrary sample from the space of plausible-looking videos, and that the latter does not imply the ability to do the former. It's also why it's not possible to just generate videos of arbitrary length, and why lots of VRAM is required to create even a relatively short clip.
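To make the causal versus non-causal distinction concrete, here is a toy sketch in Python. It is not a real diffusion model: plain averaging over the time axis stands in for a denoiser that attends across all frames, and the function names (`bidirectional_step`, `causal_step`, `first_frame_sensitivity`) are made up for illustration. It only shows that when every refinement step mixes information in both temporal directions, a change to the last frame ends up altering the first frame, whereas a causal update leaves the first frame untouched.

```python
import numpy as np


def bidirectional_step(frames):
    """Toy non-causal update: average each frame with both temporal neighbours."""
    padded = np.pad(frames, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def causal_step(frames):
    """Toy causal update: average each frame only with its predecessor."""
    padded = np.pad(frames, ((1, 0), (0, 0)), mode="edge")
    return (padded[:-1] + padded[1:]) / 2.0


def first_frame_sensitivity(step, num_frames=8, num_steps=16, seed=0):
    """Measure how much frame 0 changes when only the last frame is perturbed."""
    rng = np.random.default_rng(seed)
    frames = rng.normal(size=(num_frames, 4))  # (time, features) stand-in for a clip
    perturbed = frames.copy()
    perturbed[-1] += 1.0  # change only the final frame

    a, b = frames, perturbed
    for _ in range(num_steps):  # the whole clip is refined jointly at every step
        a, b = step(a), step(b)
    return float(np.abs(a[0] - b[0]).max())


if __name__ == "__main__":
    print("bidirectional:", first_frame_sensitivity(bidirectional_step))
    print("causal:       ", first_frame_sensitivity(causal_step))
```

In this toy, the bidirectional update also needs the whole clip in memory at every step, which is loosely analogous to the VRAM point above, though real models differ in many details.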