by Lerc 1213 days ago
The flickering comes from the fundamental nature of the de-noising mechanism involved in the diffusion model. The ability to create multiple novel images for the same input comes from adding noise with a random seed. Currently this is more or less done every frame, which is why you get the flickering. Keeping the same seed wouldn't be helpful if you want the image to move.
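To make the seed point concrete, here is a rough sketch in plain NumPy (no diffusion model involved; `initial_noise` and `correlation` are names made up for illustration). It only shows that reseeding per frame gives starting noise that is essentially uncorrelated between frames, while a fixed seed gives identical noise every frame.

```python
import numpy as np

def initial_noise(seed, shape=(64, 64)):
    # Hypothetical stand-in for the Gaussian noise a sampler starts from.
    return np.random.default_rng(seed).standard_normal(shape)

def correlation(a, b):
    # Pearson correlation between two noise fields, flattened to 1-D.
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

frame0 = initial_noise(seed=1)
frame1_reseeded = initial_noise(seed=2)  # new seed for the next frame
frame1_fixed = initial_noise(seed=1)     # same seed reused

print("reseeded:", correlation(frame0, frame1_reseeded))
print("fixed:   ", correlation(frame0, frame1_fixed))
```

The reseeded pair should come out near zero and the fixed pair at one. Neither extreme follows the motion in the video, which is the gap being described here.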

What could be of use here is a noise transformation layer that reuses the same noise for every frame, but warped to match the desired motion. For video conversion you could possibly extract motion vectors from successive frames to warp the noise.
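A minimal sketch of what such a warp might look like, assuming you already have a dense per-pixel motion field from somewhere (the `flow` array, its (dx, dy) layout, the wrap-around at the edges, and both function names are assumptions for illustration, not an existing API). It uses nearest-neighbour lookup rather than interpolation, since blending neighbouring samples would lower the variance of the noise.

```python
import numpy as np

def make_base_noise(height, width, channels=4, seed=0):
    # One fixed noise field shared by every frame (hypothetical helper).
    rng = np.random.default_rng(seed)
    return rng.standard_normal((height, width, channels)).astype(np.float32)

def warp_noise(noise, flow):
    # Backward nearest-neighbour warp with wrap-around at the borders:
    #   out[y, x] = noise[y - dy, x - dx]
    # flow[..., 0] holds dx and flow[..., 1] holds dy, in pixels.
    h, w = noise.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = (ys - np.rint(flow[..., 1]).astype(int)) % h
    src_x = (xs - np.rint(flow[..., 0]).astype(int)) % w
    return noise[src_y, src_x]

# Usage: everything moves 2 px right and 1 px down between frames.
noise = make_base_noise(8, 8, seed=42)
flow = np.zeros((8, 8, 2), dtype=np.float32)
flow[..., 0] = 2.0
flow[..., 1] = 1.0
warped = warp_noise(noise, flow)
```

For a uniform shift this is just a roll of the noise field. With a non-uniform motion field some noise samples get duplicated and others dropped, so the result is no longer strictly independent Gaussian noise; a real implementation would probably need to handle that.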

I assume someone is working on this somewhere.

1 comment

"The flickering comes from the fundamental nature of the de-noising mechanism involved in the diffusion model." -- agreed

"Keeping the same seed wouldn't be helpful if you want the image to move." -- No, I'm using the same seed (and prompt). The image moves because ControlNet opens up another channel of input, in this case the pose data.

Yes, but that still produces temporal aliasing because the unmoving noise is battling the moving ControlNet input. I can't find it right now, but there was a good example showing a gallery of one-word prompts with the same seed. While the images were of different subjects, you could clearly see the impact of the noise controlling layout. What was a capital letter A in one image was a person's legs in another, but that same overall structure was visible in the same place in 90% of the images.