Hacker News new | ask | show | jobs
by astrange 1350 days ago
Research is still ongoing here, but it seems like diffusion models despite being named after the noise addition/removal process don't actually work because of it.

There's a paper (which I can't remember the name of) that shows the process still works with different information removal operators, including one with a circle wipe, and one where it blends the original picture with a cat photo.

Also, this article describes CLIP being trained on text-image pairs, but Google's Imagen uses an off the shelf text model so that part doesn't seem to be needed either.

1 comments

I think it might be this paper [1] succintly described by the author in this twitter thread [2]

[1] https://arxiv.org/abs/2208.09392 [2] https://twitter.com/tomgoldsteincs/status/156250381442263040...