by astrange
1350 days ago
Research is still ongoing here, but it seems that diffusion models, despite being named after the noise addition/removal process, don't actually work because of it. There's a paper (I can't remember the name) [1][2] showing that the process still works with other information-removal operators, including a circle wipe and one that blends the original picture with a cat photo. Also, this article describes CLIP being trained on text-image pairs, but Google's Imagen uses an off-the-shelf text model, so that joint training doesn't seem to be needed either.
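To make "information removal operator" a bit more concrete, here's a toy sketch of my own (not code from the paper). It assumes a simple linear mixing schedule and made-up function names, and it treats an image as a flat list of grayscale pixels. The point is only that the noise version and the "blend with a cat photo" version have the same shape: both take an image and a severity t and return a degraded image that a model could then be trained to undo step by step.

```python
import random


def degrade_noise(image, t, rng):
    """Mix the image toward Gaussian noise: t=0 keeps the image, t=1 is pure noise."""
    return [(1.0 - t) * p + t * rng.gauss(0.0, 1.0) for p in image]


def degrade_blend(image, other, t):
    """Mix the image toward a fixed second image (e.g. a cat photo); no randomness."""
    if len(image) != len(other):
        raise ValueError("images must have the same number of pixels")
    return [(1.0 - t) * p + t * q for p, q in zip(image, other)]


# Hypothetical 4-pixel grayscale "images" with values in [0, 1].
image = [0.0, 0.25, 0.5, 1.0]
cat = [1.0, 1.0, 0.0, 0.0]

half_blended = degrade_blend(image, cat, 0.5)
fully_blended = degrade_blend(image, cat, 1.0)
half_noised = degrade_noise(image, 0.5, random.Random(0))
```

At t=1 the blend operator returns the cat photo exactly, so all information about the original is gone, just as it would be with pure noise. The actual operators and schedules in the paper may well differ from this linear mix.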
[1] https://arxiv.org/abs/2208.09392
[2] https://twitter.com/tomgoldsteincs/status/156250381442263040...