by saurabh20n 1236 days ago
The last author's tweet thread and replies have some interesting tidbits: https://twitter.com/Eric_Wallace_/status/1620449934863642624

* "We propose to extract memorized images by generating many times with the same prompt and flagging cases where many of the generations are the same."

* "- Diffusion models memorize more than GANs - Outlier images are memorized more - Existing privacy-preserving methods largely fail"

* "Stable Diffusion is small relative to its training set (2GB of weights and many TB of data). So, while memorization is rare by design, future (larger) diffusion models will memorize more."

* "It only memorizes a very small subset of the images that it trains on."

* "our goal is to show that models can output training images when generating in the same fashion that normal users do."
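For anyone wondering what the "generate many times and flag duplicates" idea in the first quote might look like, here is a rough sketch. It is only an illustration, not the authors' code: the `generate` callable, the L2 distance on flattened vectors, and the `threshold` / `min_matches` values are all assumptions I made up, since the thread does not say which similarity measure or cutoffs were used.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for an image: a flattened pixel vector or an embedding.
Image = Sequence[float]


def l2_distance(a: Image, b: Image) -> float:
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_near_duplicates(samples: List[Image], threshold: float) -> int:
    """Largest number of *other* samples within `threshold` of any one sample."""
    best = 0
    for i, a in enumerate(samples):
        close = sum(
            1
            for j, b in enumerate(samples)
            if j != i and l2_distance(a, b) <= threshold
        )
        best = max(best, close)
    return best


def flag_prompt(
    generate: Callable[[str, int], Image],  # assumed signature: (prompt, seed) -> image
    prompt: str,
    n_samples: int = 16,       # assumed value, not from the thread
    threshold: float = 0.1,    # assumed value, not from the thread
    min_matches: int = 5,      # assumed value, not from the thread
) -> bool:
    """Sample the same prompt with different seeds and flag it when many
    of the generations are nearly identical to one another."""
    samples = [generate(prompt, seed) for seed in range(n_samples)]
    return max_near_duplicates(samples, threshold) >= min_matches
```

With a real model, `generate` would wrap the sampler and the distance would probably need to be something more robust than raw L2, but the flagging logic would keep the same shape: independent seeds should disagree unless the model is reproducing one particular image.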

1 comment

> * "It only memorizes a very small subset of the images that it trains on."

An interesting question here would be: why does it memorise these images rather than others? Can the other images still be synthesised, lossily, with a suitable prompt? If so, does that ability depend on the memorised images? And can the memorised set be reduced further?