| HN Mirror

For Stable Diffusion, I think the size of the model divided by the number of training images works out to something on the order of 6-8 bits per image. There is no "storage" of the training images. It's 250 TB of data in, and 1.4 GB or so in the weight file, depending on the precision. I think those 250 TB are compressed as well, so maybe 25,000 TB of raw data distilled down to 1.4 GB. I'm fairly certain you could never prove an AI saw your image. You'd have to sue the company and, through discovery, look at their training data.
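To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (1.4 GB of weights, 250 TB compressed, a guessed 25,000 TB raw, 6-8 bits per image) and assumes decimal units (1 GB = 10^9 bytes). The constant and function names are made up for illustration, and the inputs are ballpark estimates, not measured values.

```python
# Back-of-the-envelope check of the figures in the comment above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

WEIGHTS_BYTES = 1_400_000_000   # ~1.4 GB weight file (figure from the comment)
COMPRESSED_BYTES = 250 * TB     # ~250 TB of compressed training data (from the comment)
RAW_BYTES = 25_000 * TB         # ~25,000 TB raw, the comment's own guess


def implied_image_count(weights_bytes, bits_per_image):
    """Number of training images implied by a given bits-per-image budget."""
    return weights_bytes * 8 / bits_per_image


def reduction_ratio(data_bytes, weights_bytes):
    """How many bytes of training data went in per byte of weights."""
    return data_bytes / weights_bytes


if __name__ == "__main__":
    for bits in (6, 8):
        images = implied_image_count(WEIGHTS_BYTES, bits)
        print(f"{bits} bits/image implies about {images / 1e9:.2f} billion images")
    print(f"compressed data : weights = {reduction_ratio(COMPRESSED_BYTES, WEIGHTS_BYTES):,.0f} : 1")
    print(f"raw data : weights        = {reduction_ratio(RAW_BYTES, WEIGHTS_BYTES):,.0f} : 1")
```

Put differently, 6-8 bits per image with a 1.4 GB weight file implies a training set of roughly 1.4 to 1.9 billion images, and the weights are several orders of magnitude smaller than the data that went in.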

There are probably pathological cases where an image that repeats many times in the training data is overfit more strongly and could be reproduced in much more detail than this average suggests. But the systems learn similarly to the human brain: they learn the gist of a style or scene and how it relates to words. It's not a search engine; it doesn't copy/paste any block of pixels...

One interesting example: since SD's original training set included some watermarked stock photos, it learned that there is such a thing as a watermark, which can end up in the middle of generated images. Not in an intelligible way, but you can see roughly how it interpreted this detail. And in those cases you DO have a very similar pixel pattern repeated over and over in the training data.