
> But you can't rebuild the original image

You can.

Diffusion models do this:

- prompt -> text encoder -> conditioning; seed -> noise latent -> diffusion (guided by the conditioning) -> latent -> vae decoder -> image

Or, for image-to-image:

- image -> vae encoder -> latent

- prompt -> text encoder -> conditioning; mix image latent + seed noise -> diffusion (guided by the conditioning) -> latent -> vae decoder -> image

Try image-to-image on the source image with a low strength. Can it regenerate the image? If the answer is yes, then there is categorically some latent that maps to that output, i.e. it is technically possible to generate the image from the model.
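Roughly what the "mix" step means: img2img pushes the encoded image latent forward to an intermediate noise level chosen by the strength, then denoises from there. Here's a toy sketch of just the noising half in plain Python. It assumes a Stable Diffusion v1-style scaled-linear beta schedule (1000 steps, beta from 0.00085 to 0.012) and a simplified linear mapping from strength to timestep; the random vector standing in for the image latent is made up.

```python
import math
import random

# Assumption: SD v1-style "scaled_linear" schedule, 1000 training timesteps.
T = 1000
BETA_START, BETA_END = 0.00085, 0.012


def alpha_bar(t: int) -> float:
    """Cumulative product of (1 - beta_i) for i = 0..t."""
    lo, hi = math.sqrt(BETA_START), math.sqrt(BETA_END)
    prod = 1.0
    for i in range(t + 1):
        beta = (lo + (hi - lo) * i / (T - 1)) ** 2
        prod *= 1.0 - beta
    return prod


def noise_latent(x0, strength, rng):
    """Sample x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps at the timestep implied by strength.

    Simplification: strength in [0, 1] maps linearly onto a training timestep.
    """
    t = min(int(strength * (T - 1)), T - 1)
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4 * 64 * 64)]  # stand-in for an encoded image latent
for s in (0.1, 0.5, 1.0):
    print(s, round(correlation(x0, noise_latent(x0, s, rng)), 3))
```

At low strength the noised latent stays highly correlated with the original, which is why a low-strength img2img pass can hand back something very close to the source image.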

The question is, how would you generate that latent from a prompt?

You do it like this:

1) Pick an image you want to find, e.g. https://www.artstation.com/artwork/Zl6Zx

2) Do a reverse search from the image to a suitable prompt using CLIP Interrogator: https://huggingface.co/spaces/pharma/CLIP-Interrogator

In this case, it's:

> a painting of a woman with blue eyes, by Aleksi Briclot, trending on Artstation, ghostly necromancer, red haired young woman, downward gaze, the blacksmits’ daughter, her gaze is downcast, dressed in a medieval lacy, gothic influence, screenshot from the game, from netflix's arcane, high priestess

3) Search the LAION database for that prompt and see whether the image was part of the training data: https://rom1504.github.io/clip-retrieval

(Yes, it was: in this case, the top hit is a match, with a score of 0.3968.)
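Step 3 is basically a nearest-neighbour lookup in CLIP embedding space. As I understand it, the scores clip-retrieval shows are cosine similarities between CLIP embeddings. A toy version of that lookup, with made-up 3-dimensional vectors (real CLIP embeddings have hundreds of dimensions; `top_k` and the captions are hypothetical):

```python
import math


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def top_k(query, index, k=3):
    """Return the k (caption, score) pairs whose embeddings are most similar to query."""
    scored = [(caption, cosine(query, emb)) for caption, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "database" of caption -> embedding.
index = {
    "portrait of a red haired woman, gothic": [0.9, 0.1, 0.3],
    "a bowl of fruit on a table": [0.0, 1.0, 0.1],
    "medieval priestess painting": [0.8, 0.0, 0.5],
}
query = [0.85, 0.05, 0.4]  # stand-in for the embedded prompt
for caption, score in top_k(query, index, k=2):
    print(f"{score:.4f}  {caption}")
```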

4) Crack open Stable Diffusion.

Turn the CFG scale up (to limit variation) and pick a step count that generates reasonably accurate images, something like k_euler, 50 steps, CFG scale 30, 512x768.
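For reference, the CFG scale is the guidance weight $s$ in classifier-free guidance, as commonly implemented in Stable Diffusion samplers: at each step the model predicts the noise with and without the prompt $c$, and the sampler extrapolates toward the prompt-conditioned prediction. Roughly:

$$
\hat{\epsilon}_\theta(z_t, c) = \epsilon_\theta(z_t, \varnothing) + s\,\big(\epsilon_\theta(z_t, c) - \epsilon_\theta(z_t, \varnothing)\big)
$$

Here $z_t$ is the current latent and $\varnothing$ is the empty prompt. With $s = 30$ each step is pushed hard in the prompt's direction, which is the sense in which a high CFG scale limits variation.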

You're now generating images that are 'nearby' the target in latent space; now it's just pissing around with the seed and variations to narrow the gap.
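The seed search is just brute-force minimisation over seeds with the prompt held fixed. A toy sketch: `toy_generate` is a hypothetical stand-in for running the sampler (the prompt's region of a 16-dimensional toy latent plus seed-dependent noise), and Euclidean distance is an assumed stand-in for "how close is this to the target image".

```python
import math
import random

DIM = 16  # toy latent size; a real SD latent at 512x512 is 4x64x64


def toy_generate(seed, prompt_center, spread=1.0):
    """Hypothetical sampler: prompt region plus noise fully determined by the seed."""
    rng = random.Random(seed)
    return [c + spread * rng.gauss(0.0, 1.0) for c in prompt_center]


def distance(a, b):
    """Euclidean distance between two latents."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seed_search(target, prompt_center, seeds):
    """Brute-force the seed whose output lands closest to target."""
    best = min(seeds, key=lambda s: distance(toy_generate(s, prompt_center), target))
    return best, distance(toy_generate(best, prompt_center), target)


target_rng = random.Random(42)
target = [target_rng.gauss(0.0, 1.0) for _ in range(DIM)]
near_prompt = [x + 0.1 for x in target]  # a prompt that lands close to the target
far_prompt = [x + 5.0 for x in target]   # a prompt that lands far away
print(seed_search(target, near_prompt, range(1000)))
print(seed_search(target, far_prompt, range(1000)))
```

The better the prompt lands you near the target, the closer the best seed gets, and searching more seeds can only match or improve the best distance found.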

> So your argument rests on a premise that, as far as I can tell, isn't true.

...but the point I'm making is that it is a) possible, and b) plausible, if you can be bothered doing a seed search. Can you be bothered? I can't be bothered.

Like... I mean, dude, the model was trained to be able to do this. That's literally what it's supposed to do. The VAE can map a latent to real existing images trivially (that's literally what it does when you use image-to-image). The latent is a 64x64x4 tensor (for a 512x512 image) that you're moving through with your prompt.
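Where 64x64x4 comes from: the Stable Diffusion v1 VAE downsamples by a factor of 8 and uses 4 latent channels, so the latent shape depends on the output resolution. A quick sketch (`sd_latent_shape` is just an illustrative helper, not a library function):

```python
def sd_latent_shape(width: int, height: int, channels: int = 4, factor: int = 8):
    """Latent shape (channels, height/8, width/8) for an SD v1-style VAE."""
    if width % factor or height % factor:
        raise ValueError("width and height must be multiples of the VAE downsampling factor")
    return (channels, height // factor, width // factor)


for w, h in [(512, 512), (512, 768)]:
    c, lh, lw = sd_latent_shape(w, h)
    print(f"{w}x{h} image -> latent {c}x{lh}x{lw} = {c * lh * lw} values")
```

So the 512x768 setting from step 4 actually works in a 4x96x64 latent (24,576 values) rather than 4x64x64 (16,384 values).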

Look, I get it, the chance of you picking the exact seed that generates this exact image is pretty slim, right? 64x64x4 is a massive fucking number, and stumbling on exactly the right seed is like winning the lottery, right?

...but that would only be true if you were RANDOMLY moving through the latent space, and you're not. You're specifically homing in on 'good' latent values around real images using your prompt. That means the chance of the latent you pick being a real picture is not 1/(64x64x4)... it's plausibly higher.

There is a non-zero chance that any generated output is actually a real image.

So, back to our original question:

Is sharing a magic seed (123123123 or whatever) the same as sharing a password to an encrypted zip file?

...because that's what it comes down to.

You have:

- (algorithm) that takes (pass phrase) and generates (output).

- If you can provide (pass phrase) and generate (output) that is copyrighted content, then is (algorithm) infringing?

- The answer applies the same way to encrypted zip files and to AI models.
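The shape of the analogy in code: both functions below are fixed, public algorithms whose output is fully determined by the input you feed them. `toy_decrypt` is a toy XOR stream cipher keyed by SHA-256 (not how real zip encryption works) and `seeded_output` is a hypothetical stand-in for a seeded sampler; both names are made up for illustration.

```python
import hashlib
import random


def toy_decrypt(ciphertext: bytes, passphrase: str) -> bytes:
    """Toy XOR stream cipher; XOR is its own inverse, so this also encrypts."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(ciphertext):
        keystream.extend(hashlib.sha256(f"{passphrase}:{counter}".encode()).digest())
        counter += 1
    return bytes(c ^ k for c, k in zip(ciphertext, keystream))


def seeded_output(seed: int, prompt: str, size: int = 8) -> list:
    """Hypothetical sampler stand-in: same (prompt, seed) always gives the same output."""
    rng = random.Random(f"{prompt}:{seed}")
    return [round(rng.gauss(0.0, 1.0), 3) for _ in range(size)]


ciphertext = toy_decrypt(b"some copyrighted image bytes", "123123123")
print(toy_decrypt(ciphertext, "123123123"))
print(seeded_output(123123123, "a painting of a woman with blue eyes"))
```

In both cases the (algorithm) is the same for everyone; only the (pass phrase) selects which output comes out.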

So... you gotta pick which way you wanna roll with it, but you can't have one without the other. You get both, with the same rules.

That's the problem.