Hacker News
by concordDance 1276 days ago
You keep thinking ML models are copying your work but I think this comes from ignorance about how they work.

They're not copying, they're doing the equivalent of looking at thousands of paintings and from that figuring out what paintings are supposed to look like. If you were asked to draw something in the style of Picasso and did so, would your new art work be a copyright violation?

Is every painting you've ever seen public domain? Have you ever got an idea or influence from a non public domain artwork?

1 comments

Here’s a thought experiment for you:

Is lossy encoding generative, or is it copyright violation? Let’s say you encode a bunch of copyrighted images as lossy JPEGs, specifically. Most people would say “copyright infringement”.

Now, if I have an encrypted zip file, a random binary blob… but with a pass phrase, it decodes to a bunch of copyrighted images. Still copyright infringement, right?

Now, if I have an AI compression algorithm where the model that rebuilds the content of the archive is a local binary blob, and all you transmit is the pass phrase, it’s still copyright infringement right?

Even though you’re not actually sharing the content of the archive, you’re just sharing the pass phrase and the AI is rebuilding it from that pass phrase.

So, 1) when you rebuild a binary distinct output that sufficiently closely resembles the input, 2) regardless of the manner in which you are doing it, technically, it’s copyright infringement.
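To make the "not binary identical, but close enough" point concrete, here is a minimal sketch of a lossy round trip. It is not JPEG; it is plain quantization used as a stand-in, and the pixel values and the `step` parameter are made up for illustration.

```python
def lossy_encode(pixels, step=16):
    """Quantize 8-bit values into buckets of width `step`, discarding detail."""
    return [p // step for p in pixels]


def lossy_decode(codes, step=16):
    """Rebuild approximate pixel values from the bucket indices."""
    return [c * step + step // 2 for c in codes]


# Made-up 8-bit pixel values standing in for an image.
original = [12, 200, 37, 90, 255, 128, 64, 3]
rebuilt = lossy_decode(lossy_encode(original))

identical = rebuilt == original
max_error = max(abs(a - b) for a, b in zip(original, rebuilt))
print("binary identical:", identical)
print("largest per-pixel error:", max_error)
```

The rebuilt values differ from the input, yet every one lands within half a bucket of the original, which is the sense of "sufficiently closely resembles" being used here.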

Now, this is where your argument fails.

…because your argument is that the technical means by which the content is “generated” inherently makes the content distinct and not copyright infringement.

However, “generation” and “decompression” are the same thing; the technical means by which the output is generated is irrelevant.

The difference is that in this case, the output is sufficiently distinct that you could reasonably argue that it is different from the training data. That’s a fair argument!

…buuuut, here’s the thing: if your model can generate outputs that are copyright infringement because they are similar but not binary identical to existing specific copyrighted works… then what you have “effectively” is a giant compressed archive that has both copyrighted and non-copyrighted work in it.

That’s problematic.

I am not misunderstanding how the models work; I’ve built models like this, and yes, there’s no “copy of the image” in the checkpoint file… just like an encrypted zip file doesn’t have a “copy of the image” in it until you apply the correct decompression and decryption algorithm, it’s just a blob with high entropy.
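Here is a rough sketch of the "high-entropy blob plus pass phrase" idea, using only the Python standard library. The keystream is a toy built from SHA-256 purely for illustration (an assumption of this sketch, not a real zip cipher and not secure), and `pack`, `unpack`, and the pass phrase are hypothetical names.

```python
import hashlib
import zlib


def keystream(passphrase: str, length: int) -> bytes:
    """Toy keystream: SHA-256 over a counter. Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(f"{passphrase}:{counter}".encode()).digest())
        counter += 1
    return bytes(out[:length])


def pack(data: bytes, passphrase: str) -> bytes:
    """Compress, then XOR with the pass-phrase-derived keystream."""
    compressed = zlib.compress(data)
    ks = keystream(passphrase, len(compressed))
    return bytes(a ^ b for a, b in zip(compressed, ks))


def unpack(blob: bytes, passphrase: str) -> bytes:
    """Undo the XOR, then decompress."""
    ks = keystream(passphrase, len(blob))
    return zlib.decompress(bytes(a ^ b for a, b in zip(blob, ks)))


work = b"pretend these bytes are a copyrighted image " * 20
blob = pack(work, "hunter2")

print("plain bytes visible in blob:", work[:16] in blob)
print("pass phrase recovers the work:", unpack(blob, "hunter2") == work)
```

The blob holds no readable copy of the input, but the blob plus the right pass phrase reproduces it byte for byte, which is the comparison being drawn with a checkpoint file plus the right prompt and seed.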

> Have you ever got an idea or influence from a non public domain artwork?

That is fundamentally not the issue here.

The issue is that the model checkpoint file contains a combination of data and algorithm that can rebuild both novel and copyrighted work.

If a zip file with both copyrighted and non-copyrighted work in it is infringement, so is this.

I.e., tl;dr: yes, it’s complicated, but this “this comes from ignorance about how they work” BS that is floating around the SD community is both dishonest and itself lacks an understanding of how the models work. It’s a pretty ironic thing to say, really. The right prompt can generate copyrighted content. That’s how the models were trained. Not binary identical, but as we’ve already established, that’s not a necessary condition for copyright infringement, or every lossy image format would be fair game.

…and yes, I get it, if someone invented a purely algorithmic decompression algorithm that could take “source code for MS Word” and generate exactly that as the output, then I’m arguing it would be infringing. Yup. …because the only way you could create that algorithm would be to encode the original source code into it.

But you can't rebuild the original image unless you're so incredibly specific that those same instructions given to a random artist would make the original image. And even then...

So your argument rests on a premise that, as far as I can tell, isn't true. What prompts did you give to what ML model that gave you something copyrighted?

Because I just tried a whole bunch of different prompts to make the bloody Mona Lisa with no luck.

> But you can't rebuild the original image

You can.

Diffusion models do this:

- prompt -> text encoder -> conditioning; seed -> noise latent -> diffusion (guided by the conditioning) -> latent -> vae -> image

Or, for image-to-image:

- image -> vae -> latent

- prompt -> text encoder -> conditioning; noised image latent -> diffusion (guided by the conditioning) -> latent -> vae -> image

Try image-to-image on the source image with a low strength. Can it regenerate the image? If the answer is yes, then there is categorically some latent that maps to the output, i.e. it is technically possible to generate the image from the model.
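A toy sketch of why low strength stays close to the source: it models only the "add noise to the image latent" half of image-to-image, as a variance-preserving mix. That mapping from `strength` to noise level is an assumption of this sketch; real pipelines pick the noise level from a scheduler and then run denoising steps, neither of which is shown. The latent here is random numbers, not a real encoded image.

```python
import math
import random


def noise_latent(latent, strength, seed=0):
    """Mix a latent with Gaussian noise.

    strength=0 returns the latent unchanged; strength=1 returns pure noise.
    """
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - strength)
    add = math.sqrt(strength)
    return [keep * z + add * rng.gauss(0.0, 1.0) for z in latent]


def distance(a, b):
    """Euclidean distance between two flat latents."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Stand-in for an encoded image: 64 * 64 * 4 random values.
rng = random.Random(42)
source = [rng.gauss(0.0, 1.0) for _ in range(64 * 64 * 4)]

for strength in (0.0, 0.1, 0.5, 1.0):
    moved = distance(source, noise_latent(source, strength))
    print(f"strength={strength}: distance from source latent = {moved:.2f}")
```

At strength 0 the latent is untouched, so decoding it is just the VAE round trip on the source image; as strength rises the starting point drifts further from it.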

The question is, how would you generate that latent from a prompt?

You do it like this:

1) Pick an image you want to find, eg. https://www.artstation.com/artwork/Zl6Zx

2) Do a reverse image search to find a suitable prompt, from: https://huggingface.co/spaces/pharma/CLIP-Interrogator

In this case, it's:

> a painting of a woman with blue eyes, by Aleksi Briclot, trending on Artstation, ghostly necromancer, red haired young woman, downward gaze, the blacksmits’ daughter, her gaze is downcast, dressed in a medieval lacy, gothic influence, screenshot from the game, from netflix's arcane, high priestess

3) Search the laion database for that prompt and see if it was part of the training data: https://rom1504.github.io/clip-retrieval

(Yes, it is: in this case, the top hit has a match score of 0.3968).

4) Crack open stable diffusion.

Put the cfg scale up (do not allow variation) and pick a step value that generates reasonably accurate images. Maybe like k_euler, 50 steps, cfg scale 30, 512x768.

You're now generating images that are 'nearby' in the latent to the target; now it's just pissing around in the seed and with variations to narrow the gap.

> So your argument rests on a premise that, as far as I can tell, isn't true.

...but the point I'm making is that it is a) possible, and b) plausible, if you can be bothered doing a seed search. Can you be bothered? I can't be bothered.
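For what a seed search looks like mechanically, here is a minimal sketch. `fake_generate` is a hypothetical stand-in for a text-to-image call with the prompt and settings held fixed; it just returns seeded random numbers. The target is deliberately planted at seed 1234, so this shows the loop only and says nothing about whether a real model has such a seed.

```python
import random


def fake_generate(seed, size=16):
    """Hypothetical stand-in for a generator with a fixed prompt and settings."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def distance(a, b):
    """Squared error between two outputs; lower means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_search(target, generate, seeds):
    """Try every candidate seed and return (best_seed, best_distance)."""
    best_seed, best_dist = None, float("inf")
    for seed in seeds:
        dist = distance(generate(seed), target)
        if dist < best_dist:
            best_seed, best_dist = seed, dist
    return best_seed, best_dist


# Planted target: pretend this is the image we are hunting for.
target = fake_generate(1234)
best_seed, best_dist = seed_search(target, fake_generate, range(2000))
print("best seed:", best_seed, "distance:", best_dist)
```

With a real model you would swap `fake_generate` for the actual pipeline call and `distance` for some image similarity measure, and the cost is one full generation per seed tried.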

Like... I mean, dude, the model was trained to be able to do this. That's literally what it's supposed to do. The VAE can map a latent to real existing images trivially (that's literally what it does when you use image-to-image). The latent is a 64x64x4 tensor, and your prompt is what steers you through that latent space.

Look, I get it, the chances of you picking the exact seed that generates this exact image are pretty slim, right? 64x64x4 is a massive fucking number, and the chance of stumbling on exactly the right seed is like winning the lottery, right?

...but that would only be true if you were RANDOMLY moving through the latent space, and you're not. You're specifically homing in on 'good' latent space values around real images using your prompt. That means that the chance of the latent you pick being a real picture is not 1/64x64x4... it's actually plausibly higher.
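For reference, the arithmetic behind the 64x64x4 figure, as a small sketch. It assumes a Stable Diffusion v1 style VAE that downsamples by 8 in each direction and uses 4 latent channels; `latent_shape` and `count` are hypothetical helper names.

```python
def latent_shape(width, height, downsample=8, channels=4):
    """Latent shape (height, width, channels) for a given pixel size."""
    return (height // downsample, width // downsample, channels)


def count(shape):
    """Number of individual values in a latent of the given shape."""
    h, w, c = shape
    return h * w * c


square = latent_shape(512, 512)
portrait = latent_shape(512, 768)
print("512x512 ->", square, "=", count(square), "values")
print("512x768 ->", portrait, "=", count(portrait), "values")
```

Note that this counts coordinates, not possible latents: each of those values is a continuous number, so the space a random walk would have to cover is far bigger than 1 in 64x64x4, which only strengthens the point that the prompt is doing the homing in.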

There is a non-zero chance that any generated output is actually a real image.

So, back to our original question:

Is sharing a magic seed (123123123 or whatever) the same as sharing a password to an encrypted zip file?

...because that's what it comes down to.

You have:

- (algorithm) that takes (pass phrase) and generates (output).

- If you can provide (pass phrase) and generate (output) that is copyrighted content, then is (algorithm) infringing?

- The answer applies the same way to encrypted zip files and to AI models.

So... you gotta pick which way you wanna roll with it, but you can't have one without the other. You get both, with the same rules.

That's the problem.