| Here’s a thought experiment for you: is lossy encoding generative, or is it copyright violation? Let’s say you encode a bunch of copyrighted images as lossy JPEGs, specifically. Most people would say “copyright infringement”. Now, if I have an encrypted zip file, a random binary blob… but with a passphrase, it decodes to a bunch of copyrighted images. Still copyright infringement, right? Now, if I have an AI compression algorithm where the model that rebuilds the content of the archive is a local binary blob, and all you transmit is the passphrase, it’s still copyright infringement, right? Even though you’re not actually sharing the content of the archive; you’re just sharing the passphrase, and the AI is rebuilding the content from it. So, 1) when you rebuild a binary-distinct output that sufficiently closely resembles the input, 2) regardless of the manner in which you are doing it, technically, it’s copyright infringement. Now, this is where your argument fails. …because your argument is that the technical means by which the content is “generated” inherently makes the content distinct and not copyright infringement. However, “generation” and “decompression” are the same thing; the technical means by which the output is generated is irrelevant. The difference is that in this case, the output is sufficiently distinct that you could reasonably argue that it is different from the training data. That’s a fair argument! …buuuut, here’s the thing: if your model can generate outputs that are copyright infringement because they are similar, but not binary identical, to specific existing copyrighted works… then what you “effectively” have is a giant compressed archive that has both copyrighted and non-copyrighted work in it. That’s problematic. 
I am not misunderstanding how the models work; I’ve built models like this, and yes, there’s no “copy of the image” in the checkpoint file… just like an encrypted zip file doesn’t have a “copy of the image” in it until you apply the correct decompression and decryption algorithm; it’s just a blob with high entropy. > Have you ever got an idea or influence from a non-public-domain artwork? That is fundamentally not the issue here. The issue is that the model checkpoint file contains a combination of data and algorithm that can rebuild both novel and copyrighted work. If a zip file with both copyrighted and non-copyrighted work in it is infringement, so is this. tl;dr: yes, it’s complicated, but this “this comes from ignorance about how they work” BS that is floating around the SD community is both dishonest and itself lacks an understanding of how the models work. It’s a pretty ironic thing to say, really. The right prompt can generate copyrighted content. That’s how the models were trained. Not binary identical, but as we’ve already established, that’s not a necessary condition for copyright infringement, or every lossy image format would be fair game. …and yes, I get it: if someone invented a purely algorithmic decompression algorithm that could take “source code for MS Word” and generate exactly that as the output, then I’m arguing it would be infringing. Yup. …because the only way you could create that algorithm would be to encode the original source code into it. |
So your argument rests on a premise that, as far as I can tell, isn't true. What prompts did you give to what ML model that gave you something copyrighted?
Because I just tried a whole bunch of different prompts to make the bloody Mona Lisa with no luck.