| > It's pretty indisputably piracy, whether or not it's legal/fair use/whatever. Ah, this is obviously some strange usage of the word 'indisputably' that I wasn't previously aware of. > I believe many artists rightly refuse to accept this threat to their livelihoods because it was built on their labor. This model is trained from scratch using only public domain/CC0 images and copyrighted images with specific permission for use: https://huggingface.co/Mitsua/mitsua-diffusion-one Does it change anything? If all the other models were deleted, and this was the only one left, and all future models also had to be similarly licensed, would it change even one single point? Even if it were the only remaining model and this kind of licensing were a requirement for all future work, artists would still be automated out of their highly skilled yet poorly paid profession. It still sucks. There's still no nice way to convey that. > You built a commercial product on unlicensed data. Do you actually think the law is going to agree that that's fair use? What do you think the Google search engine is, if not a commercial product built on unlicensed data? The courts go both ways on this specific question with Google depending on the exact details, because nothing in law is as easy or simple as the clear-cut, goodies-vs.-baddies, black-and-white morality play you want this to be. The fact that Stability AI have not yet been sued out of existence in a simple open-and-shut court case about copyright infringement ought to have demonstrated both this point and that the question "is this piracy?" is, in fact, disputable. |
It seems incredible to me to suggest that piracy wasn't involved in the collection of training data, regardless of your view on the morality or legality of it. Datasets like Books3 indisputably contained copyrighted content that was being distributed without permission from the rightsholders. That's just the definition of piracy. If we can't agree on that, then I'm not sure what we're doing here.
More materially to this discussion, yes, it would absolutely make a difference if the AI were only trained on licensed content. I wouldn't use it, but I wouldn't have a problem with it. The issue is specifically that much of the work is being used without permission to replace the people who made it. If the model were based on ethically acquired data, it would be less able to reproduce the style of specific artists. IMO, there would be more room for both kinds of art in this case.
I'm also aware that it's not a clear-cut case legally, but I think AI advocates and tech enthusiasts overestimate the chances that AI will win in court. Napster took years to litigate and was eventually shut down. There's a really good discussion about this on the Decoder podcast between actual lawyers.