by const_cast
414 days ago
> That's because you would be redistributing the actual material, just in a really roundabout way. Right, which I’m arguing is what LLMs do, just in an even more roundabout way. The technical details of LLMs don’t actually matter. We don’t really care if they’re a database or not. The question is: do they reproduce the source material? And yeah, pretty much they do, in a lot of instances. Not all, but a lot. To offer yet another analogy, imagine I have a service X. You can pay and I will give you any movie you want. You don’t know how I do it. Is this copyright infringement or not? I would say yes. Now let’s say I reveal the secret: I open up Photoshop and painstakingly recreate the movie frame by frame. I might make a mistake here or there. Is this still copyright infringement? I think it is.
Okay, but that is not what's happening here. Demonstrably so. The fact that a model is technically capable of overfitting to certain heavily repeated examples in the training data doesn't mean the entire thing has to be shot down. The non-infringing uses far outweigh the infringing ones, by a lot.
If what you say is true, and they do outright copy a lot, then it should be pretty easy for any IP holder to sue anyone who misuses the model that way, for copyright infringement on those specific outputs.