by 8note
1174 days ago
I think the counter to those arguments is that LLM owners want to avoid arguing that the model is a derivative work of the training data. If the LLM is a specific arrangement of the copyrighted works, it's very clearly a derivative work of them.

However, to address your point about derivative works directly, the consensus among copyright law experts appears to be that whether a particular model output is infringing depends on the standard copyright infringement analysis (and that’s regardless of the minor and correctable issue represented by memorization/overfitting of duplicate data in training sets). Only in the most unserious legal complaint (the class action filed against Midjourney, Stability AI, etc.) is the argument being made that the models actually contain copies of the training data.