by profmonocle
611 days ago
> if LLM training involves merely reading a dataset, but it is not strictly necessary to copy, or even store it verbatim to be useful, then does it even fall under copyright protection at all?

Copyright includes the creation of derivative works, not just literally copying the source material.

For instance, imagine I read a novel, then I decide to write my own, unauthorized sequel to it. It's not a literal "copy" of the original material - it's my own original text, but obviously a derivative work of the original material. Under copyright law, that would be infringement - I would be sued if I tried to sell that. (Yes, that means fanfiction is infringing, but most rights holders have wisely decided to look the other way on that, as long as it's non-commercial.)

This is what people who claim AI is infringing are worried about. Not that the AI has a literal copy of the source material in its training data, but that the training data can be used to produce a derivative work. I could write a (crappy) fanfic of the Lord of the Rings without directly referencing the books/movies. And that doesn't mean I have a complete copy of the books/movies in my head - that isn't how memory works.

Until now, creating a derivative work without directly using the source material was something only humans could do. This is completely uncharted legal territory.