by ben_w
522 days ago
I don't see how it really helps. We could pass laws requiring models to disclose their training sets regardless of how the training is distributed; conversely, if this is a community-led project, those also have copyright issues to deal with (Wikipedia, for example). I suspect there's also a problem in that, e.g., ten million student essays about different pages of Harry Potter can each, in isolation, be justified by the right to quote small fragments for critical purposes, but the collection as a whole can't, because together it quotes an entire book series.
|
Copyright is intended to reward investment in creative works by giving the rights holder sole license to distribute them. It is not intended to create a monopoly on knowledge about the work.
If I can ask an LLM (or person!) “what’s the first sentence in Harry Potter?” and then “what’s the second sentence?” and so on, that does not mean they are distributing the work in competition with the rights holders.
We have gone way overboard with IP protections. The purpose of copyright is served when Rowling buys her 10th mansion. We do not need to further expand copyright to make it illegal to learn from a work or to remember it after reading.