|
|
|
|
|
by DrScientist
490 days ago
|
|
The case looks pretty straightforward to me - they copied the notes ( human or machine doesn't really matter ) to directly compete with the author of the notes. If you wrote a program that automatically rephrased an original text - something like the Encyclopaedia Britannica - to preserve the meaning but not have identical phrasing - and then sold access to that information on in a way that undercut the original - then in my view that's clearly ripping off the original creators of the Encyclopedia and would likely stop people writing new versions of the encyclopedia in the future if such activity was allowed. These laws are there to make sure that valuable activities continue to happen and are not stopped because of theft. We need textbooks, we need journalistic articles - to get these requires people to be paid to work on them. I think it's entirely reasonable to say that an LLM is such a program - and if used on sources which are sustained by having paid people work on them, and then the reformatted content is sold on in a way to under cut the original activity then that's a theft that's clearly damaging society. I see LLM's as simply a different way to access the underlying content - the rules of the underlying content should still apply - ChatGPTs revenues are predicted to be in the billions this year - sending some of that to content creators, so that content continues to be produced, is not just right - it's in their interest. |
|
Note that it's very hard to do this starting from a single source, because in order to be safe from any copyright concern you'd have to only preserve the bare "idea" and everything else in your text must be independent. But LLM's seem to be able to get around this by looking at many sources that are all talking about the same facts and ideas in very different ways, and then successfully generalizing "out of sample" to a different expression of the same ideas.