by Intralexical
405 days ago
> LLMs are certainly not a jpeg or a database... Their weights are derived from copyrighted works. Evaluating them preserves the semantic meaning and character of the source material. And the output directly competes against the copyrighted source materials. The fact they're smudgy and non-deterministic doesn't change how they relate to the rights of authors and artists.

(Leaving aside whether the weights of an LLM actually do encode the content of any random snippet of training text. Some stuff does get memorized, but how much, and how exactly? Reproducing its inputs isn't the point of an LLM, unlike a jpeg or a database.)
And, again, look at the search snippets case: these were words produced by other people, directly transcribed, so open-and-shut from a certain point of view. But the decision went the other way.