by cmiles74
534 days ago
Information on how large language models are trained is not hard to come by; numerous articles cover this material. Even a brief skim will make it clear that the training of large language models is materially different in almost every way from how human beings "learn" and build knowledge. There are still many open questions about how humans collect, store, retrieve and synthesize information. There is little mystery to how large language models function, and it's clear that their output is parroting back portions of their training data: the quality of output degrades greatly when novel input is provided. Is your argument that people fundamentally function in the same way? That would be a bold and novel assertion!
If this were true, then you would be able to identify the specific work being "parroted," and you'd have a case for copyright infringement regardless of whether it was produced by an LLM at all. This isn't how LLMs work, though. For instance, if an LLM's training data includes the complete works of a given author and you prompt the LLM to write a story in the style of that author, it will write an original story instead of reproducing one of the stories in its training corpus. It won't be particularly good, but it will be an original work.
It also isn't obvious whether, or to what degree, LLM training works differently from human learning. You yourself acknowledged that there are "many open questions" about how human learning works, so how can you be so confident that it's fundamentally different? It doesn't matter anyway, because you can judge whether LLM output infringes copyright by the exact same standards you would apply to something produced by a human being.