> A derivative work is a work that itself includes copyrighted content from the original work.
If you put a GPL C program through Emscripten to run in a browser, the output doesn't include the original C code, but it's surely a derivative work.
> Someone who looks at a dozen code examples in public repos to learn how to do e.g. a quick sort, then upon understanding the logic flow of the quick sort algorithm, writes his own quick sort implementation is not creating a derivative work of the code in the repos he examined. And the way LLMs work is much more similar to that process than to the "compressed anthology" concept you're describing.
This is undoubtedly the core of the disagreement. Humans can learn from what they have seen, appreciate it, understand it, and draw on that experience in what they create. They do this without being considered ripoff artists, so why not machines that simulate the "same" thing automatically?
To me the answer is simply that humans are special. Human thought and human effort make it creativity when a human does it, copying when a machine does it. It's a double standard I am perfectly willing to accept. I am unabashedly biased in this regard.
That may seem remarkably unfair to the machines, or like a cop-out. I just carved out a hardcoded special case for humans, and my whole philosophical reasoning is "because I said so".
But how fair do we want to be? After all, if you want to treat a machine exactly like a human who learns from prior art to create new art, then the ownership of the new art would also belong to the machine. Not to the person who prompts it.
Because it does include content from the original work -- this is just a translation, and isn't comparable to how LLMs work.
> To me the answer is simply that humans are special.
I don't disagree, but I also view LLMs as tools that extend human capacities and not autonomous entities unto themselves. LLMs are still just software, and can't really be regarded as anything other than instruments that humans use to broaden their capacity to see, appreciate, understand, and draw on that experience in what they create.
> That may seem remarkably unfair to the machines, or like a cop-out.
No, it's unfair to the humans. The machines are just tools that they use. The "double standard" is really a set of inconsistent standards applied to the same underlying moral agents.
> After all, if you want to treat a machine exactly like a human who learns from prior art to create new art, then the ownership of the new art would also belong to the machine. Not to the person who prompts it.
No, it always belongs to the person who prompts it. The machine is not a conscious entity, bears no intentions, and has no capacity to act on its own initiative. The machine is always just a tool that extends human capacity, as all machines always have.
For a good comparison here, we've never not credited a photographer as the author of a photograph. But the photographer is in a sense merely prompting the camera by framing the shot, selecting the exposure, adjusting the lighting, etc. -- the hard work in actually creating the photograph is being done by the camera itself. The photographer plays no role in directly constructing the final image, and many of the qualities of that image are determined by pre-existing features of the camera's functional design and components, which the photographer also played no role in defining, apart from choosing which camera to use.
LLMs are like cameras in this way. And the fact that they rely on external data for model training no more disqualifies the user as the author of the resulting work than looking things up in a dictionary or encyclopedia disqualifies the author of an essay.