There was a recent case that everyone has been describing as "LLM output can't be copyrighted", but what it actually said was that you can't register the AI as the author.
This is not true, and I'd love to see some actual citation here.
The courts have repeatedly said that copyright only applies to human creativity. The Supreme Court refused to hear the appeal, which leaves the refusal to register the work in place, and that refusal says so explicitly:
> "We affirm our decision to refuse registration for the Work because it lacks the human authorship necessary to be eligible for copyright protection."
So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.
The related case about patents is more supportive of the narrative that AIs cannot be authors (see https://www.cafc.uscourts.gov/opinions-orders/21-2347.OPINIO...), specifically: "Here, there is no ambiguity: the Patent Act requires that inventors must be natural persons; that is, human beings."
The patent situation is that the Act says an inventor must be an "individual", which the courts are interpreting to mean a human, so the LLM cannot be named as the inventor. So, in this case, yes, this is just saying that an LLM cannot be named as the inventor of a patent. That's not the same thing as what the courts are saying about copyright.
> So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.
They're saying that the LLM can't be the author.
Now suppose you supply the LLM with a prompt that contains human creativity, it performs a deterministic mathematical transformation on the prompt to produce a derivative text, and you want to copyright that, claiming yourself as the author. What happens then?
If you think the answer is that you can't, how do you distinguish that from what happens when someone writes source code and has a compiler turn it into a binary computer program? Or do you think that e.g. Windows binaries can't be copyrighted because they were compiled by a machine?
My understanding was that they did in fact do just that, but the court somehow misunderstood what they were doing, and assumed that the LLM was working completely autonomously without any human input at all, which isn't really possible IMO. Someone told it what to do.
They also argued that you can't copyright an output if you can't explain how it came to be. In other words, if they had been able to articulate how an LLM works, the outcome might have been quite different, which I found surprising.
If art in general (human-made or otherwise) is always derived from existing influences... should we really be forced to explain how or why we created a piece of art in order to defend it?
The usual bar for copyright infringement by a derivative work is, from what I have seen, "how much did you copy from the original, and how obvious is it", which is of course a subjective determination made by the individual judge or jury in each case.
The part that the human created, the prompt, can be copyrighted.
The part that the LLM created cannot be.
Copyright in code works exactly the same way: the source code is copyrighted. The binary code is only copyrighted to the extent that it is derived from the source code. This is well-established.
Maybe I am just misunderstanding something, but I feel like you might be contradicting yourself here... why can LLM output not be copyrighted, but compiler output can be?
No, that's the point - the compiler output is only copyrighted to the extent that it is derived from the source code. The compiler itself cannot create anything copyrightable, but because there is a deterministic link between the source code and the binary code, and the source code was the product of a human, the binary code is covered by the source code copyright.
It's like a photocopier. If you photocopy a page from a book, that page is still covered by the copyright of the book author, even if the page is 2x larger or otherwise transformed by the machine.
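To make the "deterministic link" idea concrete, here is a minimal Python sketch. It uses Python's built-in `compile()` as a stand-in for a real compiler (an assumption for illustration only, since the discussion is about native binaries), and `fingerprint` is a hypothetical helper name. All it shows is that the same source text always maps to the same machine-produced output; it says nothing about how a court would treat either one.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Compile source to bytecode (without running it) and hash the result."""
    code = compile(source, "<example>", "exec")
    return hashlib.sha256(code.co_code).hexdigest()


# Hypothetical human-written input.
SOURCE = "x = 1 + 2\nprint(x)\n"

# The same input goes through the machine twice.
first = fingerprint(SOURCE)
second = fingerprint(SOURCE)
print("same source, same output:", first == second)

# A different source text yields different bytecode in this example.
other = fingerprint("x = 1 + 2\nprint(x, x)\n")
print("different source, same output:", other == first)
```

The point of the sketch is only that the machine step adds nothing of its own: whatever is in the output is traceable to what the human put in.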
IMO the bigger question is how would you even tell if a work was generated by an LLM? There's a ton of code being written out there; the folks who generated it are going to claim they authored it for copyright purposes, and those who want to use it are going to claim it was LLM-generated. So what happens?
The alleged author, when bringing a copyright infringement suit, will submit testimony claiming they wrote it. Parties to the suit will have a chance to present arguments and evidence. Then, the claim will be adjudicated by a judge and/or jury.
That code isn't going to be open source. And if you use someone else's closed source code you are violating laws that have nothing to do with copyright.
I'm not sure I understand. I'm not talking about stolen/leaked code here. I'm saying: imagine you claim you're the author of some piece of code. You may or may not have written it with an LLM, but even if so, assume you have the full rights to all the inputs. You post it publicly on GitHub. You don't attach a license, or perhaps you attach a restrictive license that doesn't permit much beyond viewing. Someone comes across your code, finds it brilliant, and wants to use it. If that code was non-copyrightable (such as generated via an LLM), then they're fine doing it without your permission, no? But if that code was copyrightable, then they're not permitted to do so, correct?
So now consider two questions:
1. You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?
2. They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?
1. You copy their code. They bring a copyright claim (let's assume this isn't a DMCA thing and they're actually bringing a claim to court). Your defence is "the LLM wrote it so no copyright attaches". Since they're asserting their copyright claim, they would have to provide evidence for that claim (same as in any other copyright case), including providing evidence that a human wrote it (which is new, and required to defeat your defence).
2. They copy your code. You bring a copyright case. Their defence is "I used an LLM to wash the code without copying". Since they're not disputing your copyright claim to the original code, you don't have to defend or prove your copyright. But you do have to prove that their code infringes on your copyright, which would mean proving that the LLM copied your code when creating the new code. This has been done before by demonstrating similarity.
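For a rough sense of what "demonstrating similarity" could look like mechanically, here is a small Python sketch using the standard library's `difflib`. The `similarity` function, the sample snippets, and the idea of reducing this to a single ratio are all assumptions for illustration. A court's similarity analysis is a judgment call, not a number, so treat this as a toy, not a legal test.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] of how many non-blank lines two texts share."""
    a_lines = [line.strip() for line in a.splitlines() if line.strip()]
    b_lines = [line.strip() for line in b.splitlines() if line.strip()]
    return difflib.SequenceMatcher(None, a_lines, b_lines).ratio()


# Hypothetical "original" code.
original = """
def total(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""

# Hypothetical "washed" code: one attribute renamed, everything else intact.
suspect = """
def total(items):
    result = 0
    for item in items:
        result += item.cost
    return result
"""

# Hypothetical unrelated code for comparison.
unrelated = """
import sys
print(sys.argv)
"""

print("original vs suspect:  ", similarity(original, suspect))
print("original vs unrelated:", similarity(original, unrelated))
```

A line-based comparison like this is easy to defeat by renaming and reordering, which is part of why the question ends up in front of a judge or jury instead of being settled by a tool.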
> What makes the leak illegal other than copyright? The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.