HN Mirror



	by lurk2 108 days ago
	What if you used the LLM to generate works that were already copyrighted?

3 comments

AnthonyMouse 107 days ago

There was a recent case that everyone has been describing as "LLM output can't be copyrighted" but what it actually said was you can't register the AI as the author.


marcus_holmes 107 days ago

This is not true, and I'd love to see some actual citation here.

The courts have repeatedly said that copyright only applies to human creativity. The Supreme Court explicitly said this when they refused to hear the appeal:

https://en.wikisource.org/wiki/Thaler_v._Perlmutter,_Refusal...

> "We affirm our decision to refuse registration for the Work because it lacks the human authorship necessary to be eligible for copyright protection."

So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.

The related case about patents is more supportive of the narrative that AIs cannot be authors (see https://www.cafc.uscourts.gov/opinions-orders/21-2347.OPINIO...), specifically: "Here, there is no ambiguity: the Patent Act requires that inventors must be natural persons; that is, human beings."

The patent situation is that the Act says the inventor must be an individual, which the courts are interpreting to mean a human, so the LLM cannot be named as the inventor. So, in this case, yes, this is just saying that an LLM cannot be named as the inventor of a patent. That's not the same thing as what the courts are saying about copyrights.


AnthonyMouse 106 days ago

> So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.

They're saying that the LLM can't be the author.

Now suppose you supply the LLM with a prompt that contains human creativity, it performs a deterministic mathematical transformation on the prompt to produce a derivative text, and you want to copyright that, claiming yourself as the author. What happens then?

If you think the answer is that you can't, how do you distinguish that from what happens when someone writes source code and has a compiler turn it into a binary computer program? Or do you think that e.g. Windows binaries can't be copyrighted because they were compiled by a machine?


ranger_danger 100 days ago

> Now suppose you supply the LLM with a prompt

My understanding was that they did in fact do just that, but the court somehow misunderstood what they were doing, and assumed that the LLM was working completely autonomously without any human input at all, which isn't really possible IMO. Someone told it what to do.

They also argued that you couldn't copyright an output if you can't explain how it came to be, i.e. if they had been able to articulate how an LLM works, the outcome might have been quite different, which I found surprising.

If art in general (human-made or otherwise) is always derived from existing influences... should we really be forced to explain how or why we created a piece of art in order to defend it?

The usual bar for copyright infringement of a derivative work is, from what I have seen, "how much did you copy from the original, and how obvious is it", which is of course a subjective determination that would be made by each individual judge or jury of a case.


marcus_holmes 106 days ago

> What happens then?

The part that the human created, the prompt, can be copyrighted.

The part that the LLM created, cannot be.

Copyright in code works exactly the same way: the source code is copyrighted. The binary code is only copyrighted to the extent that it is derived from the source code. This is well-established.


ranger_danger 100 days ago

Maybe I am just misunderstanding something, but I feel like you might be contradicting yourself here... why can LLM output not be copyrighted, but compiler output can be?


marcus_holmes 100 days ago

No, that's the point - the compiler output is only copyrighted to the extent that it is derived from the source code. The compiler itself cannot create anything copyrightable, but because there is a deterministic link between the source code and the binary code, and the source code was the product of a human, the binary code is covered by the source code copyright.

It's like a photocopier. If you photocopy a page from a book, that page is still covered by the copyright of the book author, even if the page is 2x larger or otherwise transformed by the machine.


bdowling 107 days ago

Powerful interests want it to be true.


dataflow 108 days ago

IMO the bigger question is how would you even tell if a work was generated by an LLM? There's a ton of code being written out there; the folks who generated it are going to claim they authored it for copyright purposes, and those who want to use it are going to claim it was LLM-generated. So what happens?


greyface- 107 days ago

The alleged author, when bringing a copyright infringement suit, will submit testimony claiming they wrote it. Parties to the suit will have a chance to present arguments and evidence. Then, the claim will be adjudicated by a judge and/or jury.


terminalshort 107 days ago

That code isn't going to be open source. And if you use someone else's closed source code you are violating laws that have nothing to do with copyright.


dataflow 107 days ago

I'm not sure I understand. I'm not talking about stolen/leaked code here. I'm saying: imagine you claim you're the author of some piece of code. You may or may not have written it with an LLM, but even if so, assume you have the full rights to all the inputs. You post it publicly on GitHub. You don't attach a license, or perhaps you attach a restrictive license that doesn't permit much beyond viewing. Someone comes across your code, finds it brilliant, and wants to use it. If that code was non-copyrightable (such as generated via an LLM), then they're fine doing it without your permission, no? But if that code was copyrightable, then they're not permitted to do so, correct?

So now consider two questions:

1. You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?

2. They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?


marcus_holmes 107 days ago

Good questions.

My take on the answers (I am not a lawyer):

1. You copy their code. They bring a copyright claim (let's assume this isn't a DMCA thing and they're actually bringing a claim to court). Your defence is "the LLM wrote it so no copyright attaches". Since they're asserting their copyright claim, they would have to provide evidence for that claim (same as in any other copyright case), including providing evidence that a human wrote it (which is new, and required to defeat your defence).

2. They copy your code. You bring a copyright case. Their defence is "I used an LLM to wash the code without copying". Since they're not disputing your copyright claim to the original code, you don't have to defend or prove your copyright. But you do have to prove that their code infringes on your copyright, which would mean proving that the LLM copied your code when creating the new code. This has been done before by demonstrating similarity.


marcus_holmes 107 days ago

Can you expand on that, please? Which other laws are infringed if you use someone else's closed source code?


LtWorf 107 days ago

You used an illegal leak to train your LLM.


Dylan16807 107 days ago

What makes the leak illegal other than copyright?

The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.


dataflow 107 days ago

> What makes the leak illegal other than copyright? The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.

I think 18 U.S.C. § 1832 (a) (3) might answer your question? https://www.law.cornell.edu/uscode/text/18/1832


wk_end 107 days ago

Is Pierre Menard really the author of his Quixote?
