by Quarrel 333 days ago
If there is barely any code in those repos that you wrote, how can you license them under the GPL? You don't hold the copyright for it.

This genuinely isn't an attack, I just don't think you can? The AI isn't granted copyright over what it produces.


I can only talk about the law in England & Wales, but:

For code generated by an LLM the human user would likely be considered the author if you provided sufficient creative input, direction, or modification.

The level of human involvement matters, simply prompting "write me a function" might not be enough, but providing detailed specifications, reviewing, and modifying the output would strengthen the claim.

The Copyright, Designs and Patents Act 1988 (CDPA), Section 9(3) states, "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken". This was written before LLMs existed, but recent academic literature has supported this position: https://academic.oup.com/jiplp/article/19/1/43/7485196?login...

However, a comparable situation was tested with Thaler v Comptroller-General, where courts emphasised that legal rights require meaningful human involvement, not just ownership of the AI system. - https://www.culawreview.org/journal/unlocking-the-canvas-a-l... and https://www.whitecase.com/insight-our-thinking/uk-supreme-co...

I do acknowledge there is uncertainty, and this is highlighted in "The Curious Case of Computer-Generated Works under the Copyright, Designs and Patents Act 1988", which says of section 9(3): "the section is either unnecessary or unjustifiably extends legal protection to a class of works which belong in the public domain" - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4072004

Today, I think it's doubtful that a functional application can be entirely vibe coded without decent direction and modification, but I don't think that will always be the case.

At least for art there is already precedent in US courts, with someone trying to copyright an image generated by Midjourney and it getting revoked in 2022, because AI-generated art cannot be copyrighted.

For code it hasn't been challenged yet, but I find it doubtful they'd decide differently there.

I was reading Doe 1 v. GitHub for my paper. The case involves open source developers suing over GitHub Copilot, which was trained on, and generates, open source code, including code under the MIT and AGPL licenses.

So far, the judge believes that training models on open source code is not a license violation, as the code is public for anyone to read. Whether "distribution or redistribution" (I assume, of the model's outputs?) violates the terms of the license, among other laws, is still up to the court to decide.

The case has moved to the Ninth Circuit without a decision in the district court, as there are other similar cases (such as the Authors Guild's) and the aim is for the courts to offer consistent rules. I believe one of the big delays in the case is over damages: I think the plaintiffs asked for details of Microsoft's valuation of GitHub when it was acquired, as GitHub's biggest asset is its Git repositories, and that may provide a monetary value for how much each project is worth. Microsoft is trying to stall and not reveal this.

Assuming you're referring to Thaler v. Perlmutter, Thaler claimed to the copyright office that the image at issue was "autonomously created by a computer algorithm running on a machine". So the question of "if you claim the LLM did it itself" is settled (shocker, cf. Naruto v. Slater, 888 F.3d 418), but that definitely did not settle "_I_ used the LLM to do it".
Tbf, IANAL and was only repeating what journalists wrote back then. Ultimately, I have no deeper knowledge of the laws in question and thus don't have a qualified opinion on the matter.
Also, if there isn't enough human involvement for the code to be copyrightable, then it's basically equivalent to being in the public domain. This is more permissive than any code license (e.g. the GPL), so it should be fine no matter what.
I'm not so sure about that.

The legal standards in the United States for software copyrights are Jaslow and Altai, known to Federal courts as SSO [0] and AFC [1], respectively.

These standards consider the overall structure of code as being copyrightable. This means that you can't just rename a bunch of variables and class names. The overall organization of the code is considered an arbitrary expression. Someone would be infringing on copyright if they took your Java code and converted it to Python with different class, variable and function names but kept the same relationships between classes and the same general structure.

So what does this have to do with LLMs? Well, if the author directed the code to be structured in a certain way, directed the creation of specific APIs, etc., then there is a legal argument that the author at least holds copyright over the arbitrary and expressive decisions that were made while building the software system.

[0] https://en.wikipedia.org/wiki/Structure,_sequence_and_organi...

[1] https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...

This is highly speculative; IANAL.
I have not thought about it; it is one of the things on my list. But my understanding was that developers copy code from Stack Overflow, as an example. It is not "my" code, but I am still the author. Or let's say I ask my friend to add code and she/he simply passes the code over to me. I author it in my name.

The "barely" part may be important and I would like to know what others are doing.

I don't think you can just willy-nilly copy code from StackOverflow and sign it with your name. Its license forbids it. You also can't just sign your friend's code with your name unless she explicitly gives you permission. In both cases you are not the author of that code.

I get that people do it anyway, but I guess it's kind of a grey area because it's hard to tell after the fact that some snippet has been copied from SO.

I got a patch rejected (rightfully so IMO) a long time ago from libvirt (Red Hat) because I was using (and mentioning) code taken from StackOverflow.
So what would be the status of this code? Nobody holds the copyright for it, so anyone can use it in any way and nobody can sue for anything. It's not GPL, but it sounds pretty open source to me.
Yes, my understanding is that non-humans in the USA cannot be granted copyright. This puts the work in the public domain, which means it can't be relicensed.

There was a much appealed case of a monkey taking a photo, where it was decided the photo was in the public domain.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

It boiled down to the creator not being a "legal person" and so could not hold copyright.

The real problem for software is where the line is for a "sufficient" transformation of the source material by a human to make it acquire copyright. You can write a novel derived from Dickens' characters and have copyright in it, but not gain control over those characters as Dickens described them.

Can you buy Jules Verne's book, add comments and claim copyright on the whole book?

Claim partial copyright without specifying clearly what exactly?

Absolutely.

People sell annotated Bibles, or Shakespeare, etc. You can transform it into something that can acquire copyright, but it must have an artistic step.

This is a big thing in the fine art world as well, you can take inspiration, you can in some circumstances outright copy, but then you need to transform it sufficiently that it becomes your own art. People argue in front of judges about this stuff, of course.

Verne is a good example too, because if you print an English version, the translator acquires copyright in the translated version.

IANAL, but the AI is a tool and presumably the code should be treated like any other auto-generated code. It's the product of the tool user.

Unless the product includes code licensed by others, then - like any other repo - I don't see any license issue here.

If you mean there is no insight as to whether licensed code is included, that's one of the constraints of vibe-coding (which people often confuse with AI-assisted coding).

It's the job of the user to check and curate the contributions as they would any third-party human input (e.g. via PRs). Again though - that's not an AI coding issue, but a human process decision.

No, there is no copyright.

If you tried to sue someone for copyright infringement based on code that an LLM generated for you, you'd be laughed out of court.

But you were the one that used the LLM to generate it, so that’s your code, surely - how would unlicensed use of your code not violate copyright? Why didn’t they ‘just’ use an LLM to generate their own code?
The product is not owned by the tool user.

Use a hammer, you own the output. Use an intern, the intern does.

Of course, if you aren't a person you can't own anything.

Correct. No copyright. No legal teeth to the GPL.

Take whatever you want and relicense it cause it doesn't belong to the "author"

Lolololololol "author"

What else should they do then? GPL is a good call. Most keep the output under their own personal/company IP.
> What else should they do then?

They can say the code is in the public domain.

This is distinct from open source, yes, but in almost all cases less restricted than anything with a license (open source or otherwise).