by scuff3d 104 days ago
The solution to this whole situation seems pretty simple to me. LLMs were trained on a giant mix of code, and it's impossible to disentangle it, but a not insignificant portion of their capabilities comes from GPL licenced code. Therefore, any codebase that uses LLM code is now GPL. You have a proprietary product? Not anymore.

Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that or retrain the models on only MIT/similarly licenced code or code you have explicit permission to train on.

4 comments

What about the code that wasn't even GPL, but "all rights reserved", i.e., without any license? That's even stronger than GPL and based on your reasoning, this would mean that any code created by an LLM is not licensed to be used for anything.
Code created by an LLM cannot, in the USA, be copyrighted. No copyright, no license.
You've got it wrong. Copyright excludes you from using something; a license allows you to use something. So "no license" does NOT mean "free to use", but "not allowed to use".
If you do not hold copyright, you cannot prevent someone from copying a thing. If you cannot prevent someone from copying the thing, then "licensing" it is somewhere between pretty weird and pretty stupid, no?
No, because OP implied that the AI-generated content inherits the LICENSE: in their view, if the input was GPL, the output must be GPL. So if the input wasn't licensed at all, the output cannot be licensed. What gets inherited from "no license" is not "no copyright", but "no license". The question of whether copyright applies hasn't been definitively answered yet, but just because it is likely that the person PROMPTING the AI doesn't gain copyright doesn't mean that an output that is 1:1 derived from copyrighted material loses its copyrighted status. That would be truly ridiculous.
As you note, this is a legal question that has not yet been answered. I think that speculating on the outcome in the current legal climate is fruitless.
Okay. That's fine with me. I was trying to be generous and assume the GPL would be the strongest.
That would make sense, yes.
Yes.
US courts have already ruled that in the USA, no machine-generated code can be copyrighted. No copyright, no license, of any type.
if you train yourself by looking at GPL code then go implement your own things, is that code GPL?
it can be, depending on if it is different enough to convince a jury that it is not a copyright violation. See the lawsuits from Marvin Gaye's family to see how that can be unpredictable.
I would imagine there must also be some aspect of uniqueness to it as well for even recognizing where a line of code came from… otherwise almost every Python script might have copied this line from a GPL licensed program:

`if __name__ == "__main__":`

I have no idea where that line first appeared, so figuring out what license it was originally written under would be difficult to track down, and most software only has license info at the file rather than line level.

If you copy and paste one line from a thousand different GPL projects, is the resulting program GPL?

Let's be honest about what's happening here.

It could be. The amount of code you copy doesn't matter; it just depends on context and whether your work could now be considered derivative.

I said this elsewhere, but I work with people who won't even look at GPL code because of the potential legal entanglements.

Yes, let's. Corporations with billions of dollars behind them wholesale stole copyrighted work and licenced code to train models, and then turned around and sold the result with no attribution or monetary benefit given to the people they stole from. They knew what they were doing and relied on the legal system being slow enough that they could plant a flag in the market before legal challenges killed them.

It's an industry built on theft. By all rights they should have been sued/fined out of existence before it ever got this far. But if you have enough money you can make almost anything legal.

I work with people who literally won't even look at GPL code, because of the risk. So yes, potentially.
Of course not, because everyone making these arguments wants people to have some magic sauce so they get to ignore all the rules placed on the "artificial" thing.
If you genuinely believe that you are not above a literal text completion algorithm and do not deserve any more rights than it, that says more about you than anything else.
100% agree, if we are fair and honorable.

In practice, well... you saw what's been going on with the Epstein files, etc. We are far from living in a world that's fair and honorable.

(I'm not condoning it, I think it's massively trashy to steal code like this then pretend you're the good guy because of some super weird mental gymnastics you're doing)

Completely agree. This isn't practical. It's never going to happen just because of the sheer amount of capital behind LLM companies.

You can do anything rotten, as long as you throw enough money at it.