| HN Mirror



	by ergonaught 57 days ago
	The GPL, unlike BSD and similar permissive licenses, is intended to prevent the closing of distributed derivative works. LLMs trained on GPL code can produce derivative works without any enforcement mechanism. You may be fine with that, but the GPL is not a public domain license, and LLM training treats everything as if it were public domain.


Rochus 57 days ago

> LLMs trained on GPL code can produce derivative works

This confuses two completely separate things. The GPL governs distribution of derivative works. An LLM trained on GPL code does not distribute that code. The model weights are not a copy, a derivative, or a distribution of the training data in any legally recognizable sense; "influenced by" is not "derived from". The enforcement argument is a non sequitur: the GPL has never had a technical enforcement mechanism; it has always been legally enforced after the fact by copyright holders who discover violations. So if an LLM did produce output sufficiently similar to my code and someone published it in violation of the GPL, I would have the same legal means to enforce my rights as if the code had been copied by a human.


vitally3643 57 days ago

> An LLM trained on GPL code does not distribute that code.

You can't simply make that assertion. You'll have to prove that LLMs do not actually contain encoded copies of copyrighted code and that they are incapable of reproducing such code verbatim.

There is no evidence for such a claim, and so your entire argument is completely baseless.


Rochus 57 days ago

> You'll have to prove that LLMs do not actually contain encoded copies

In law, the presumption is that an act is lawful unless proven otherwise. The burden lies on whoever claims a violation occurred. I already went into the case of sufficiently similar reproduction in my previous response.


LtWorf 57 days ago

I mean… it's been common knowledge for a while that they do in fact contain the original data.

https://www.reddit.com/r/programming/comments/oc9qj1/copilot...

You can disagree all you want, but there's ample evidence of this.
