Hacker News
by jupp0r 1148 days ago
What happens if I (a human) read GPL code and then reuse the knowledge gained from it in my own commercial projects? It's not as clear-cut as you make it sound.

It could be as clear-cut as you've just made it: "a human". An LLM is not a human.

You could get into the semantics of "learning" - does JPEG encoding count as the computer "learning" how to reproduce the original image? But trying to create some metric for why LLMs "learn" and JPEG doesn't "learn" on the basis of the algorithms is a philosophical endeavor. Copyright is more about practicality - about realized externalities - than it is about philosophy. That's why selling cars and selling guns are regulated differently, despite the fact that you could reduce both to "metal mechanical machines that kill" by rhetorical argument.

Even from a strictly legal perspective, it actually is fairly clear-cut. The answer to "what if I (a human) read GPL code and then reuse the knowledge gained from it..." comes down to a few straightforward properties of the license. The GPL doesn't cover "reduced to practice" as many corporate contracts do, so its terms covering "the knowledge gained" are lenient. The GPL does cover "verbatim" copies, which is what LLMs are producing; that's as clear-cut as it gets. Inb4: "So what if I add a few spaces here and there?" - well, the GPL also covers "a work based on"; this is where I (who am not a lawyer) can't speak confidently, but surely there are legal differences between "based on" and "reduced to practice", and since both are very common in contracts, there should be a lot of precedent.

I agree with you that verbatim copies are obviously covered by copyright. What if LLMs reproduce code with changed variable and function names (which would be a great improvement to `cs_gaxpy` in the original article)? What if just the general structure of an algorithm is used? What if the LLM translates the C algorithm from the original article into Rust? This discussion is only scratching the surface.
Copyright. Copyright. That is the issue. If you reproduce the code verbatim then you are in violation. This is what the AI is doing.

Just learning from the GPL code to make yourself smarter is not the problem.

It's going to be an uphill battle just to get people to even understand what the problems are. And this is even a technical forum. Now imagine trying to explain these nuances to a judge or jury.
It's not so much an ability to understand as it is a desire to not understand in order to be able to ignore the rightsholders' licensing terms.

Plenty of tech companies exist by putting a thin layer on top of the hard work of others and if those others can be ignored then that's what they'll do.

The example given in the article isn't verbatim.