Hacker News
by xigoi 21 days ago
> FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.


Sure, but I guess I'm not seeing the relevance here. Are we seeing some greater-than-normal wave of people redistributing FOSS code without attribution, or creating derivative works without adhering to the license terms? LLM training doesn't seem to be either of these things.

We are seeing megacorporations (SlopenAI, Antslopic, Microslop, etc.) distributing derivatives of open-source code (their LLMs) without attribution.

Can you point to some specific examples of products shipped by the companies I assume you're referring to here that are in fact unattributed derivative works of GPL-licensed software?

Or are you saying that you think anything generated by an LLM qualifies as a derivative work of anything included in its training data?

The latter.

It's a tool, if using data is necessary to make the tool work, then its output derives from the data.

If the LLM generation is not derivative of its training data, then why would it need the training data in the first place?

> It's a tool, if using data is necessary to make the tool work, then its output derives from the data.

That's simply not correct within the applicable meaning of "derives" as understood in copyright law. In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.

Even works that draw on only a single source of data, but express the ideas drawn from it in a new or transformative way, are not considered derivative works (see the ruling in Google v. Oracle, for example), let alone works based on patterns extrapolated by relating ideas sourced from many distinct works, which is principally what LLMs do.

If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS software would in turn be a derivative work of that software. No one has ever regarded this to be the case.

> That's simply not correct within the applicable meaning of "derives" as understood in copyright law.

It would have been rather hard to write a definition that handles this properly back when LLMs didn't exist. Not that the law has much to do with the intent behind FOSS anyway, and that intent is clear: you get the code on the condition that if you use it for anything, I get credited; otherwise, you get nothing.

> In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.

Luckily, FOSS consists of specific published works, and unless it can be reasonably shown that LLMs actually decompose those works into ideas and facts (good luck reasoning about that), that part is also irrelevant.

> If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS software would in turn be a derivative work of that software. No one has ever regarded this to be the case.

Depending on intent, that very much can happen; it's called plagiarism. Good luck proving an LLM's intent. (Not to mention the obvious differentiating factor that LLMs, unlike humans, have arbitrarily good memory.)