Hacker News
by Gormo 10 days ago
Yeah, I really can't comprehend these sentiments as anything other than an "I don't like AI" argument. FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

I see a lot of risks involved in people surrendering their own decision-making to LLMs, but that's a question of how they're used, not how they're trained. The idea that using FOSS software to train LLMs is somehow a violation of FOSS norms just doesn't seem valid.

3 comments

That's just the licensing part. The license says something, but a license doesn't turn people into slaves. The desire or decision to produce software has to come first and only then does code with a license exist.

Before AI, and in the early days of FOSS, people assumed that the primary recipients of shared code were other FOSS enthusiasts, in the form of developers and users.

Then there was a wave of permissive licensing, which obviously brought corporate interests with it. However, this was easily foreseeable, and many people who favored permissive licensing did so intentionally to appeal to corporate users, so the risk of them quitting over perceived abuse was slim.

Now that LLMs are a thing, the primary recipient of a lone developer's work isn't really another human being. That human connection is lost. Instead, your project is laundered through the model, and the model vendor can get away with ignoring your terms and conditions and letting others write proprietary software.

During this transition, some developers assumed there would always be a human connection (even if that human was part of a corporation), but then things changed and they realized their worldview was wrong. Given this new information, they are naturally changing their behavior to match how the world actually is.

> FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

That is wrong. How can you write that with a straight face? There are projects that are put into the public domain (one major one comes to mind), but the clear majority of FOSS projects have strings attached which make the intention of the authors absolutely clear.

IOW, if you're not happy with what the cost of the product is, then just don't use it.

I mean, the most restrictive license, the GPL, was conceived specifically to protect the "four freedoms" and prevent subsequent modifications from violating them. The "copyleft" concept was specifically designed to create an ecosystem that behaved as if copyright didn't apply in the first place.

I don't know how you can imply with a straight face that it did anything else.

I don't know how you can possibly argue that non-redistributive usage of software could ever violate the GPL -- and the other common FOSS licenses don't even have a copyleft provision; they literally say "do whatever you want, but I'm not responsible".

> that behaved as if copyright didn't apply in the first place.

If copyright didn't exist, then the share-alike and anti-tivoization clauses wouldn't work, and FOSS licenses in general couldn't even protect attribution. Copyleft ecosystems depend on some amount of copyright law to uphold themselves.

> The "copyleft" concept was specifically designed to create an ecosystem that behaved as if copyright didn't apply in the first place.

And if copyright didn't exist in the first place, we wouldn't be having this conversation, because the models created by all the token providers would be open to all for whatever use anyone wanted.

But it does exist, and within this framework, the creator gets to say how you may redistribute their IP, and "We compressed it very much" isn't an out.

> But it does exist, and within this framework, the creator gets to say how you may redistribute their IP,

Right. And the way the creator gets to exercise that say is by releasing their work under a license. If you release your work under a FOSS license, you're saying "you are free to copy this work and use it for your own purposes".

Complaining that people are using it for purposes you don't like after you've already given permission to them to use it for whatever purposes they please seems a bit disingenuous.

> and "We compressed it very much" isn't an out.

It's not, but I don't think we're discussing that. We're talking about LLMs, not people redistributing zip files containing someone else's work. If you're trying to imply that LLMs are merely a form of compression, that's a position you've got to argue for, because I'm definitely not seeing any similarity between the two.

> FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.

Sure, but I guess I'm not seeing the relevance here. Are we seeing some greater-than-normal wave of people redistributing FOSS code without attribution, or creating derivative works without adhering to the license terms? LLM training doesn't seem to be either of these things.

We are seeing megacorporations (SlopenAI, Antslopic, Microslop, etc.) distributing derivatives of open-source code (their LLMs) without attribution.

Can you point to some specific examples of products shipped by the companies I assume you're referring to here that are in fact unattributed derivative works of GPL-licensed software?

Or are you saying that you think anything generated by an LLM qualifies as a derivative work of anything included in its training data?

The latter.

It's a tool, if using data is necessary to make the tool work, then its output derives from the data.

If the LLM generation is not derivative of its training data, then why would it need the training data in the first place?

> It's a tool, if using data is necessary to make the tool work, then its output derives from the data.

That's simply not correct within the applicable meaning of "derives" as understood in copyright law. In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.

Even works that draw on only a single source of data, but express the ideas drawn from it in a new or transformative way, are not considered derivative works (see the ruling in Google v. Oracle, for example), let alone works based on patterns extrapolated by relating ideas sourced from many distinct works, which is what LLMs are principally doing.

If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS software would in turn be a derivative work of that software. No one has ever regarded this to be the case.