by bjourne 16 days ago
People taking your work and not giving anything back was ALWAYS the risk you took when writing free software. LLM training doesn't change that much. That the US military is no doubt using gcc to compile embedded software for its ICBMs surely irks the GNU people. But you can't have it any other way. "You can only use my software for good things" just is not consistent with "free software".
3 comments

Yeah, I really can't comprehend these sentiments as anything other than an "I don't like AI" argument. FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

I see a lot of risks involved in people surrendering their own decision-making to LLMs, but that's a question of how they're used, not how they're trained. The idea that using FOSS software to train LLMs is somehow a violation of FOSS norms just doesn't seem valid.

That's just the licensing part. The license says something, but a license doesn't turn people into slaves. The desire or decision to produce software has to come first and only then does code with a license exist.

Before AI and in the early days of FOSS, people assumed that the primary recipients of code sharing were other FOSS enthusiasts, in the form of developers and users.

Then there was a wave of permissive licensing, which obviously brought corporate interests with it. However, this was easily foreseeable, and many people who favored permissive licensing did so intentionally to appeal to corporate users, so the risk of them quitting due to perceived abuse was slim.

Now that LLMs are a thing, the primary recipient of a lone developer's work on his project isn't really another human being. This human connection is now lost. Instead, your project is laundered through the model, and the model vendor can get away with ignoring your terms and conditions and letting others write proprietary software.

In this transition period there were developers who thought that there was always going to be a human connection (even if part of a corporation), but then things changed and they realized their world view was wrong. Given the arrival of this new information, they obviously change their behavior in accordance with how the world actually is.

> FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

That is wrong. How can you write that with a straight face? There are projects that are put into the public domain (one major one comes to mind), but the clear majority of FOSS projects have strings attached which make the intention of the authors absolutely clear.

IOW, if you're not happy with what the cost of the product is, then just don't use it.

I mean, the most restrictive license, the GPL, was conceived specifically to protect the "four freedoms" and prevent subsequent modifications from violating them. The "copyleft" concept was specifically designed to create an ecosystem that behaved as if copyright didn't apply in the first place.

I don't know how you can imply with a straight face that it did anything else.

I don't know how you can possibly argue that non-redistributive usage of software could ever violate the GPL -- and the other common FOSS licenses don't even have the copyleft provision, and literally are saying "do whatever you want, but I'm not responsible".

> that behaved as if copyright didn't apply in the first place.

If copyright didn't exist, then the share-alike and anti-tivoization clauses wouldn't work, and FOSS in general wouldn't even protect attribution. Copyleft ecosystems depend on some amount of copyright law to uphold themselves.

> The "copyleft" concept was specifically designed to create an ecosystem that behaved as if copyright didn't apply in the first place.

And if copyright didn't exist in the first place, we wouldn't be having this conversation, because the models created by all the token providers would be open to all for whatever use anyone wanted.

But it does exist, and within this framework, the creator gets to say how you may redistribute their IP, and "We compressed it very much" isn't an out.

> But it does exist, and within this framework, the creator gets to say how you may redistribute their IP,

Right. And the way the creator gets to exercise that say is by releasing their work under a license. If you release your work under a FOSS license, you're saying "you are free to copy this work and use it for your own purposes".

Complaining that people are using it for purposes you don't like after you've already given permission to them to use it for whatever purposes they please seems a bit disingenuous.

> and "We compressed it very much" isn't an out.

It's not, but I don't think we're discussing that. We're talking about LLMs, not people redistributing zip files containing someone else's work. If you're trying to imply that LLMs are merely a form of compression, that's a position you've got to argue for, because I'm definitely not seeing any similarity between the two.

> FOSS has always been about just writing code and putting it out into the world where others can do as they please with it.

Not true. Most FOSS licenses require attribution and many require derivatives to be released under the same license.

Sure, but I guess I'm not seeing the relevance here. Are we seeing some greater-than-normal wave of people redistributing FOSS code without attribution, or creating derivative works without adhering to the license terms? LLM training doesn't seem to be either of these things.

We are seeing megacorporations (SlopenAI, Antslopic, Microslop, etc.) distributing derivatives of open-source code (their LLMs) without attribution.

Can you point to some specific examples of products shipped by the companies I assume you're referring to here that are in fact unattributed derivative works of GPL-licensed software?

Or are you saying that you think anything generated by an LLM qualifies as a derivative work of anything included in its training data?

The latter.

It's a tool. If using data is necessary to make the tool work, then its output derives from the data.

If the LLM generation is not derivative of its training data, then why would it need the training data in the first place?

There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.

I suppose you could argue it also indirectly led to the empowerment of non-developers to create their own vibe coded solutions. But we're not quite there yet.

And the AI IP that makes that possible is still enclosed rather than open.

Sure, Free Software hasn't been the vehicle for societal change that RMS and others certainly hoped. I remember being flamed out in a user group for suggesting that our conference shouldn't be held in a "non-free" country such as Morocco, Turkey, or China because it's counter-productive to freedom. Very few people actually got it. But it's orthogonal to LLM trainers also using free software in "non-approved" ways.

> There's an almost intergalactic level of irony in the extent to which open source has benefited giant corporations and the military at the expense of individuals, and ultimately contributed to the commercialised enclosure of software IP.

Could you perhaps explain that irony a bit more explicitly?

Can you provide any examples of "commercialized enclosure of software IP" somehow backwashing into the FOSS ecosystem and closing things up that are already open?

Aren't open-weight models sort of returning the favor?

> it also indirectly led to the empowerment of non-developers to create their own vibe coded solutions.

Nobody is empowered to do that because the models to do that aren't free.

> But we're not quite there yet.

Judging from the number of projects I've seen from people who aren't software developers, we're there enough.

Before LLMs, you could use the GNU GPL or other copyleft licenses to protect your code from being used to develop non-free software. Unfortunately, the courts have decided that LLMs are free to ignore licenses.

Copyleft is about republishing. You can't prevent anyone from using your compiler or text editor to develop non-free software.