by pixel8account 935 days ago
I don't want to halt AI training, I want corporations to fuck off from using my (A)GPL code to train their proprietary models which they then sell to people writing more proprietary code. I would be ok if the derived code is properly GPL licenced too.

I suspect many people feel the same way (for example, artists whose art is used to train image generators without compensation).

3 comments

I agree with this, but is this something that should be dealt with in the law or in the license? My gut feeling is that the just remedy is A) the GenAI models out there should get to do what they want as long as they are not violating licenses, B) the libre software world needs to hustle and release new versions of the appropriate licenses that specifically forbid use of the source code to train AI unless the AI itself is licensed permissively.

Note that with regard to A) I'm pretty sure the AI firms ARE violating copyright today, they have done this knowingly, and they should get a hard slap for that. But they are not violating any particular copyleft licensing provisions, to my knowledge.

> I'm pretty sure the AI firms ARE violating copyright today, they have done this knowingly, and they should get a hard slap for that.

Depends on the jurisdiction. Do note that many countries have already passed laws indicating that training is NOT copyright infringement. The EU[1], for example. In those jurisdictions, no license would matter.

https://www.reedsmith.com/en/perspectives/ai-in-entertainmen...

[1] Yes, in the EU, you can opt out (but only for commercial purposes). In other countries, such as Singapore, there is no legal mechanism for opting out.

> I suspect many people feel in a similar way too (for example, artists whose art is used to train image generators without compensation).

Just to be a counter-voice, I don't. My code is AGPL too, but since copy"righted" things far outnumber AGPL things, I'd rather anybody have the ability to train their own AI on all material. Conversely, if a trained model were considered a derivative work, only large corporations like Adobe or Microsoft would be able to train on that material (e.g. they can just give themselves a license to do so via the ToS).

In other words, it's probably a bad idea to strengthen copyright law for the purpose of enforcing copyleft, because it could backfire on us.

What about people reading your code and learning from it before implementing their own code? What's the similarity level where that becomes a problem for you, if their code is closed-source or uses a license you disagree with?
Putting aside the fact that what we call AI today does not learn in the same way humans do, it operates on a VASTLY different scale. On a good week I can read a book. A single book. A massively parallelised data centre can do that billions or trillions of times faster. Scale of effect (lacking a better phrase) must be considered.

A rack of equipment does not need to sleep, eat, take care of itself, earn a living and so on while churning through millions of words a minute. An actual thinking and learning person has to choose what to spend their limited time, money and attention on, while reading at a pace of dozens of words a minute. Those are not the same things at all.

More than that, AI-generated work is clearly derivative for the simple obvious reason that none of it would exist without the original sources.
I mean, by that logic every fantasy story ever is a derivative work. Should everyone be paying J.R.R. Tolkien's estate royalties the moment they include elves in a story?
That's... not the legal definition of a derivative work.