by GuB-42 172 days ago
If training AI is a copyright exemption, and it is likely to be the case, then the license is irrelevant.

If it is not, then the trained AI is a derivative work, which the license should allow, as long as the result is publishable under the same license, in order to be considered open source or free software.

In any case, I don't think an anti-AI clause would serve a meaningful purpose in open source software. You can, however, write your own "source available" license that explicitly prohibits use for AI training, and I am sure some already exist. I don't think it will do much good, though: it is likely to be unenforceable (because of copyright exemptions) and it will make the software incompatible with many things open source.

1 comments

Laws cannot be changed retroactively. So if AI training becomes a copyright exemption, that can only take effect starting sometime next year. So the consequences of these companies' choices are already set in stone, even if they're not known yet.

The GPL requires that all materials needed to reproduce any derivative work be made available at cost (and all models can reproduce Linux kernel GPL data structures, including the private parts, character by character). So do I get access to OpenAI's full training data?

Or do I get to make and publish Mickey Mouse cartoons by training an AI on Disney movies and then publishing the model output? Hell, I could even make better versions of old Disney movies, competing with half of Disney's current projects!

It seems to me one of these must be true. So which is it?

Um, no. Copyright puts specific restrictions on what you can do with work. Those restrictions are described by certain words. The question is whether the existing restrictions cover training AI. That's a matter of interpretation, but once an interpretation is accepted, it is understood as what copyright always meant.

Training AI is probably not a copyright violation because it never was one to begin with.

The comments of the (German) judge in this case seem to indicate that the judge doesn't understand why any of the defendants even thought training AI wasn't a violation (at least not when it is taken to the point where the model can exactly reproduce existing works and create derivative works of them. Maybe that's why OpenAI is trying to make that harder now. It is still trivial to make it violate that rule, though).

https://www.dw.com/en/openai-loses-song-lyrics-copyright-cas...

Note that OpenAI has now testified that they did use copyrighted works to train their models. The outcome of the case is that both training AI models on copyrighted work and providing AI model outputs that are derivative of some copyrighted work are copyright violations, which would mean model owners have to respect licenses (i.e., compensate the authors).

The case can still be appealed, so it is not final. On the other hand, if I'm reading WTO copyright treaty rules correctly, this ruling applies in the US.

In the US things seem to be going in a similar direction: https://www.publishersweekly.com/pw/by-topic/digital/copyrig...

Seems to me this can still easily go the way the authors want it to in the US. And in theory, it doesn't even have to: OpenAI lost. Yes, the ruling can be fought on appeal, but I've always heard that winning an appeal after losing a case is 10x harder than winning that case in the first place. And we'll know in early January whether OpenAI fights it at all, so it's not like they have a lot of time left.