Not necessarily. If you do it right, you've got a perfectly GPL-compatible license (because such laundering is, technically, a violation of the GPL… probably) – it's just a license that's more explicit about what's a license violation.
The GPL explicitly forbids re-licensing under more restrictive terms.
So either the added terms are not more restrictive, which basically means they are unnecessary and have no real effect; or they are more restrictive, which is incompatible with the GPL.
You can't have things go both ways. It seems that your argument is "we're not adding restrictions, we're just saying what we think Copyright law / the GPL should actually be like." But unfortunately you can't "clarify" Copyright Law or "clarify" the GPL by adding terms. Ultimately courts decide that.
(Of course, if somehow your "clarification" happens to align with a court decision, then maybe it will work after all. But in theory your "clarification" is still not necessary and has no additional effect....)
> But in theory your "clarification" is still not necessary and has no additional effect....
Except your clarification will be interpreted by a court of law. “This license is compatible with the GPL and I can interpret the GPL in a way that lets me do something this license says I can't” is much less likely to stand than “well maybe the author thought the GPL said this, but it actually says my interpretation”.
This, of course, presumes that such a license is actually compatible with the GPL, something I'm getting less and less certain of over time. (What constitutes a compiled form? If a predictive model doesn't count – which it might not, since it outputs source code, very much unlike how compiled programs normally work – then my argument falls down. And many other things would also knock the argument down; I'm not confident enough that all my assumptions are right, or that they should be right.)
wizzwizz4 is correct. Also, I have explicit clauses saying that GPL/AGPL dominate.
But yes, my licenses may be incompatible (one-way) with permissive licenses. I say "one-way" because code with permissive licenses can still be used in code under my licenses, but maybe not necessarily the other way around.
That does not really ring true to me. AGPL broadens the scope of violations as well, and you cannot use AGPL code in GPL-only code bases without turning the end product AGPL (but you can use GPL-only code in AGPL code bases).
If you're just adding something along the lines of "copying passages extensive enough to reach originality is a violation of this license" then that's indeed already covered by the GPL, and there is really no need to add such a passage other than to be more explicit - and confuse people at least at first about why your license is not actually the GPL. So there isn't much of a point in doing it in the first place, in my humble opinion.
If you add text that says something along the lines of "you may not use this code as training data", then you have created an incompatible license, and your code cannot be used in GPL code bases. Even worse, since it restricts what you can do with the code more than the GPL does, it might even mean you lose compatibility in the other direction and may not use GPL'ed code yourself in your own custom-license code base.
The AGPL does not further restrict code uses, just broadens the scope of when you have to make available the code, so it's fine there. However, the original BSD license with the advertising clause is considered incompatible with the GPL.
I am not a lawyer, and these are just my quick layman concerns. I fully recognize you're entitled to use whatever license you find suitable for your code and I am absolutely not entitled to your code and work whatsoever.
But that said, I wouldn't touch your code if I saw a "potentially problematic" custom license, and I wouldn't consider contributing to your projects either.
Honestly, with this whole debacle, I am not going to be accepting outside contributions anyway.
I also understand the concern with a problematic license. However, I don't plan to make a specific exemption about machine learning, but rather tie up an ambiguity.
What I think I'll do is have the license require that when the licensed source code is used, partially or fully, as an input to an algorithm, the license terms must be distributed with the output of that algorithm.
I don't think this is a violation of the GPL at all because the GPL requires you to distribute the license with the binary code of GPL'ed code, and such binary code is the output of an algorithm (the compiler) whose input was the source code.
But what it would do is put the onus on GitHub: if they used my code as training data and distribute the results (as they are doing), they must distribute my license terms as well and tell users that some of the results are under those terms.
> binary code is the output of an algorithm (the compiler) whose input was the source code.
Just because binary code is produced by the operation of an algorithm on source code doesn’t make all output produced by any algorithm on that source code binary code. Otherwise checksums and hashes and prime numbers would be copyrighted.
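To make the distinction concrete, here is a minimal Python sketch. It is purely illustrative: the `SOURCE` snippet and the `summarize` helper are made up for this example, and it says nothing about where a court would draw the line. It runs three different algorithms over the same source text: two digests and a compile step.

```python
import hashlib
import zlib

# Hypothetical stand-in for a licensed source file (an assumption for illustration).
SOURCE = "def add(a, b):\n    return a + b\n"


def summarize(source: str) -> dict:
    """Run three different algorithms over the same source text."""
    data = source.encode("utf-8")
    return {
        # Fixed-size digests: derived from the source, but the source
        # cannot be read back out of them.
        "sha256": hashlib.sha256(data).hexdigest(),
        "crc32": zlib.crc32(data),
        # Compiled form: a translation that still carries the program's behavior.
        "bytecode": compile(source, "<licensed>", "exec"),
    }


small = summarize(SOURCE)
large = summarize(SOURCE * 1000)  # same definition repeated 1000 times

# The digest has the same length no matter how much source went in.
print(len(small["sha256"]), len(large["sha256"]))

# The compiled form can still be executed to do what the source does.
namespace = {}
exec(small["bytecode"], namespace)
print(namespace["add"](2, 3))
```

All three outputs are "the output of an algorithm whose input was the source code", yet only the compiled form still embodies the program itself.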
You have a point, which is why the legal system would still require that a copy be substantial before they count it as infringing. I would argue that Copilot has already been shown to copy substantial portions, though.
> something along the lines of "you may not use this code as training data"
Would such a term be legally binding under present copyright law? Other than disallowing inclusion in a redistributed dataset specifically intended for training ML models, it's not clear to me that it would actually prevent such use if you already had a copy on hand for some other purpose. (Specifically, note that GitHub indeed already has a copy on hand for their authorized primary purpose of publicly distributing it.)
More generally, the manner in which copyright law applies to machine learning algorithms hasn't been worked out by either the courts or the legislature yet. Hence the current article ...
Law isn't code.