| That does not really ring true to me. AGPL broadens the scope of violations as well, and you cannot use AGPL code in GPL-only code bases without turning the end product AGPL (but you can use GPL-only code in AGPL code bases). If you're just adding something along the lines of "copying passages extensive enough to reach originality is a violation of this license" then that's indeed already covered by the GPL, and there is really no need to add such a passage other than to be more explicit - and confuse people at least at first about why your license is not actually the GPL. So there isn't much of a point to do it in the first place, in my humble opinion. If you add text that says something along the lines of "you may not use this code as training data", then you created an incompatible license, and your code cannot be used in GPL code bases, and even worse, since it restricts what you can do with the code more than the GPL, it might even mean you stop being reverse-compatible and may not use GPL'ed code yourself in your own custom-license code base. The AGPL does not further restrict code uses, just broadens the scope of when you have to make available the code, so it's fine there. However, the original BSD license with the advertising clause is considered incompatible with the GPL. I am not a lawyer, and these are just my quick layman concerns. I fully recognize you're entitled to use whatever license you find suitable for your code and I am absolutely not entitled to your code and work whatsoever. But that said, I wouldn't touch your code if I saw a "potentially problematic" custom license, and I wouldn't consider contributing to your projects either. |
Honestly, with this whole debacle, I am not going to be accepting outside contributions anyway.
I also understand the concern with a problematic license. However, I don't plan to make a specific exemption about machine learning, but rather tie up an ambiguity.
What I think I'll do is that the license will require that when the licensed source code is used, partially or fully, as an input to an algorithm, the license terms must be distributed with the output of that algorithm.
I don't think this is a violation of the GPL at all because the GPL requires you to distribute the license with the binary code of GPL'ed code, and such binary code is the output of an algorithm (the compiler) whose input was the source code.
But what it would do is put the onus on GitHub that, if they used my code in training that data, if they distributed the results (as they are doing), they must distribute my license terms as well and tell users that some of the results are under those terms.