| I understand your concerns. Honestly, with this whole debacle, I am not going to be accepting outside contributions anyway. I also understand the concern with a problematic license. However, I don't plan to make a specific exemption about machine learning, but rather tie up an ambiguity. What I think I'll do is that the license will require that when the licensed source code is used, partially or fully, as an input to an algorithm, the license terms must be distributed with the output of that algorithm. I don't think this is a violation of the GPL at all because the GPL requires you to distribute the license with the binary code of GPL'ed code, and such binary code is the output of an algorithm (the compiler) whose input was the source code. But what it would do is put the onus on GitHub that, if they used my code in training that data, if they distributed the results (as they are doing), they must distribute my license terms as well and tell users that some of the results are under those terms. |
Just because binary code is produced by the operation of an algorithm on source code doesn’t make the output of every algorithm run on that source code binary code. Otherwise checksums and hashes and prime numbers would be copyrighted.
Bats are not birds.