Hacker News new | ask | show | jobs
by ghoward 1807 days ago
wizzwizz4 is correct. Also, I have explicit clauses saying that GPL/AGPL dominate.

But yes, my licenses may be incompatible (one-way) with permissive licenses. I say "one-way" because code with permissive licenses can still be used in code under my licenses, but maybe not necessarily the other way around.

I'm okay with that.

1 comments

That does not really ring true to me. AGPL broadens the scope of violations as well, and you cannot use AGPL code in GPL-only code bases without turning the end product AGPL (but you can use GPL-only code in AGPL code bases).

If you're just adding something along the lines of "copying passages extensive enough to reach originality is a violation of this license" then that's indeed already covered by the GPL, and there is really no need to add such a passage other than to be more explicit - and confuse people at least at first about why your license is not actually the GPL. So there isn't much of a point to do it in the first place, in my humble opinion.

If you add text that says something along the lines of "you may not use this code as training data", then you created an incompatible license, and your code cannot be used in GPL code bases, and even worse, since it restricts what you can do with the code more than the GPL, it might even mean you stop being reverse-compatible and may not use GPL'ed code yourself in your own custom-license code base.

The AGPL does not further restrict code uses, just broadens the scope of when you have to make available the code, so it's fine there. However, the original BSD license with the advertising clause is considered incompatible with the GPL.

I am not a lawyer, and these are just my quick layman concerns. I fully recognize you're entitled to use whatever license you find suitable for your code and I am absolutely not entitled to your code and work whatsoever.

But that said, I wouldn't touch your code if I saw a "potentially problematic" custom license, and I wouldn't consider contributing to your projects either.

I understand your concerns.

Honestly, with this whole debacle, I am not going to be accepting outside contributions anyway.

I also understand the concern with a problematic license. However, I don't plan to make a specific exemption about machine learning, but rather tie up an ambiguity.

What I think I'll do is that the license will require that when the licensed source code is used, partially or fully, as an input to an algorithm, the license terms must be distributed with the output of that algorithm.

I don't think this is a violation of the GPL at all because the GPL requires you to distribute the license with the binary code of GPL'ed code, and such binary code is the output of an algorithm (the compiler) whose input was the source code.

But what it would do is put the onus on GitHub that, if they used my code in training that data, if they distributed the results (as they are doing), they must distribute my license terms as well and tell users that some of the results are under those terms.

> binary code is the output of an algorithm (the compiler) whose input was the source code.

Just because binary code is produced by the operation of an algorithm on source code doesn’t make all output produced any algorithm on that source code binary code. Otherwise checksums and hashes and prime numbers would be copyrighted.

Bats are not birds.

You have a point, which is why the legal system would still require that a copy be substantial before they count it as infringing. I would argue that Copilot has already been shown to copy substantial portions, though.
> something along the lines of "you may not use this code as training data"

Would such a term be legally binding under present copyright law? Other than disallowing inclusion in a redistributed dataset specifically intended for training ML models, it's not clear to me that it would actually prevent such use if you already had a copy on hand for some other purpose. (Specifically, note that GitHub indeed already has a copy on hand for their authorized primary purpose of publicly distributing it.)

More generally, the manner in which copyright law applies to machine learning algorithms in general hasn't been worked out by either the courts or legislature yet. Hence the current article ...