Hacker News
by gradys 1791 days ago
You sound very confident about this, whereas the copyright lawyers I've read discussing this issue seem much less confident overall, though they lean toward thinking this would be fair use.

What makes you so confident that this would not be ruled fair use?

(And for people not familiar - if ruled fair use, it doesn't matter what the license is because fair use is an exception to copyright itself.)


I have a feeling you did not read the FAQ of the licenses. I don't blame you, but they explain my position.

Here's the relevant quote:

> GitHub is arguing that using FOSS code in Copilot is fair use because using data for training a machine learning algorithm has been labelled as fair use. [1]

> However, even though the training is supposedly fair use, that doesn’t mean that the distribution of the output of such algorithms is fair use.

My licenses say, basically, "Sure, training is fair use, but distributing the output is not."

The licenses specifically say that the copyright applies to any output of any algorithm that uses the source code as all or part of its input.

Now, I have not gotten a lawyer to look at my licenses yet (it's in the works), so don't use them yourself. But because everyone keeps saying that training is fair use, I'm fairly confident that only training is fair use.

Of course, it might not be, but that would take more court cases and more precedent. I wanted to poison the well now [2] to make companies nervous about using a model that was partially trained with code licensed under my licenses.

[1]: https://valohai.com/blog/copyright-laws-and-machine-learning...

[2]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...

> My licenses say, basically, "Sure, training is fair use, but distributing the output is not."

Licenses basically by definition cannot say what is and isn't fair use...

> Licenses basically by definition cannot say what is and isn't fair use...

Yes. However, my licenses only say what people already say. Then the licenses go further and say, "But anything else is not allowed."

Everyone else says training is fair use. My licenses agree. But they make it clear that I don't believe that anything else is fair use.

Yes, these licenses must be tested in court, but until then, they poison the well.

It's mildly interesting that you've decided to express your personal opinion about what is or is not fair use within your license text, but the fact is that if a use of the work is deemed to be fair use under the law, then the terms of the license you're offering are completely irrelevant. Your permission is not required to make fair use of the work, so no one needs to agree to your license.

> It's mildly interesting that you've decided to express your personal opinion about what is or is not fair use within your license text, but the fact is that if a use of the work is deemed to be fair use under the law, then the terms of the license you're offering are completely irrelevant. Your permission is not required to make fair use of the work, so no one needs to agree to your license.

You do not seem to get it. Yes, I understand that if fair use applies, my licenses don't matter. I get that. I promise I do get that.

The purpose of these licenses is to sow doubt that fair use applies to distributing the output of ML models.

Lawyers are usually a cautious lot. If a legal question has not been answered, they usually want to stay away from any possibility of legal risk regarding that question.

The licenses create a question: does fair use apply to the output of ML algorithms? With that question not answered, lawyers and their companies might elect to stay away from ML models trained with my code, and ML companies might stay away from training ML models on my code in the first place.

That is what I mean by "poisoning the well." The poison is doubt about the legality of distributing the output of ML models, and it is meant to put a damper on enthusiasm for code being used to train ML models, especially for my code.

It still amounts to an opinion statement in the license text, which has no real bearing on the license. I was trying to be charitable, but your clarification makes it seem even more like you're just trying to spread unsubstantiated FUD in hopes of scaring people away from using your code as input to ML models even when that would be fair use. That seems to me vaguely akin to fraud.

Moreover, the license seems like a poor venue for expressing your opinion, since the people you're most interested in dissuading (e.g. people using lots of different projects as input to their ML models without investigating the details of each one) are also the least likely to bother reading it. To raise awareness of how copyright might apply to the output of ML models, you'd do better to post your opinions on a blog somewhere and leave the license text for things a license can actually affect.

Licenses can't dictate what is not allowed unless the user wants to use the work in a way compliant with the rest of the license. If you decide not to follow the license at all, the work is effectively like any other copyrighted work: you can use it without the owner's permission only under fair use.

That doesn't usually mean you can use code, though; see: https://news.ycombinator.com/item?id=27726343