Hacker News
by ghoward 1793 days ago
You're right that code has to demonstrate creativity for copyright. But that also means that an algorithm, even a transformative algorithm, cannot change copyright because an algorithm is not creative, by definition.

This means that the output of any algorithm on copyrighted code is still under the original copyright. I mean, we still apply the copyright of the original to the output of compilers, even though compilers can be transformative with inlining and link-time optimization, to the point of mixing disparate code in the same way Copilot does.

In fact, I wrote some software licenses [1] that codify the fact that algorithms cannot change copyright.

[1]: https://yzena.com/licenses/

You sound very confident about this, whereas the copyright lawyers I've read on this issue seem much less confident overall, and they lean toward thinking this would be fair use.

What makes you so confident that this would not be ruled fair use?

(And for people not familiar - if ruled fair use, it doesn't matter what the license is because fair use is an exception to copyright itself.)

I have a feeling you did not read the FAQ of the licenses. I don't blame you, but they explain my position.

Here's the relevant quote:

> GitHub is arguing that using FOSS code in Copilot is fair use because using data for training a machine learning algorithm has been labelled as fair use. [1]

> However, even though the training is supposedly fair use, that doesn’t mean that the distribution of the output of such algorithms is fair use.

My licenses say, basically, "Sure, training is fair use, but distributing the output is not."

The licenses specifically say that the copyright applies to any output of any algorithm that uses the source code as all or part of its input.

Now, I have not gotten a lawyer to look at my licenses yet (it's in the works), so don't use them yourself. But because everyone keeps saying that training is fair use, I'm fairly confident that only training is fair use.

Of course, it might not be, but that would take more court cases and more precedent. I wanted to poison the well now [2] to make companies nervous about using a model that was partially trained with code licensed under my licenses.

[1]: https://valohai.com/blog/copyright-laws-and-machine-learning...

[2]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...

> My licenses say, basically, "Sure, training is fair use, but distributing the output is not."

Licenses basically by definition cannot say what is and isn't fair use...

> Licenses basically by definition cannot say what is and isn't fair use...

Yes. However, my licenses only say what people already say. Then the licenses go further and say, "But anything else is not allowed."

Everyone else says training is fair use. My licenses agree. But they make it clear that I don't believe that anything else is fair use.

Yes, these licenses must be tested in court. In the meantime, though, they poison the well.

It's mildly interesting that you've decided to express your personal opinion about what is or is not fair use within your license text, but the fact is that if a use of the work is deemed to be fair use under the law, then the terms of the license you're offering are completely irrelevant. Your permission is not required to make fair use of the work, so no one needs to agree to your license.

> It's mildly interesting that you've decided to express your personal opinion about what is or is not fair use within your license text, but the fact is that if a use of the work is deemed to be fair use under the law, then the terms of the license you're offering are completely irrelevant. Your permission is not required to make fair use of the work, so no one needs to agree to your license.

You do not seem to get it. Yes, I understand that if fair use applies, my licenses don't matter. I get that. I promise I do get that.

The purpose of these licenses is to sow doubt that fair use applies to distributing the output of ML models.

Lawyers are usually a cautious lot. If a legal question has not been answered, they usually want to stay away from any possibility of legal risk regarding that question.

The licenses create a question: does fair use apply to the output of ML algorithms? With that question not answered, lawyers and their companies might elect to stay away from ML models trained with my code, and ML companies might stay away from training ML models on my code in the first place.

That is what I mean by "poisoning the well." The poison is doubt about the legality of distributing the output of ML models, and it is meant to put a damper on enthusiasm for code being used to train ML models, especially for my code.

A license can't dictate what is not allowed unless the user wants to use the work in a way that complies with the rest of the license. If you decide not to follow the license at all, then it's effectively like any other copyrighted work: you can use it without the owner's permission under fair use.

That doesn't usually mean you can use the code, though; see: https://news.ycombinator.com/item?id=27726343