Hacker News new | ask | show | jobs
by b3morales 1811 days ago
Your summary is generally correct, and I certainly agree with the other commenter's position on their work. But I think you're still missing the point. Copyright is the mechanism that allows you to prevent copying, but GitHub's claim is that copyright is irrelevant to Copilot's input.

I have a nice strong lock on my door. GitHub (asserts that it) can enter my home through the window.

Adding another deadbolt to the door does not help.

2 comments

I don't think I missed that point. I'm trying to argue that copyright is relevant to Copilot's input if not allowed by an OSS license.

Maybe I'm missing something (just not the thing you said), but has Github made any legal claims so far? The original article is written by a politician in EU...

Even if you're a lawyer defending Github in this case, there's still a couple things that needs to be clarified before you can make the case: (maybe the info is out there but I'm too lazy to research)

- Is Github only using code/repos that are explicitly under OSS licenses? (because if that's the case, then the discussion might be justified in presuming OSS terms, and it may be the case that more restrictive non-OSS licenses would require a different analysis)

- As somebody pointed out in another thread, the Github terms of service agreement seems to grant Github additional rights when dealing with user uploaded content. Is that a legal basis for the use?

> I'm trying to argue that copyright is relevant to Copilot's input if not allowed by an OSS license.

And I tend to agree with you (and the other commenter) here. But GitHub doesn't.

> has Github made any legal claims so far?

I'm not sure how actively, but the CEO was here in the announcement thread the other day saying that they think the ingestion of the inputs is a "fair use". They also have some material defending the output side: https://docs.github.com/en/github/copilot/research-recitatio...

> Is Github only using code/repos that are explicitly under OSS licenses?

I don't think we know exactly what code they used as inputs, no.

Their argument defending the output side doesn't hold water, IMO. If Copilot produces exact copies verbatim, even some of the time, then as long as customers don't have access to the code used to generate the model, how can they be sure?

It's a matter of scale. With a big enough codebase, there will be copyright violations.

> I don't think I missed that point. I'm trying to argue that copyright is relevant to Copilot's input if not allowed by an OSS license.

The point (that they claim that) you are missing is that if "copyright is relevant to Copilot's input" then almost all existing OSS licenses already don't allow that.

The licenses that I am making implicitly acknowledge the argument that training an ML model is fair use.

However, GitHub said nothing about the output of the model being fair use. My license will say that the output of their model is under the same license as the input, which means they have restrictions if they want to distribute it (i.e., actually have people use Copilot).

I think this will work because it doesn't say that GitHub is wrong. Instead, it says that, even if GitHub is right, it doesn't matter.

It would also be very bad for GitHub to claim that the output of an algorithm can't be under the same license as the input because we feed licensed code to algorithms all the time and claim that their output is still under the same license. We call those algorithms "compilers" and the binary code they produce is still copyrighted and licensed.