by blatherard 1340 days ago
The current GitHub terms of service don't seem to mention this use when they describe the license granted to GitHub.

https://docs.github.com/en/site-policy/github-terms/github-t...

4. License Grant to Us

We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.

This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program.

3 comments

That mentions everything: Parsing the content, showing it to/sharing it with other users, using it to improve and provide the service. GitHub and all of its features are "the service".
True, but it doesn't mention doing so without the attribution that might be required by the code's licence. If full attribution of where the suggestion was derived from was included¹ there would be no issue IMO²; it is this matter that creates the grey area which these discussions result from.

--

[1] the practicality³ of this is a different, though related, discussion

[2] because the user is fully informed and can take responsibility for the decision to use the suggestion or not

[3] or impossibility – given the code could be added by someone who doesn't include that attribution/licence information for the system to be able to pass on even if it were designed to

The terms of service are completely independent of the code's license. The code could say "no one but I may use this", but by using GitHub you give them rights to do everything stated in the Terms of Service.
But if the terms of service say nothing that is in contravention of your licence choice when you agree to them, and then the service does something that you consider to be in contravention of your licence choice, what you have is one party unilaterally changing the agreement. Of course the exact legal meaning of the terms and any perceived change in them could and will be debated long, hard, and potentially expensively…

I'll stick to self-hosting instead of using services like GH. Keeps things a little more simple in that regard.

The license agreement is irrelevant. Literally it does not come into play here. Github is not bound by the license; they are bound by the terms of service. The code is co-licensed: once however you declare it, once to Github independently.
Sharing with license intact. If GH is sharing with the license and attribution stripped, and then just punting IP vetting to pilot users, it seems to exceed their rights.
Why do people refuse to have even a pre-high-school level of understanding of licensing? By uploading your code to Github you are granting them their own license to the code under their terms. Your LICENSE file has absolutely nothing to do with it. Your LICENSE file can say "everyone but Github" and it wouldn't matter one jot because that's not the license you licensed it to them under.

And if you didn't have the rights to grant the licenses to Github? Then you are in violation of the copyright holder's rights, not Github.

The only remotely plausible, yes-I-have-graduated-fifth-grade argument against Github is that they ought to and certainly do know that huge portions of their users are in fact granting them licenses without the necessary authority. That's an interesting argument we should be having, and instead we're having this inane screaming match by people who have no clue what they're talking about while some of us are sitting here going WTF is wrong with you?

Calm down. No need for ad hominem attacks.

I am discussing the license grant GH includes in its terms. And that doesn't appear to give them a blank check to do anything they want with code those users have uploaded. Certainly not sell it piecemeal.

IANAL, but it's pretty clear that GH explicitly says they will NOT distribute the code. I'm not sure what else you'd call offering to copy a section of code.

"It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service."

Throughout the license, the Content is treated as an indivisible unit, and it specifically refers to the forking functionality. Notice that forking...forks an entire repository, licenses included, etc. You can't fork a single file, and you can't fork a region of a file. GH provides that kind of forking.

Copilot is fine-grained forking.

No significant software company is going to permit copilot to be used and potentially poison their code base in unknown ways, now that this kind of copying is in the open and is clearly a significant danger.

Somebody like Black Duck is going to make a lot of money for trial attorneys by tracing how code was created and finding the "hits". That will be joined with log data indicating who used copilot, when they used it, and exactly what copilot presented as the "hit". This entire process will be performed recursively on the "hit", together with classic source analysis, to find out where something is really from.

The bigger companies are really, really serious about not copying outside code except under really strict conditions -- these conditions mostly look like "no you may not, unless you have one of these specific situations". It's no-by-default, even when it looks like it could be a yes.

"outside of our provision of the Service"

You've ignored the important words. Copilot is part of the Service.

There's nothing about "license intact" in those clauses. GitHub is able to do whatever it wants with the data; any users of the service do have to check on licenses, as they should with any source (including copying from Stack Overflow).
Okay, so Copilot isn't illegal, it's just an engine for doing illegal things? That's... not better?
You're right, we should ban the camera and paintbrush as well because people can make illegal materials out of them too.
Cameras and paintbrushes can easily make non-infringing works. Users of them can easily be trained how to avoid taking others' work.

Copilot, on the other hand, basically defaults to infringing behavior. Users would have to go to great lengths to be sure they aren't infringing on others' work.

> show it to you and other users...analyze it on our servers...share it with other users...perform it

I don't know, that sounds pretty similar to training ML programs on it, even if they don't explicitly say "machine learning" in the ToS.

> This license does not grant GitHub the right to sell Your Content.

This would, at a minimum, preclude charging for Copilot.

This is missing the point though. Microsoft claims their use of source code for Copilot is fair use. If they are correct about that, licenses don't matter, this EULA doesn't matter, etc. Everyone should be focusing on this claim; arguing about any other detail before that is decided is a waste of time.

If anyone asked me to define Copilot I'd refer back to this:

> parse it into a search index or otherwise analyze it on our servers; share it with other users

That is the most succinct and most accurate definition of Copilot I've ever seen.