Hacker News
by andix 1748 days ago
It is called LICENSE.txt. License your code as GPL and then Copilot can't reproduce bigger parts of your code.

But as long as you give the public access to your code, they can study it and learn from it. Humans and machines.

4 comments

No, the license that you apply is completely irrelevant, and there’s certainly nothing whatsoever special about the GPL. Copilot depends entirely on being effectively exempt from copyright; if that legal theory falls apart, the entire space (and a lot of other machine learning stuff) is utterly doomed. Trouble is, Copilot can’t tell whether it’s reproducing copyrightable chunks of your code, or indeed where what it produces came from, by the very nature of machine learning techniques.
They could easily tag the source with license info and take that information into account when feeding data in.
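A rough sketch of what tagging and filtering at ingestion time might look like. Everything here is hypothetical: the `SourceFile` record, the `select_training_files` helper, and the allowlist are made up for illustration, and nothing in this thread says how Copilot's training data is actually prepared.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist of SPDX license identifiers. Which licenses belong
# here is a policy choice, not something established in this discussion.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}


@dataclass(frozen=True)
class SourceFile:
    """One file in a hypothetical training corpus, tagged with its license."""
    repo: str
    path: str
    license_id: Optional[str]  # None when no license could be detected
    text: str


def select_training_files(files, allowed=ALLOWED_LICENSES):
    """Keep only the files whose tagged license is on the allowlist.

    Files with an unknown license (license_id is None) are dropped too.
    """
    return [f for f in files if f.license_id in allowed]


corpus = [
    SourceFile("a/x", "main.py", "MIT", "print('hi')"),
    SourceFile("b/y", "core.py", "GPL-3.0-only", "print('hello')"),
    SourceFile("c/z", "util.py", None, "print('hey')"),
]
kept = select_training_files(corpus)
```

Note that this only controls what goes in. It says nothing about where a given piece of model output came from.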
That’s not how learning, human or machine, works. Learning is about collecting all kinds of stuff from diverse sources into a great melting-pot, so that you can form something new out of it—but you can’t generally identify where everything comes from. Individual recognisable tricks perhaps, but if you want to say “this code was inspired by X, Y and Z”, well, that inspiration is typically everything, the entire corpus.
It could, actually, if it were augmented with the ability to do so – but that would be a bit more expensive.
I don't think that the GPL gives much more protection than any other FOSS license, here, in practice.

If Copilot were to reproduce a larger part of, say, an MIT-licensed codebase, or one under almost any other permissive licence, then they would legally have to provide attribution. I'm pretty sure that they don't even have an option to provide such specific attribution, which means that either they believe that the code copied from any one source is below the relevant threshold or they're just ignoring copyright.

I would assume GitHub could supersede your license by putting its own claim to your code in the ToS. I doubt they have done that, but just pointing it out.
I don’t think that’s possible, as long as you don’t actively accept that. Nobody can claim your copyright without your approval.

It would also be the end of GitHub, as most users probably won’t accept such terms.

According to many people familiar with the legal aspect, training on code constitutes fair use, so it can't be prevented by any kind of license.
Training, exactly. But the trained person or AI is not allowed to reproduce your exact code, and Copilot seems to do that from time to time.
Sure they can; search engines reproduce copyrighted material all the time. The issue comes in when people think this somehow indemnifies them as users of Copilot. My guess is that it doesn't protect you any more than if you used a search engine to copy an entire codebase for your own purposes.
I don't disagree that users wouldn't accept such terms; I'm just pointing out that license text doesn't trump everything.
I'd love to see an ML-GPL that specifically deals with using licensed property as a training set.
Not possible. Such licenses are founded upon copyright doctrine, and copyright doesn’t protect against learning, natural or machine. As it stands (and this can certainly change), legal consensus in general (regardless of jurisdiction) is that if you publish your code where they can reach it, they can use it.
So would it (theoretically) be legal to train on the JS files services like gmail.com serve to the client? What about decompiled output of proprietary software like certain files in Windows and macOS?
Except for any laws or restrictions against decompiling, it would legally be no different from the GPL case. Although personally I think that, since Copilot is capable of redistributing the code, the question of whether the GPL permits the specific usage is still unclear.
I would expect so, though given the limitations of decompilation (in the absence of debug info), I don't know how useful it would be.
Or specifically no closed-source or closed-data tools. I wouldn't mind if a non-profit org wanted to help, but it's Microsoft—and they want to sell it back to us in the future.