Hacker News
by volta83 1813 days ago
So how do you know whether the code Copilot regurgitates is a near-verbatim (almost 1:1) copy of some GPL'ed code?

Because if you don't realize it, you might be introducing GPL'ed code into your proprietary code base, and that might end up forcing you to distribute all the other code in that code base under the GPL as well.

Like, I get that Copilot is really cool, and that software engineers like to use the latest and bestest, but even if the code Copilot produces is "functionally" correct, using it in your code base might still be a catastrophic error because of licensing.

This issue looks solvable. Train two Copilots, one only on BSD-like licensed software and one that also includes GPL'ed code, and let users choose; and/or warn when a snippet has been "heavily inspired" by GPL'ed code.

Or maybe just train an adversarial neural network to detect GPL'ed code, and use it to warn on snippets, or...
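
A rough sketch of the "warn when a snippet has been heavily inspired by GPL'ed code" idea. It uses a much simpler technique than the adversarial network suggested above: token k-gram ("shingle") overlap against a corpus of known GPL'ed files, with identifiers replaced by a placeholder so that renaming variables doesn't hide a match. Assumptions: snippets are Python; `gpl_corpus`, the function names, the 5-token shingle size and the 0.5 threshold are all hypothetical choices for illustration. A real tool would need a huge indexed corpus, and a match says nothing definitive about what a court would treat as a derivative work.

```python
import io
import keyword
import tokenize

# Layout/comment tokens are ignored so formatting changes don't affect matching.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, mapping every non-keyword name to "ID"
    so that renamed symbols and variables still produce the same tokens."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        else:
            tokens.append(tok.string)
    return tokens


def shingles(tokens: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """All contiguous runs of k tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def containment(snippet: str, reference: str, k: int = 5) -> float:
    """Fraction of the snippet's shingles that also appear in the reference."""
    a = shingles(normalized_tokens(snippet), k)
    if not a:
        return 0.0
    b = shingles(normalized_tokens(reference), k)
    return len(a & b) / len(a)


def gpl_warning(snippet: str, gpl_corpus: dict[str, str],
                threshold: float = 0.5, k: int = 5) -> list[str]:
    """Names of hypothetical GPL'ed corpus files the snippet overlaps heavily."""
    return [name for name, src in gpl_corpus.items()
            if containment(snippet, src, k) >= threshold]
```

With a one-file corpus containing `def add_all(values): ...`, a Copilot-style suggestion that is the same loop with every name changed (`def sum_list(xs): ...`) still scores a containment of 1.0 and gets flagged, while unrelated code scores 0.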

You have the same issue with MIT, because it requires attribution.
Doesn't this go beyond license and into copyright?

The license lets you modify the program, but doesn't copyright still prevent you from copy/pasting code from it into your own project?

The solution might be simpler than we think: just tell the algorithm
It's very easy: don't use copilot code verbatim, and you won't have GPL code verbatim.
> It's very easy: don't use copilot

Fixed that for you.

Verbatim copying isn't the problem, and avoiding it isn't the solution. If you take a GPL'ed library and rename all its symbols and variables, the output is still a GPL'ed library.

Just seeing GPL'ed code spat out by Copilot and then writing different code "inspired" by it can still result in GPL'ed code. That's why "clean room" implementations exist.

Copilot is going to make for a very interesting legal case to follow, because until somebody sues and the courts decide, nobody will probably have a definitive answer on whether it is safe to use.

Stack Overflow content is licensed under CC-BY-SA. Terms [1]:

* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

In over a decade of software engineering, I've seen Stack Overflow content reused many times, occasionally with links to the underlying answers. Every reuse of Stack Overflow content I've seen would clearly fail the terms set out by the license.

I suspect Copilot usage will similarly fail a strict reading of the underlying licenses, and will likewise face essentially no enforcement.

[1] https://creativecommons.org/licenses/by-sa/4.0/