by alkonaut 1775 days ago
> I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on GitHub repositories instead of, say, the Windows kernel driver tree. They know their model arbitrarily copy/pastes other people's code.

Microsoft can of course create Copilot using the GitHub code. It’s not publishing any derived work on its own - and this type of access to the code is likely a large part of the reason for buying GitHub in the first place.

The only ethical issue for Microsoft here arises if Microsoft sells this service (it doesn't, yet) and risks including nontrivial code without attribution. That seems likely given the behavior of the preview, but if MS, for example, limits output to a few lines or prevents verbatim generation of large chunks, the issue almost disappears.

The ethical and legal issues and risks for users of Copilot are much larger, for example if they use it to conjure up a nontrivial snippet and then don't research its origin. That's no better than copying it from the original location.

Microsoft could probably throw parts of its closed-source code into Copilot, but that isn't entirely Microsoft's call: third parties hold copyrights that prevent it too.

But people who keep code in public GitHub repos (I assume) let GitHub do things like train neural nets on it, and Microsoft obviously doesn't keep much of the Windows or Office source in public GitHub repos.


I don't think selling the service or giving it away for free makes any difference. They're not creating services out of the goodness of their hearts, and these projects rack up a lot of server costs. Even if their service is free, they're getting a return on their investments somehow.

The fast inverse square root is the most nontrivial code I can think of and it's already been found to appear in suggested snippets, with attribution nowhere to be found.

If we accept Copilot as merely a tool, we'd need to consider any developer using that tool to be immoral. There's no discernible difference between shamelessly copy/pasted code and Copilot output, so why consider the tool anything more than an automated clipboard?

No, I think the tool is built wrong and sets its users up to fail. It's a copyright footgun that produces buggy, vulnerable, and often completely wrong code.

As for the copyrights, all licensed code is just as copyrighted as any private code hosted on their own servers. You can't just plug some GPL code into your project and sell it, even if you can find the code itself on Google. There is no copyright difference between the projects; it's merely a matter of availability to the scanner.

Adding Microsoft's own, proprietary, quality code to the network would be the gesture of good faith that would make me believe that the developers never intended to break any licenses and that it all just got out of hand.

I can't see how developer ethics comes into it at all here. Either the code is trivial boilerplate and not a license issue, in which case there are zero ethical issues in using it, in my opinion. That's just like how I copy two lines of code to open a file from any repository with any license without either ethical or IP worries. Or the code is nontrivial, like the fast inverse sqrt, in which case it's on the user to realize that Copilot has fed them a landmine, and it's on them to avoid it or attribute it as appropriate. This is a license issue, though, not an ethical one. I fail to see a situation where it's ethical to violate a license or unethical to use code that doesn't violate a license.

Note, though, that all the examples of nontrivial regurgitation presented so far have (as far as I know) been deliberately "triggered" by people who knew the code would likely show up if Copilot were fed the function header. It's also important to remember that this is still preview software. The final version will hopefully restrict its output more, since this is obviously the big weakness of the system.

I agree 100% that it's a license footgun. But as I said, this is the developer's problem, which is why few of us will ever be able to use it in its current form.

As for the MS sources argument: the reason MS bought GitHub is to have this kind of access to a lot of code. It's their code to use in this way. People who committed code gave GitHub (and its future owners) that right. Microsoft (as far as I understand) can sell the right to view this code, for example through GitHub fees. Doing so isn't against the license of a GPL repo. So Microsoft isn't violating a license by mangling the code into snippets and charging for the pleasure of downloading those snippets. What's against the license terms is for me to download the snippet and accidentally use it in my proprietary software. Does that make the tool bad to the point of being useless? Perhaps. Is it illegal or unethical? I don't think so.

> You can't just plug some GPL code into your project and sell it, even if you can find the code itself on Google.

Although some people seem to think Copilot can be used to "wash" licenses by giving users a black-box "excuse", I think that idea is dead in the water. Anyone who has a sufficiently nontrivial GPL snippet in their proprietary code has violated the license.