by alkonaut
1775 days ago
> I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on Github repositories instead of, say, the Windows kernel driver tree. They know their model arbitrarily copy/pastes other people's code

Microsoft can of course create Copilot using the GitHub code. It's not publishing any derived work on its own, and this type of access to the code is likely a large part of the reason for buying GitHub in the first place.

The only ethical issue for Microsoft here is if Microsoft sells this service (they don't, yet) and risks including nontrivial code without attribution. That seems likely given the behavior of the preview, but if Microsoft, for example, limits output to a few lines or prevents generating large chunks verbatim, the issue almost disappears.

Ethical/legal issues and risks for users of Copilot are much larger, such as when they use it to conjure up a nontrivial snippet and then don't research its origin. That's no better than copying it from the original location.

Microsoft could probably throw parts of its closed source into Copilot, but not even Microsoft controls that. Third parties have copyrights that prevent it too. But people who keep code in public GitHub repos (I assume) let GitHub do things like train neural nets on it, and Microsoft obviously doesn't keep much of the Windows or Office sources in public GitHub repos.
The fast inverse square root is the most nontrivial code I can think of and it's already been found to appear in suggested snippets, with attribution nowhere to be found.
If we accept Copilot as merely a tool, we'd need to consider any developer using that tool to be immoral. There's no discernible difference between shamelessly copy/pasted code and Copilot output, so why consider the tool more than an automated clipboard?
No, I think the tool is built wrong, setting users up to fail. It's a copyright footgun that also produces buggy, vulnerable, and often completely wrong code.
As for copyright, all code with a license has the same copyright protection as any private code hosted on their own servers. You can't just plug some GPL code into your project and sell it, even if you can find the code itself on Google. There is no copyright difference between the projects; it's merely a matter of availability to the scanner.
Adding Microsoft's own, proprietary, quality code to the network would be the gesture of good faith that would make me believe that the developers never intended to break any licenses and that it all just got out of hand.