|
|
|
|
|
by jeroenhd
1775 days ago
|
|
I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on Github repositories instead of, say, the Windows kernel driver tree. They know their model arbitrary copy/pastes other people's code so they train it almost exclusively on other people's code that they don't care for it it gets stolen. Their assumption seems to be "if Bing can find it, it's up for grabs, no matter the license". Good luck getting the same treatment from MS if you upload the leaked XP kernel to github to make your own fork. I'll accept the ethics of copilot when they add the source code for Windows, Azure and Office to their training set, because only then will MS truly reflect that their model doesn't cross the spirit or even letter of any licensing. |
|
Microsoft can of course create Copilot using the GitHub code. It’s not publishing any derived work on its own - and this type of access to the code is likely a large part of the reason for buying GitHub in the first place.
The only ethical issue for Microsoft here is if Microsoft sells this service (they don’t - yet) and risk including nontrivial code without attribution (seems likely, given the behavior of the preview but if ms for example limits output to a few lines or prevents generating too large chunks verbatim the issue almost disappears).
Ethical/legal issues and risks for users of Copilot are much larger, such as if they use it to conjure up a nontrivial snippet and then not research the origin of it. It’s no better than copying it from the original location.
Microsoft could probably throw in parts of their closed source in copilot - but not even Microsoft controls that. Third parties have copyrights that prevent it too.
But people who keep code in public GitHub repos (I assume) let GitHub do things like train neural nets on it, and Microsoft obviously don’t keep much of the windows or office sources in public GitHub repos.