Hacker News
by Longlius 1464 days ago
Willful copyright infringement for monetary gain can be prosecuted as a criminal act in the United States (and many other countries including Japan) and it's highly possible Github themselves can end up in hot water for facilitating this.

> it’s highly possible Github themselves can end up in hot water for facilitating this.

It might be possible; I don’t know about “highly”. Have you checked the license exclusions required to use Github? Their terms already carve out a Copyright exception for Github, because they need it in order to host your code. There’s also no reason Github can’t filter certain licenses, or make it impossible to complete entire functions, or build an option for everyone to opt in to being autocomplete source material regardless of license, right? Any legal challenges are likely to result in changes to the feature before there are ever any serious repercussions.

I think it’s at least as likely, if not more so, that Copyright Law could evolve in response to the growing number of AI autocompleters, and that we (society) will try to allow them within reason by being more specific about what constitutes automated infringement and who’s responsible for it. Fair Use currently exists, but it is vague and left up to courts to decide. In the meantime, Copyright is primarily intended to foster a balance between business and freedom of expression, and there’s a lot of open source software on Github that cares about freedom of expression and not about business. In any case, we don’t really want Copyright to represent some kind of absolute ownership land-lock over every string of 100 characters; that is a bit antithetical to both Copyright and the FOSS community.

Wow, the number of legal experts who appear and debate hypotheticals, when everything is spelled out quite clearly in the license agreements, is very high on this site.

Triply so when Microsoft is involved.

You and I have a different understanding of “willful”. If you’ve used copilot, you’ll know that the vast majority of the time it’s not infringing anybody’s copyright; it’s creating code that is highly unique to the problem you are trying to solve.

All output of machine learning algorithms is derived from the training set. There is no creativity, just lots of complexity. What that means legally has yet to be fully determined.

If that were the case, how could models such as DALL-E 2 generate “Homer Simpson in The Godfather”-type images? It’s clear that machine learning models are capable of independent creation.

As far as copilot goes, yes, it’s possible to get it to recite copyrighted works, but in normal usage it is creating independent works, because it is too influenced by the structure of your code around the insertion point to recite anything. It’s autocompleting things like the variable names you already declared, simple loops and function applications, etc.

> What that means legally has yet to be fully determined.

At least in the US, the Supreme Court ruled in Google v. Oracle that Google’s copying of the Java API’s declaring code was fair use, without deciding whether the API is copyrightable at all. Copilot users are very far from crossing the line; the courts are not going to come after some de minimis 10-line snippet that copilot generated.

Whether Microsoft itself was legally in the right by training copilot is a more interesting legal question that remains unresolved.