by jacquesm 1386 days ago
It's not complex at all: if you use Copilot to generate code for you, you are engaging in copyright infringement.

That you got the code from an entity that stole it from somewhere else doesn't really matter. Generative models should respect the copyright of their sources, and using a generative model to create new works that you intend to claim copyright on is stupid: someone may well show up one day with ironclad proof that you used their code without permission.


Except every attorney who has taken a look disagrees with you. So it is complex, and unless you have standing or new data, I think it's safe to dismiss your entire argument.

https://decoded.legal/blog/2021/06/github-copilot-initial-th...

https://fossa.com/blog/analyzing-legal-implications-github-c...

https://felixreda.eu/2021/07/github-copilot-is-not-infringin...

Lawyers don't judge cases, judges do.

Until then, all you have are opinions. Mine is pretty straightforward: if the generative model can be made to work without first being trained on other people's code, then it isn't copyright infringement; if not, then it is transforming one set of works into another.

The only thing that might let GitHub off the hook is their terms of service, but that might mean a mass exodus from GitHub: if they interpret your use of GitHub to host your code as blanket permission to do whatever they want with that code, that's clearly not the original intent of the service.

If Microsoft claims that buying GitHub gave them a blanket license to do as they please with the contributions of millions of FOSS contributors, then they are still just as bad as they were in the past.

Almost every GitHub repository comes with a license file, and even GitHub should have to abide by that license; otherwise the whole thing is pointless.

Unless you have specialized training in copyright law, your opinion unfortunately carries little weight next to that of actual experts in the field. You're making assertions you clearly cannot substantiate, and we're not seeing an influx of litigants. Personally, I have yet to see news of even a single litigant challenging Copilot. Also, in the US many cases are decided by a jury, not a judge.

I've fielded a couple of copyright lawsuits and won them; obviously the defendants' lawyers thought they had an excellent case. I may not be a lawyer, but I do know a thing or two about copyright law, and as far as I'm concerned, taking someone else's copyrighted content and pushing it through a machine of some sort does not create an original work, even if you claim to have created something. There is plenty of settled case law around this, so that much we can establish off the bat. That means if you use this to create your own copyrighted work, you may have a problem anyway. Whether or not it is infringing is largely a matter of the length of the segment produced and whether it matches the original in some non-trivial way. The mechanism in the middle doesn't really matter.

If Microsoft/GitHub want to field the argument that they own the rights to all of the code uploaded to GitHub, then I'm perfectly fine with that; the only problem I see with that defense is that it would likely kill GitHub overnight.

As for the jury argument: that's fine, but juries aren't lawyers either. I'm not sure if that should weigh as a positive or a negative for Microsoft.

Finally, regardless of the legality, there is such a thing as ethics, and in my book you don't appropriate a large body of work from a whole community without so much as a by-your-leave. There have been other threads on HN about this, and it is interesting to see the various opinions; even so, if Copilot is challenged legally, I'll be cheering on the party bringing the suit.

A reminder that whether something is copyright infringement or fair use depends in part on the amount of copyrighted material used, the nature of that use, and how transformative the allegedly infringing work is. Just because Copilot suggests code snippets that can be found in a copyrighted work does not mean the resulting product infringes that copyright.

Also a reminder that, outside the Copilot debate, the online rights movement has largely been pushing for scraping, deep linking, and transforming scraped data not to be considered copyright infringement, regardless of any TOS on the site being scraped.

To me, Copilot is exactly that: a scraper that has scraped public websites and is now presenting the scraped data to me in an alternative and often transformed form. It's my responsibility as a developer to ensure that my released product complies with applicable copyright law, but Copilot and its use are not in and of themselves copyright infringement.

That a tool can be used to create infringing work, or to infringe copyright in general, is no more a valid argument against Copilot than it is against CD burners, DRM-removal tools, VCRs, Kodi or Plex, scanners, or any number of everyday items that can infringe copyright if the user uses them for that purpose.