Hacker News
by tsimionescu 1309 days ago
> That being said, I still find it extremely unlikely that there would be legal ramifications from using a product being pushed by one of the largest software companies in the world.

Microsoft is explicitly saying it's your responsibility to check that the Copilot output you add to your codebase does not infringe anyone's license.

Also, it's actually a complex legal question whether Copilot itself is infringing anyone's copyright. But there is no doubt whatsoever that you don't have the right to distribute someone else's copyrighted code (without a license) just because it was produced by Copilot and not manually copied by you. And it is also very clear that Copilot can occasionally generate larger pieces of someone else's code.

Edit: fixed typos


> Microsoft is explicitly saying it's your responsibility to check if the Copilot's output that you add to your codebase is infringing on anyone's license.

(Never used copilot)

Wow, this is kinda shocking IMO. It kind of negates the entire value proposition of the tool.

How am I supposed to find out whether a snippet is infringing? Should I paste it into Google or something? Shouldn’t Copilot be the one to tell me if a snippet too closely matches some existing code it learned from?

If MS is indeed saying this, I feel like it’s something they put in the agreement to cover their own asses. There’s no way they’d really expect everyone to do this sort of thing. Moreover, I don’t feel that’s a very strong defense MS could use in court if somebody decides to go after MS for making the tool that makes infringement so easy. It sounds like one of those “wink wink” types of clauses that they know full well nobody will follow.

From the official FAQ [0]:

> Other than the filter, what other measures can I take to assess code suggested by GitHub Copilot?

> You should take the same precautions as you would with any code you write that uses material you did not independently originate. These include rigorous testing, IP scanning [emphasis mine], and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

I think lots of companies do run tools such as BlackDuck and others to scan their entire code base and ensure (or at least have some ass-covering) that there is no accidental copyright infringement.

[0] https://github.com/features/copilot#other-than-the-filter-wh...

How much of what you save by using Copilot will then be spent on BlackDuck licenses?

While the cost to programmers' sanity of running things like BD is immeasurable in my estimation, if you are already doing it, doing it for Copilot code shouldn't add any extra cost, unless Copilot is actually constantly spewing copyrighted code.

> While the cost to programmers' sanity of running things like BD is immeasurable in my estimation

Can you clarify? In my experience, source scan is just another job in one's build pipeline. And I've only seen it fail when it does, in fact, detect a new component (or a license change in an existing component), because at that point you have to do the legal dance for third-party notices etc. But that latter part is something you have to do either way, tools or no tools.
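To make the "fails only on a new component or a license change" behavior concrete, here is a minimal sketch of such a pipeline gate. Everything in it is hypothetical (the `Component` type, the `scan_gate` function, the package names); it is not how BlackDuck or any particular scanner works, just the comparison against an approved baseline that I have in mind.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """A third-party component as reported by some scanner (hypothetical shape)."""
    name: str
    license: str


def scan_gate(approved: Dict[str, str], detected: List[Component]) -> List[str]:
    """Compare detected components against an approved name -> license baseline.

    Returns human-readable findings; an empty list means the job would pass.
    """
    findings = []
    for comp in detected:
        if comp.name not in approved:
            # A component nobody has reviewed yet: needs the third-party notice work.
            findings.append(f"new component: {comp.name} ({comp.license})")
        elif approved[comp.name] != comp.license:
            # Known component, but its license differs from what was approved.
            findings.append(
                f"license change: {comp.name} {approved[comp.name]} -> {comp.license}"
            )
    return findings
```

A CI job would call `scan_gate` with the reviewed baseline and the scanner's output, and fail the build only when the returned list is non-empty.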

Source scan is indeed not a problem. Scanning all the binary blobs is where things go wrong, in two respects.

First, there are quite a few false positives, especially if you use commercial 3rd parties as well. For example, I had a UI component recognized as some obscure academic microkernel!? Investigating, we found that this happened because that microkernel project was using the same commercial UI component somewhere (probably under some academic license), and their repo was just where BD had seen this JS code before.

Second, and much more common and annoying, at least in BD at my company, you have to add an explanation to each individual identified 3rd party package that uses something like the GPL, affirming that it is being used in a way that complies with the license. If you're doing something like distributing a Linux VM, that means hundreds of packages that are part of the distribution. This work has to be done manually, which means entering the same copy/paste text in hundreds of places in the atrociously slow BD UI.

Capex vs opex, huge difference