by res0nat0r 1820 days ago
From the FAQ:

> GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.

I'm guessing this covers it. I'm not sure that someone posting their code online while explicitly saying you're not allowed to look at it, and then having that code ingested into this system along with billions of other inputs, could somehow make you liable in court for some kind of infringement.

3 comments

That doesn't cover it, since that is a technical answer to a non-technical question. The same questions remain.
That doesn't include patent violations, license violations, or compatibility between licenses, which would be the most numerous and non-trivial cases.
How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

Does everyone in this thread contact their lawyers after cutting and pasting a mergesort example from Stack Overflow that they've modified to fit their needs? Seems folks are reaching a bit.

For that very reason, many companies have policies that forbid copying code from online sources (especially from Stack Overflow).
That mitigates copyright concerns, but patent infringement can occur even if the idea was independently rediscovered.
I was answering a specific question, "How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?" The answer is that many companies have forbidden that specific action in order to remove the risk from that action.

You are expanding the discussion, which is great, but that doesn't apply in answer to that specific question.

There are answers in response to your question, however. For example, many companies use software for scanning and composition analysis that determines the provenance and licensing requirements of software. Then, remediation steps are taken.

Not sure what you're getting at. Are you suggesting that independent discovery is a defense against patents? Or are you clear that it isn't a defense, but just arguing that something from the internet is more likely to be patented than something independently invented in-house? Maybe that's true, but it doesn't really answer the question of

> How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

The only real answer is a patent search.

I think a patent violation with Copilot is exactly the same scenario as if you violated a patent yourself without knowing it.
Sounds like using Copilot can introduce GPL'd code into your project and make your project bound by the GPL as a result...

0.1% is a lot when you use 100 suggestions a day.
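
To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes each suggestion independently has a 0.1% chance of containing a verbatim snippet (the FAQ figure quoted above), and that you accept 100 suggestions a day. The 250 working days per year is my own assumption, not something from the FAQ, and the function names are just illustrative.

```python
def expected_verbatim(rate: float, suggestions: int) -> float:
    """Expected number of suggestions containing a verbatim snippet."""
    return rate * suggestions


def prob_at_least_one(rate: float, suggestions: int) -> float:
    """Chance that at least one suggestion is verbatim, assuming independence."""
    return 1.0 - (1.0 - rate) ** suggestions


RATE = 0.001      # 0.1%, from the FAQ quote
PER_DAY = 100     # suggestions per day, from the comment above
WORK_DAYS = 250   # assumption: working days in a year

per_year = PER_DAY * WORK_DAYS

print(f"Expected verbatim suggestions per day:  {expected_verbatim(RATE, PER_DAY):.2f}")
print(f"Chance of at least one on a given day:  {prob_at_least_one(RATE, PER_DAY):.1%}")
print(f"Expected verbatim suggestions per year: {expected_verbatim(RATE, per_year):.1f}")
```

Under those assumptions that works out to roughly a 9.5% chance of at least one verbatim snippet on any given day, and about 25 such suggestions over a working year. Whether any of those would actually matter legally is a separate question.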