by hekec 1810 days ago
On their website they say that "GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set."

So it won't copypaste your code. It has just read open-source code and learned from it, similar to what humans do. So I don't see any problem with this.

4 comments

First, it does copypaste code: https://docs.github.com/en/github/copilot/research-recitatio...

Second, we can't ignore that if someone deliberately tries to make it spit out copyrighted code, the chances of a verbatim match are going to be much greater.

Why would anyone do that? Plausible deniability: "I didn't copy this GPL procedure, Copilot gave it to me!"

GitHub has 56 million users as of September 2020 (according to Wikipedia). Let's assume that only 1 million of them use Copilot at an average of once a week.

That means that every week there will be about 1,000 verbatim copy-pastes of code by Copilot (0.1% of 1 million suggestions). Then multiply that over a year or more as Copilot gets older.
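
To spell out the back-of-the-envelope arithmetic, here is a rough sketch. The 1 million users and the once-a-week rate are the guesses from above, and the 0.1% is GitHub's quoted figure. Treating each use as exactly one suggestion and a year as 52 weeks are my own simplifications, so read the result as an order of magnitude, not a measurement.

```python
# Back-of-the-envelope estimate. Every input is an assumption, not measured data.
ACTIVE_USERS = 1_000_000      # assumed Copilot users (out of ~56 million GitHub users)
USES_PER_USER_PER_WEEK = 1    # assumed: one suggestion per user per week
VERBATIM_PER_THOUSAND = 1     # GitHub's "about 0.1%" figure, kept as an integer ratio
WEEKS_PER_YEAR = 52           # simplification


def verbatim_per_week(users=ACTIVE_USERS,
                      uses=USES_PER_USER_PER_WEEK,
                      per_thousand=VERBATIM_PER_THOUSAND):
    """Expected number of verbatim suggestions in one week."""
    return users * uses * per_thousand // 1000


def verbatim_per_year(weeks=WEEKS_PER_YEAR):
    """Expected number of verbatim suggestions over a year at a constant rate."""
    return verbatim_per_week() * weeks


if __name__ == "__main__":
    print(verbatim_per_week())   # 1000
    print(verbatim_per_year())   # 52000
```

Change the constants to taste; the point is that the count scales linearly with usage.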

0.1% may not seem like a lot, but at the scale of Internet companies, it always is.

> So it won't copypaste your code.

You might want to check out this video...

https://twitter.com/mitsuhiko/status/1410886329924194309

> the vast majority of the code that it suggests is uniquely generated and has never been seen before.

Original code in somebody's GitHub repo:

  int x = y + z;

Copilot code:

  int Eisaa7ha = Wu8iazo7 + Roh0Eesh;

Not copy pasted! Uniquely generated! Never before seen!
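
To make the point concrete, here is a minimal sketch of why renamed identifiers don't make code meaningfully different. It replaces every identifier with a placeholder in order of first appearance and compares the results. The `normalize` helper and the tiny keyword list are hypothetical names I made up for illustration; this is not how Copilot or GitHub measures "verbatim", and it ignores strings, comments, and most of the language.

```python
import re

# Hypothetical illustration only: a crude identifier normalizer for C-like code.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")
KEYWORDS = {"int", "return", "if", "else", "for", "while"}  # tiny illustrative subset


def normalize(code: str) -> str:
    """Rename identifiers to v0, v1, ... in order of first appearance."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word  # leave language keywords untouched
        # Reuse the placeholder for a name already seen, else assign the next one.
        return names.setdefault(word, f"v{len(names)}")

    return IDENT.sub(rename, code)


if __name__ == "__main__":
    original = "int x = y + z;"
    suggested = "int Eisaa7ha = Wu8iazo7 + Roh0Eesh;"
    print(normalize(original))                          # int v0 = v1 + v2;
    print(normalize(original) == normalize(suggested))  # True
```

A plain string comparison says the two lines differ, while the normalized forms are identical, which is the whole joke.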