by prettygood 1810 days ago
Not copyable by people, but we can go through the code, learn from it and then use that knowledge to improve our coding skills.

Isn't that what Copilot is doing here? The system is merely learning how to code and then applying what it has learned to other programming problems. It's not like it's writing software specifically to compete with other programs.

2 comments

Not when it outputs large sections of unique code verbatim, as it's been shown to do.

If it's large sections, that can be fixed by either licence attribution or result filtering.

That's at best a technical issue. What way too many people claim, however, is that the machine isn't even allowed to look at GPL'ed code for some reason, while humans are.

I'd like to learn the reasoning behind that.

> What way too many people claim, however, is that the machine isn't even allowed to look at GPL'ed code for some reason, while humans are.

Why would those be the same thing? It's a matter of scale. Just like how people are allowed to read websites, but scraping is often disallowed.

> Just like how people are allowed to read websites, but scraping is often disallowed.

Hosting code on GitHub explicitly allows this type of usage (scraping) according to their ToS, so I have to ask again: why the sudden complaints?

Are we still talking about a shortcoming of the ML model, which very occasionally spits out a few lines of copied code, or should we include search engines in this, because they do the exact same thing by design?

robots.txt, for example, is likewise non-binding and purely advisory, and Common Crawl [0] (also used for training GPT-3) publishes a dataset that by definition contains GPL'ed code, no matter where it's hosted. So is that off-limits now, too?

[0] http://commoncrawl.org
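
To make the "purely advisory" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the `polite_fetch_allowed` helper are all made up for illustration. The thing to notice is that the check runs on the crawler's side: nothing stops a client from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler would consider this URL allowed.

    This is a voluntary, client-side check. The server does not enforce it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/private/data", "https://example.com/public/page"):
        print(url, polite_fetch_allowed("ExampleBot", url))
```

A crawler that never calls `can_fetch` fetches both URLs just the same, which is the sense in which robots.txt is a request rather than a restriction.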

I think result filtering (based on the license of the matched results) is gnarly enough, and likely computationally intensive enough, to break the whole feature. But it would be interesting to see if it can be crafted to fix the shortcomings of the ML model.
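
For a rough idea of what such a filter might look like, here is a toy sketch that flags a suggestion when too many of its token n-grams also appear in an index of known licensed snippets. This is an assumption-laden illustration, not how Copilot or any real product works: the function names, the whitespace tokenizer, the n-gram size of 6, and the 0.5 threshold are all hypothetical choices.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def token_ngrams(code: str, n: int = 6) -> Set[NGram]:
    """Split code on whitespace and return the set of n-token windows."""
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(licensed_snippets: Iterable[str], n: int = 6) -> Set[NGram]:
    """Collect every n-gram seen in the (hypothetical) licensed corpus."""
    index: Set[NGram] = set()
    for snippet in licensed_snippets:
        index |= token_ngrams(snippet, n)
    return index


def is_verbatim_copy(suggestion: str, index: Set[NGram],
                     n: int = 6, threshold: float = 0.5) -> bool:
    """Flag a suggestion if at least `threshold` of its n-grams are in the index."""
    grams = token_ngrams(suggestion, n)
    if not grams:
        # Too short to judge; treat as not copied.
        return False
    overlap = len(grams & index) / len(grams)
    return overlap >= threshold
```

Each lookup is a set membership test, so the per-suggestion cost is small. The expensive parts are building and storing the index for a large corpus, and catching copies that differ only in renamed identifiers or formatting, which this sketch does not attempt.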

There's a genuinely philosophical question here about whether Copilot is learning or imitating.

For instance, a parrot doesn't learn to speak, it learns to imitate speech.