Hacker News new | ask | show | jobs
by lolinder 1305 days ago
> Copilot is just copy / paste of the code it was trained on.

Every time I hear someone say this, I hear "I've never really tried Copilot, but I have an opinion because I saw something on Twitter."

Given the function name for a test and 1-2 examples of tests you've written, Copilot will write the complete test for you, including building complex data structures for the expected value. It correctly uses complex internal APIs that aren't even hosted on GitHub, much less publicly.

Given nothing but an `@Test` annotation, it will actually generate complete tests that cover cases you haven't yet covered.

There are all kinds of possible attacks on Copilot. If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

EDIT: There's also this fun Copilot use I stumbled across, which I dare you to find in the training data:

    /**
    Given this text:
 
    Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

    Fill in a JSON structure with my name, how much money I had, and where I'm going:
    */

    {
        "name": "Ishmael",
        "money": 0,
        "destination": "the watery part of the world"
    }
2 comments

It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.
>> It can even read an invoice, you can ask it "what is the due date?" It's a system that solves due date and Ishmael questions out of the box. And everything in-between.

That's cool.

But emitting copyrighted code without attribution and in violation of the code's license is still copyright infringement.

If I created a robot assistant that cleans your house, does the shopping, and occasionally stole things from the store, it would still be breaking the law.

> occasionally stole things from the store

It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft - copying open online content and sharing? theft, learning from data and generating - also theft. Stealing from a physical store - you guessed it.

>> It's fascinating to see how stretchy the word "steals" is nowadays. You can make anything be theft

Theft has a definite legal meaning. So does copyright infringement.

The court can decide if it is copyright infringement or fair use:

https://githubcopilotlitigation.com/pdf/1-0-github_complaint...

While I do enjoy everybody acting as armchair lawyers.... until we get an actual legal ruling, the general consensus seems to be that it is sufficiently transformative as to be considered fair use.
>> If you had said it can copy/paste its training data I wouldn't have argued, but "it just copy/pastes the code it was trained on" is demonstrably false, and anyone who's really tried it will tell you the same thing.

So if "it could commit copyright infringement, but does not always do so" is good enough for your company's legal review team, then go for it.

Has anyone tried to see how similar is their manually written code to other codes out there? I bet small snippets 1-2 lines long are easy to find. It would be funny to realise that we're more "regurgitative" than Copilot by mere happenstance.
Will the court believe that Copilot created an exact copy of Tim Davis's code "by mere happenstance"?

https://twitter.com/DocSparse/status/1581461734665367554