by spupe 1455 days ago
I don't see why that should be the case in this particular scenario, or what benefit is gained from that. Could you elaborate?

Could you elaborate on why you think a computer program and a person should be treated the same way in this respect?

We can take as self-evident that a human is capable of reading about something, conceptualising it, and then writing something completely new with the knowledge they have gained.

I think it's also pretty uncontroversial that the primitive "AI" we currently have is nowhere near the level of even an average human at these things, and thus we can't just blindly assume it is conceptualising rather than copying. Copilot regularly produces verbatim copies of existing code when working on non-trivial things.

Forget about the "AI" label: Copilot is just a complex computer program that takes code from other people and inserts various permutations of it into your editor, whilst ignoring the license of that code.

I think it's best if we sidestep these big conceptual questions about what cognition or creativity really are. It's hard to find agreement, and perhaps it is not necessary to do so.

My position is that if a person employed at a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright rules are violated (i.e., you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use), then I think an AI should be able to assist in that task. There is no need to complicate things by legislating what the AI is doing and what Google is doing, as they are very similar things and in fact even use similar methods.

I would agree with you if the AI was genuinely assisting with that task, but it isn't.

It's taking inputs, ignoring their licenses, permuting them in ways that are not understandable to the user, and then outputting them.

That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

Relying on this tool means ignoring the license of code that you're copying, without even knowing that you're doing it.

> That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

I would say it's a very similar task. If I need to remember how to use a certain function, I can Google for documentation and examples, or I can tell Copilot what I want to do. Whether the solution was presented by Copilot or by an SO thread is, in my view, irrelevant. And to add to that, I doubt anyone checking SO truly knows where that answer came from. The person could simply be reproducing a snippet from somebody else, and you have no way of knowing if it was licensed.

I don't think this is bad either. Even our current shitty copyright laws protect that kind of use. I shouldn't have to worry whether my little prime number generator uses an algorithm first created by John Carmack or Microsoft. Programming has evolved rapidly in great part because we can all use other people's work and use it to improve ours. Of course you shouldn't just copy and paste everything and call it a day, but that's hardly what Copilot enables anyway.

You really seem to be ignoring the core issue by focusing on SO though. Everything on SO is fair game, but code on GitHub is under a variety of licenses, and when Copilot regurgitates it, no matter how complex and inscrutable the process is that leads it to do so, it may be causing the user of Copilot to misuse that code because it doesn't even give them the opportunity to know where it came from or what license it was released to the public under.
Again, how does that differ from Stack Overflow? Do you go and check whether a given reply belongs to a licensed project?

Also, please consider that there is a toggle that allows you to block Copilot from using public code.

If I make a script and train it on Windows source code, do you think MS will like it if I use that script on Wine? I am sure MS will say the license does not allow it and that the script's transformations are not original, so the GPL or similar licenses should be respected by Microsoft too.

> My position is that if a person employed at a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright rules are violated (i.e., you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use),

Only a judge can determine whether it is actually fair use. If by chance you copied some super clever and unique code into your code base, then I am sure a judge would not call it fair use. Copilot has been shown to do this (though MS said they put some IF-ELSE checks in the AI to keep the plagiarism from being detected, by removing obvious results and maybe obfuscating things more).

Maybe the Stack Overflow license allows you to copy-paste the answers into your code, but GitHub code has a repo-specific license that you need to respect.

If MS trained the model on all their private repos too and made the model free software, then many would not have these issues. Or they could keep the model proprietary and train it only on MS repos and on BSD and similarly licensed repos.

You are saying that the AI should be treated the same way as a person would regarding its 'output'. I disagree. This is a conceptual disagreement and you cannot just sweep under the rug "what cognition or creativity really are".

At the end, when in several (2-5) years we start seeing structural unemployment emerging because of AI deployments, this will be resolved by the legal system, most likely by some sort of partial prohibition of training/monetizing such systems.

I think I still have not understood your argument. Are you saying that you are afraid that AIs will become too powerful and cause unemployment, and therefore we should regulate them now before they do so?

Many people are worried about this, which is why there is a lot of debate about minimum income programs. However, at present, what Copilot is doing is similar to what Google does, and it is certainly not going to replace devs any time soon. Personally, I think we should exploit technology to its fullest, and the only reason we can have this conversation is that in the past, we didn't give too much consideration to the mailmen, secretaries, delivery workers and everyone else who got displaced by our use of the internet and similar technologies. We merely adapted to better exploit them.

I am not saying (in that last comment) what should happen, I am saying what will happen. Past automation in terms of impact is nothing compared to what's coming and people and lawmakers will react accordingly - not in favor of the automators.
Copilot understands concepts as well as many humans. You can see primitive versions of this in the old Word2Vec demos showing how those models understand how London:England ~= Paris:France.
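
For anyone who hasn't seen that demo, here is a rough sketch of the vector arithmetic behind it. The vectors are hand-written toy values (three "country" dimensions plus one "is a capital" dimension), not trained Word2Vec embeddings, and the `analogy` helper is a hypothetical name, so this only illustrates the mechanics and says nothing about what a real model learns.

```python
import math

# Toy, hand-made vectors (an assumption for illustration, NOT trained embeddings).
# Dimensions: [UK, France, Germany, is-a-capital]
toy_vectors = {
    "London":  (1.0, 0.0, 0.0, 1.0),
    "England": (1.0, 0.0, 0.0, 0.0),
    "Paris":   (0.0, 1.0, 0.0, 1.0),
    "France":  (0.0, 1.0, 0.0, 0.0),
    "Berlin":  (0.0, 0.0, 1.0, 1.0),
    "Germany": (0.0, 0.0, 1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors=toy_vectors):
    """Answer 'a is to b as c is to ?' via the offset b - a + c."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # As in the usual demo, the three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("London", "England", "Paris"))
```

With these toy values the nearest remaining word to England - London + Paris is France. Whether that kind of geometric regularity counts as "understanding" is exactly what this thread is arguing about.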

Copilot is much more sophisticated than that, and it no more copies code than a human does. It generates on a character by character basis given the contextual probability of the next character conditioned on the previous set of tokens, with the "heat" being a factor in how randomly it will choose characters.
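
To make that concrete, here is a minimal sketch of sampling with what is usually called "temperature" (the "heat" above). It is not Copilot's actual code: the three-word vocabulary and the scores are made up for illustration, and models of this kind normally pick from a vocabulary of tokens rather than single characters.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature, rng):
    """Draw one token at random according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, invented for this example.
vocab = ["return", "print", "pass"]
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # sharply favours the top score
hot = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly

rng = random.Random(0)
print(cold, hot, sample_next_token(vocab, logits, 1.0, rng))
```

At a low temperature the top-scoring token is chosen almost every time, so for a given prompt the output tends to be nearly the same on each run. At a high temperature the choice becomes much more random.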

This is much more similar to how a human writes than "copying".

"it no more copies code than a human does" < that's a very big call right there, considering how much verbatim copying has already been documented in Copilot. The primitive understanding Copilot has of what it is generating doesn't even approach that of the most average programmers. It's classic AI: impressive on the surface.
This isn't true.

All the "copied code" I've seen is where the person prompts it with a large amount of very unique preamble and then it fills in the exact example they are quoting from.

Try it without doing that.

And it's weird people think it can't understand conceptual relationships. Word2Vec demonstrated that nearly 10 years ago and that's a much weaker model in terms of both size and techniques than this is.

> And it's weird people think it can't understand conceptual relationships. Word2Vec demonstrated that nearly 10 years ago and that's a much weaker model in terms of both size and techniques than this is.

Saying that Word2Vec or Copilot have "understanding" of their input requires a redefinition of the word "understanding".

What's your definition?