| HN Mirror



	by baby 1463 days ago
	> Also feels kind of icky to train on open source projects and then charge for the output. "open source is great, except when it's used in a way I don't like"

3 comments

karl42 1463 days ago

I don't see the use itself as a problem, but rather that the result is not treated as a derivative work of the input. If I train it on GPL code, the result should be GPL, too.


natefinch 1463 days ago

This is kind of like saying that any programmer who has ever learned something from reading GPL code can only use that knowledge when writing GPL code. It's not literally copying the code. The training set isn't stored on disk and regurgitated.

Also, there is logic in Copilot that checks whether a suggestion is an exact duplicate of code from its training set, and if it is, it never sends it to the user.


hnbad 1463 days ago

But Copilot is not a programmer, Copilot is a program. Slapping the "ML" label on a program doesn't magically absolve its programmers of all responsibility, as much as tech companies over the past decade have tried to convince people otherwise.


mplanchard 1463 days ago

I really dislike this false equivalence between human learning and machine learning. The two are significantly distinct in almost every way, both in their process and in their output. The scale is also vastly different. No human could possibly ingest all of the open source code on GitHub, much less regurgitate millions of snippets from what they “studied.”


thethirdone 1463 days ago

> This is kind of like saying that any programmer who has ever learned something from reading GPL code can only use that knowledge when writing GPL code. It's not literally copying the code. The training set isn't stored on disk and regurgitated.

I wouldn't put any hard rules on it, but it does seem very fair for programmers who have learned a lot from GPL code to contribute back to GPL projects. I have learned from and used a lot of open source software so whenever possible I try to make projects available to learn from or use.


eat_lemons 1462 days ago

Read up on clean room design and on the IBM BIOS lawsuits from the '80s and '90s: just seeing proprietary code can be a violation.

Why is it different if we slap an "ML" label on it?


spullara 1463 days ago

I guess if you trained on GPL code that should be true for your code as well.


6gvONxR4sf7o 1463 days ago

It would be great if that were the case, but unfortunately it isn’t. We’ll need new laws for that.


matthewmacleod 1463 days ago

Yes. It is completely valid, understandable, and reasonable to have a variety of different feelings and views about how specific code and specific licenses are used.

This is particularly the case when we see the emergence of new technologies that use it in different ways. Different people may have a wide variety of equally valid views about how it is incorporated into that system.

There's nothing inconsistent, confusing, or complex about those views.


presentation 1463 days ago

I think the issue is not that it’s trained on open source code but that it’s trained on code whose licenses may not permit it. If you license your project in a permissive way then I don’t see a problem.


remram 1463 days ago

Most "permissive" licenses still require attribution.


baumandm 1463 days ago

Are there actually any licenses which do not permit training an AI model on the code?


deathanatos 1463 days ago

(IANAL) It's a tool, transforming source code. The result thus seems like a derivative work; whether you are or are not allowed to use that in your work depends on the originating license. (And perhaps, your license. E.g., you can't derive from a GPL project and license it as MIT, as the GPL doesn't permit that. But to license as GPL would be fine. But this minimal example assumes all the input to Copilot was GPL, which I rather doubt is true, and I don't think we even know what the input was.)

I think there might be some in this thread who don't consider these derivatives, for whatever reason, but it seems to me that if rangeCheck() passes de minimis, then the output from Copilot almost certainly does, too. That a tool is doing the copying and mutating, as opposed to a human, seems immaterial to it all. (Now, I don't know that I agree with rangeCheck() not being de minimis … and yet.) Or they think that Copilot is "thinking", which, ha, no.


xenomachina 1463 days ago

Open source licenses aren't a free-for-all. Many have terms like GPL's copyleft/share-alike or the attribution requirements of many other licenses. If copilot was trained on such code, then it seems that it, and/or the code it generates, violates those licenses.
