
by hutzlibu 1180 days ago

"The problem is not that Copilot produces code that is "inspired" by GPL code, it's that it spits out GPL code verbatim."

But only snippets as far as I can tell.

This is the code example linked by the author:

https://web.archive.org/web/20221017081115/https://nitter.ne...

It is still not trivial code, but are there really lots of different ways to transpose matrices?

(Also, the input was "sparse matrix transpose, cs_", so his naming convention was explicitly included. So it is questionable whether a user would get his code in this shape with a normal prompt.)
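
For reference, transposing a sparse matrix stored in compressed-column form usually comes down to a counting sort over the row indices, which limits how differently it can be written. The sketch below is a generic Python illustration of that approach, not the code from the linked snippet; the function name `csc_transpose` and its parameter names are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def csc_transpose(
    n_rows: int,
    n_cols: int,
    col_ptr: List[int],
    row_idx: List[int],
    values: List[float],
) -> Tuple[List[int], List[int], List[float]]:
    """Transpose a matrix given in compressed sparse column (CSC) form.

    Returns (t_ptr, t_idx, t_val): the CSC arrays of the transpose,
    whose columns correspond to the rows of the input matrix.
    """
    # Count how many stored entries fall in each row of the input.
    counts = [0] * n_rows
    for r in row_idx:
        counts[r] += 1

    # A prefix sum of the counts gives the column pointers of the transpose.
    t_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        t_ptr[i + 1] = t_ptr[i] + counts[i]

    # next_slot[i] tracks where the next entry of input row i should go.
    next_slot = t_ptr[:-1]
    t_idx = [0] * len(row_idx)
    t_val = [0.0] * len(values)

    # Scatter every entry (i, j) of the input to position (j, i).
    for j in range(n_cols):
        for p in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[p]
            q = next_slot[i]
            next_slot[i] += 1
            t_idx[q] = j
            t_val[q] = values[p]

    return t_ptr, t_idx, t_val


# Example: the 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSC form.
t_ptr, t_idx, t_val = csc_transpose(2, 3, [0, 1, 2, 3], [0, 1, 0], [1.0, 3.0, 2.0])
```

Most implementations of this operation follow the same three steps (count, prefix sum, scatter), so independent versions tend to differ mainly in naming and layout.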

And just slightly changing the code seems trivial, at what point will it be acceptable?

I just don't think spending much energy there is really beneficial for anyone.

I'd rather look at the potential benefits of AI for open source. I haven't used Copilot, but ChatGPT4 is really helpful at generating small chunks of code for me, enabling me to aim higher in my goals. So what's the big harm if some proprietary black box also gets improved, when all the open source devs can produce with greater efficiency too?

3 comments

TeMPOraL 1180 days ago

> (Also, the input was "sparse matrix transpose, cs_", so his naming convention was explicitly included. So it is questionable whether a user would get his code in this shape with a normal prompt.)

This. People seem to forget that generative AIs don't just spit out copyrighted work at random, of their own accord. You have to prompt them. And if you prompt them in such a way as to strongly hint at a specific copyrighted work you have in mind, shouldn't some of the blame really go to you? After all, it's you who supplied the missing, highly specific input, that made the AI reproduce a work from the training set.

I maintain that, if we want to make comparisons between transformer models (particularly LLMs) and humans, then the AI isn't like an adult human - it's best thought of as having the mentality of a four-year-old kid. That is, highly trusting, very naive. It will do its best to fulfill what you ask for, because why wouldn't it? At the point of asking, you and your query are its whole world, and it wasn't trained to distrust the user.

snovv_crash 1180 days ago

But this means that Microsoft is publishing a black box (Copilot) that contains GPL code.

If we think of Copilot as a (de)compression algorithm plus the compressed blob that the algorithm uses as its database, the algorithm is fine but the contents of the database pretty clearly violate GPL.

TeMPOraL 1180 days ago

While I do believe that thinking and compression will turn out to be fundamentally the same thing, the split you propose is unclear with NN-based models. Code and data are fundamentally the same thing. The distinction we usually make between them is just a simplification, that's mostly useful but sometimes misleading. Transformer models are one of those cases where the distinction clearly doesn't make any sense.

Dah00n 1180 days ago

>And if you prompt them in such a way as to strongly hint at a specific copyrighted work you have in mind, shouldn't some of the blame really go to you?

If you, not I, uploaded my GPL'ed code to Github is the blame on you then?

TeMPOraL 1180 days ago

> If you, not I, uploaded my GPL'ed code to Github is the blame on you then?

Definitely not me - if your code is GPL'ed, then I'm legally free to upload it to Github, and to an extent even ethically - I am exercising one of my software freedoms.

(Note that even TFA recognizes this and admits it's making an ethical plea, not a legal one.)

Github using that code to train Copilot is potentially questionable. Github distributing Copilot (or access to it) is a contested issue. Copilot spitting out significant parts of GPL-ed code without attaching the license, or otherwise meeting the license conditions, is a potential problem. You incorporating that code into software you distribute is a clear-cut GPL violation.

xigoi 1179 days ago

The GitHub terms of service state that you must give certain rights to your code. If you didn't have those rights, but they use them anyway, whose fault is that?

Dah00n 1180 days ago

>And just slightly changing the code seems trivial, at what point will it be acceptable?

If I start building a car from one of Ford's blueprints, at what point will it be acceptable? I'd say even if you rework everything completely, Ford would still have a case to sue you. I can't see how this is any different. My code is my code, and no matter how much you change it, it is still under the same licence it started out with. If you don't want that, then don't start with a part of my code as a base. In my opinion the case is pretty clear: this is only going on because Microsoft has lots of money and lawyers. A small company doing this would be crushed.

hanselot 1180 days ago

Easy. People get to throw rocks at the shiny new thing. To my untrained eye the entire idea of copyrighting a piece of text is ridiculous. Let me phrase it in an entirely different way from how any other person seems to be approaching it.

If a medical procedure is proven to be life-saving, what happens worldwide? Doctors are forced to update their procedures and knowledge base to include the new information, and can get sued for doing something less efficient or more dangerous, by comparison.

If you write the most efficient code, and then simply slap a license on it, does that mean, the most efficient code is now unusable by those who do not wish to submit to your licensing requirements?

I hear an awful lot of people complain all the time about climate change and how bad computers are for the environment. There are even sections on AI model cards devoted to showing how much greenhouse gas has been pushed into the environment. Yet none of those virtue signalling idiots are anywhere to be seen when you ask them why they aren't attacking the bureaucracy of copyright and law in the world of computer science.

An arbitrary example that is tangentially related: one could argue that the company sitting on the largest database of self-driving data for public roads is also the one that must be held responsible if other companies require access to such data for safety reasons (that is, human lives would be endangered as a consequence of not having access to all relevant data). See how this same argument can easily be made for any license sitting on top of performance-critical code?

So where are these people advocating for climate activism and whatever when this issue of copyright comes up? Certainly, if OpenAI were forced to open source their models, substantial computing resources would not have been wasted training competing open source products, thus killing the planet some more.

So, please forgive me if I find the entire field to be redundant and largely harmful for human life all over.

7jjjjjjj 1180 days ago

Yes, of course copyright is dumb and we'd all be better off without it. Duh.

The problem here is that Microsoft is effectively saying, "copyright for me but not for thee." As long as Microsoft gets a state-enforced monopoly on their code, I should get one too.

rekado 1180 days ago

> If you write the most efficient code, and then simply slap a license on it, does that mean, the most efficient code is now unusable by those who do not wish to submit to your licensing requirements?

If you don't "slap a license on it", it is unusable by default due to copyright.
