by TeMPOraL 1136 days ago
> (Also the input was "sparse matrix transpose, cs_", so his naming convention especially included. So it is questionable if a user would get his code in this shape with a normal prompt)

This. People seem to forget that generative AIs don't just spit out copyrighted work at random, of their own accord. You have to prompt them. And if you prompt them in such a way as to strongly hint at a specific copyrighted work you have in mind, shouldn't some of the blame really go to you? After all, it's you who supplied the missing, highly specific input that made the AI reproduce a work from the training set.

I maintain that, if we want to make comparisons between transformer models (particularly LLMs) and humans, then the AI isn't like an adult human - it's best thought of as having the mentality of a four-year-old kid. That is, highly trusting, very naive. It will do its best to fulfill what you ask for, because why wouldn't it? At the point of asking, you and your query are its whole world, and it wasn't trained to distrust the user.


But this means that Microsoft is publishing a black box (Copilot) that contains GPL code.

If we think of Copilot as a (de)compression algorithm plus the compressed blob that the algorithm uses as its database, the algorithm is fine but the contents of the database pretty clearly violate the GPL.

While I do believe that thinking and compression will turn out to be fundamentally the same thing, the split you propose is unclear with NN-based models. Code and data are fundamentally the same thing. The distinction we usually make between them is just a simplification, that's mostly useful but sometimes misleading. Transformer models are one of those cases where the distinction clearly doesn't make any sense.

> And if you prompt them in such a way as to strongly hint at a specific copyrighted work you have in mind, shouldn't some of the blame really go to you?

If you, not I, uploaded my GPL'ed code to GitHub, is the blame on you then?

> If you, not I, uploaded my GPL'ed code to GitHub, is the blame on you then?

Definitely not me - if your code is GPL'ed, then I'm legally free to upload it to GitHub, and to an extent even ethically - I am exercising one of my software freedoms.

(Note that even TFA recognizes this and admits it's making an ethical plea, not a legal one.)

GitHub using that code to train Copilot is potentially questionable. GitHub distributing Copilot (or access to it) is a contested issue. Copilot spitting out significant parts of GPL'ed code without attaching the license, or otherwise meeting the license conditions, is a potential problem. You incorporating that code into software you distribute is a clear-cut GPL violation.

The GitHub terms of service state that you must give certain rights to your code. If you didn't have those rights, but they use them anyway, whose fault is that?