Hacker News
by jraph 1260 days ago
I would be fine with my GPL code being used as training data, as long as attribution and copyleft are kept for any derivative work created by AI.

Which might as well be anything the AI outputs. Yes, please include my copyleft work, actually :-)

(I'm afraid having different versions of the GPL would split the free software / open source community, I'd prefer copyright laws to be updated / jurisprudence to decide that it's not ok to ignore licences)


> as long as attribution and copyleft are kept for any derivative work created by AI.

That might get your name on a list among a thousand others. And people who want to be on that list can get there easily by starting a (fake) GitHub repo.

In other words, being on the acknowledgement list stops being meaningful.

I was a bit tongue in cheek on this part of my comment, because that's the idea.

You are essentially saying that respecting the license I chose is impractical.

In other words, that my work cannot be used in such a setting.

So, either AIs need to stop being trained so liberally, or it becomes clear that copyright does not apply when a work is used as part of a training set. I appreciate that other people have different opinions on this, but I pretty much don't want my work to be used like this, and having a variant of licenses to cover this would be concerning.

The fight about this is not over, hence the hope I'm expressing.

It's not the acknowledgement list he's talking about, but the code that the AI spits out being GPL'd as well.
But it's also the acknowledgement list, and this part is not specific to the GPL, it also applies to BSD, MIT and all the other licenses which include an attribution clause, which is most of them.

If we require AIs to respect the licenses of the work in their training set, it's a mess, really. It might actually be unsolvable because licenses are not all compatible with each other.

I don’t really see why that’s a mess. There are two easy solutions I can think of right off the bat:

- Don’t train AI on work with incompatible licenses

- Don’t train AI on copyrighted work at all

IANAL, but unless explicitly disallowed, use as training data is allowed.

Given that no version of the GPL (or other OSI license) has provisions about that type of usage, they are all AI-friendly.

This is the whole question.

> Given that no version of the GPL (or other OSI license) has provisions about that type of usage, they are all AI-friendly.

With copyright laws, not explicitly granted means forbidden, unless covered/overridden by law (like fair use for instance). So this could be an argument against AI use without attribution / copyleft, since it's not explicitly granted.

Now, it remains to be seen if law / jurisprudence deems AI use exempt from copyright restrictions (or something like fair use applies).

Would the same attribution be required for code written by a human that learned from your example but did not copy it verbatim?

Do you attribute all of the code you ever learned from?

It may be because I’m a crappy programmer, but I often have SO or a search-discovered web page with someone else’s code while writing or debugging my own. Not for copy/paste, but to see how someone else used a function or solved a difficult problem that I’m stuck on.

I’ve never considered that to be a license violation. Is it?

No, unless your code looks sufficiently like the code you got inspiration from and that code is sufficiently non-trivial.

For code coming from Stack Overflow, the license to follow would be CC BY-SA 4.0.

(which is kinda like GPL, and which is not the best license for code by the way)

https://stackoverflow.com/legal/terms-of-service#licensing

The license has no bearing on who or what may read code.

Licenses may control the circumstances of code facsimiles and products, but trying to stop your code from simply being seen or read with a disclaimer is impossible and silly.

I just want derivative work to respect the conditions I chose, that the copyright laws allow me to impose.

And I think training an AI is not like learning or reading for a human being.

And that's not silly. It's an opinion, but one shared with many people.

You know one of the core arguments is there. You tried to hide it and you didn't blink before using words like "simply" and "silly". That's disrespectful.

Simplicity is important and when something is silly it’s important to say so. If you feel disrespected I think you’re being too sensitive.

I think you’re confused about copyright law and about what is a facsimile vs a derivative work, and you’re also wrong about learning being fundamentally different for AI. AI learning is vastly superior and much more capable of both facsimile and imitation… but what is that if not learning; that is the definition of learning, plainly.