by maximusdrex 1464 days ago
Not necessarily. The issue here isn't that training on public code should be illegal; it's merely that the models trained on such data should be considered derivative works under the licenses which the code is released under. GPL code must also be released under GPL, so Copilot must not charge users to use a model that is trained on and regurgitates GPL code.

> We should be striving to make copyright less draconian, not more.

Agreed, and I'm not sure why you think forcing GPT and Copilot to respect open licenses will make them illegal instead of more open.

2 comments

> Agreed, and I'm not sure why you think forcing GPT and Copilot to respect open licenses will make them illegal instead of more open.

I wasn't talking about Copilot. I was talking about the vast majority of other interesting models. It wouldn't make Copilot itself illegal because it was trained on explicitly licensed data (so it'd make it GPL-licensed), but it would make those other models illegal.

Take for example GPT-J, which was trained on 825GB of data scraped off the Internet. If we take the view that a machine learning model is a derivative work of its training data, then that makes GPT-J illegal, because it was trained on a bunch of "all rights reserved" data, and there's no legal license under which it could be released. Most interesting models are like that.

> so Copilot must not charge users

That's not what the GPL says.

> Copilot must not charge users to use a model that is trained on and regurgitates GPL code.

I was with you until then. Charging for GPL code is perfectly within the licence as long as you make the source available.

Yeah, that's my mistake, but it's not really the point I was trying to make.