by maximusdrex
1464 days ago
Not necessarily. The issue here isn't that training on public code should be illegal; it's merely that models trained on such data should be considered derivative works under the licenses the code is released under. Derivative works of GPL code must also be released under the GPL, so Copilot must not charge users to use a model that is trained on, and regurgitates, GPL code.
> We should be striving to make copyright less draconian, not more.
Agreed, and I'm not sure why you think forcing GPT and Copilot to respect open licenses would make them illegal instead of more open.
I wasn't talking about Copilot. I was talking about the vast majority of other interesting models. It wouldn't make Copilot itself illegal, because Copilot was trained on explicitly licensed data (it would just make Copilot GPL-licensed), but it would make those other models illegal.
Take, for example, GPT-J, which was trained on 825 GB of data scraped from the Internet. If we take the view that a machine learning model is a derivative work of its training data, then GPT-J is illegal, because it was trained on a lot of "all rights reserved" data, and there is no license under which it could legally be released. Most interesting models are like that.
> so Copilot must not charge users
That's not what the GPL says.