| Well, I'd be happy to have someone to chat with and exchange ideas about it. I am currently digging into this rabbit hole, which seems to be largely uncharted territory. I would like to find a way to make truly open source deep learning models. The Debian legal mailing list [1] and LWN [2] have interesting takes on the relevance of the GPL. To them, putting a trained model under the GPL implies that you have to open your dataset too, since it is the "source". That seems to be more or less the consensus, but I still think it is debatable and could use clarification. I also looked into whether a trained model can actually be copyrightable if the training code and the dataset are free. Training is akin to a "compilation" operation that adds no creative input (applying copyright to source code is already a bit of a hack anyway). There are pretty strong grounds to argue that trained models are similar to "compilations of facts", which come with very little protection. I am now wondering whether open source can actually work for deep learning: open source licenses rely on strong copyright protection to be enforceable, so if trained models are not copyrightable, a DL model may not be protected enough for such a license to work. Finally, I am reassured by recent fair use rulings that a model will probably not be considered a derivative work of its dataset, and that proprietary data can legally be used to produce an unencumbered model, but the legal uncertainty still exists. If you are interested in helping me figure out how to protect crucial models so that the first AGI will be beneficial to all and open sourced, I'd be very happy to have someone poke holes in my ideas. [1] https://lists.debian.org/debian-legal/2009/05/msg00028.html
[2] https://lwn.net/Articles/760142/ |
The products of compilation seem to be copyrightable; otherwise software piracy wouldn't be prosecutable. Perhaps the same would apply to trained models.
Do you have a link to those fair use rulings? Note that fair use is an American concept and doesn't apply in many countries, some of which have similar but more restricted concepts. Also, I wouldn't consider a model produced as in your example to be a free model; that would be more of a ToxicCandy model in Debian ML Policy parlance.