> Which makes it a database or dataset and very much protected by copyright.

Not every collection of numbers is a database, and a database is not the same thing as a dataset. Databases have limited copyright-like protection in some places. Under TRIPS, that extends only to databases that are "creative by virtue of the selection or arrangement of their contents" or something along those lines. In the US they talk specifically about curation. ML models do not meet either requirement by any reasonable interpretation.

> The fact that many people (myself included) routinely download and use models distributed under OSI approved licenses (Apache V2, MIT, etc.) makes that statement verifiably wrong.

The "source code" of an ML model is most reasonably interpreted as including all of the training data, which are never, ever available. Now you know better.

[On edit: By the way, the people creating these works had better hope they're outside copyright, because if not, each one of them is a derivative work of (at least some large and almost impossible to identify subset of) its training data, so they need licenses from all the copyright holders of that training material, which few of them have or can get.]
However, transformativeness is a factor in whether a derivative work qualifies for the fair-use exception. These models are highly transformative, which is a strong argument that they qualify as fair use.