I am not a lawyer, but it seems right to me to say that the weights are a derivative work of the training set.

> A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship, is a “derivative work”.

As I understand it, a derivative work must either be made through lawful use of the original work or qualify as fair use. Otherwise it is infringing.
If you take a book and turn it into a movie, that's a derivative work. Anyone can see the direct resemblance -- the transformation or adaptation.
But if you take a book, convert each letter to a number, add up the numbers that make each sentence, and then sell that as a list of "random" numbers, that's not a derivative work. The end result is sufficiently transformed that copyright no longer applies. Ownership of the original work has no relevance.
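To make the letter-sum example concrete, here is a minimal sketch. The comment doesn't say how letters become numbers, so the a=1 ... z=26 mapping (ignoring case, counting punctuation and spaces as 0) and the function names are hypothetical choices for illustration. The point it shows is that the sum is many-to-one: anagrams such as "listen" and "silent" collapse to the same number, so the original sentence can't be recovered from the output.

```python
def letter_value(ch: str) -> int:
    """Hypothetical mapping: a=1 ... z=26, case-insensitive; non-letters count as 0."""
    if ch.isascii() and ch.isalpha():
        return ord(ch.lower()) - ord("a") + 1
    return 0


def sentence_sums(sentences: list[str]) -> list[int]:
    """Replace each sentence with the sum of its letter values (the lossy transform)."""
    return [sum(letter_value(ch) for ch in s) for s in sentences]


if __name__ == "__main__":
    print(sentence_sums(["Call me Ishmael."]))   # [113]
    # Different sentences, same number: the text itself is gone.
    print(sentence_sums(["listen", "silent"]))   # [79, 79]
```

Because many different inputs map to the same sum, the list of numbers keeps none of the wording of the book.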
And AI weights are like that. They're a complete transformation. They're not a derivative work. The only thing you have to make sure of is that they haven't been overtrained to the point where they can regurgitate whole chapters of the texts they were trained on, for example. But that's not something they're currently able to do, and obviously copyright law will force companies to ensure it stays that way. (Not to mention that companies would do it anyway, given the economic incentive to shrink models and cut costs.)