by kube-system
689 days ago
> mechanically applying that software to datasets that are (a) assembled with minimal, if any creativity, and (b) definitely not assembled with any eye to the specific form of the resulting model. Fair enough, but those datasets are also primarily copyrighted material. If the software here merely transforms the input material (which I agree it does), then the output is a derivative work.

If I take a string of data from a true hardware RNG, XOR it with a Taylor Swift song, and throw away the original random stream, is the resulting fundamentally random bit string still a derivative work of the song? As with the ML model, you can't recognize the song in it. And as with at least some training examples in the inputs of most ML models, you can't recover the song from it either.
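To make the thought experiment concrete, here is a minimal sketch of that XOR construction (it is effectively a one-time pad). The names `xor_bytes`, `song`, and `other` are hypothetical, the "song" is just placeholder bytes, and Python's `secrets` module stands in for a true hardware RNG, which is an assumption and not the same thing.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    if len(data) != len(key):
        raise ValueError("data and key must be the same length")
    return bytes(d ^ k for d, k in zip(data, key))


# Placeholder bytes standing in for the recording (hypothetical data).
song = b"stand-in bytes for a copyrighted recording"

# Assumption: the OS CSPRNG stands in for a true hardware RNG.
key = secrets.token_bytes(len(song))

mixed = xor_bytes(song, key)

# While the random stream still exists, the transformation is reversible.
recovered = xor_bytes(mixed, key)
assert recovered == song

# "Throw away the original random stream."
del key, recovered

# Without the key, `mixed` is consistent with any same-length input:
# for any other byte string there is some key that maps `mixed` to it.
other = b"some completely different work".ljust(len(song), b".")[: len(song)]
fake_key = xor_bytes(mixed, other)
assert xor_bytes(mixed, fake_key) == other
```

The last two lines are the point: once the key is gone, nothing in `mixed` singles out the song over any other input of the same length.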
It feels like the test for whether X is derivative for copyright purposes should include some kind of attention to whether X is a creative work at all. Maybe not, but then what test do you use?
I do recognize the possibility that the models might not themselves be eligible for copyright as independent works, yet still infringe copyright in the training inputs. It seems messy, but not impossible.
... and as I said elsewhere, it's also messy that while you generally can't recover every training input from the model, you can usually recover something very close to some of the training inputs.