Sure, but what matters for copyright is output, not input. For now.
If we make the (poor, imo) decision to prevent training on copyrighted data, that's a restriction on the training process, not on its result.
And in a world where we're making bad decisions to put legal restrictions on the training process, "can't train on data obtained from models that were trained without these restrictions" seems to be on the table.