Something like a free software license is needed here: something that copyright holders can apply to their work and that affects models trained on it. GPL v4 perhaps?
At least in the US, a federal appeals court (the Seventh Circuit, not the Supreme Court) has decided in the past that shrinkwrap licenses can be used to put restrictions on works that copyright doesn't apply to (https://en.wikipedia.org/wiki/ProCD,_Inc._v._Zeidenberg), so I wouldn't be surprised if we start seeing clickwrap "you agree not to train AI on this page without the author's explicit permission" licenses.
And if copyright applies, wouldn't that imply that someone who learns something from a book could also be bound by that book's license in how they use the knowledge they gained?