by johnnyanmac
497 days ago
> But the problem is that the current method for training requires this volume of data. So the models are legitimately not viable without massive copyright infringement.
Sure it is. It just requires what every other copyright'd work needs: permission and stipulations from the copyright holder. These aren't small time bloggers on the internet, these are large scale businesses.
> Though big-picture, it seems to me that the money-ed interests will ensure that even if the current legal landscape doesn't allow LLM's to exist, then they will lobby HARD until it is allowed.
The only solace I take is that these conglomerates are paying a lot to take down the rules they made 30 years ago when they weren't the ones profiting from stealing. But yes, I'm still frustrated by the hypocrisy.
Most other scenarios don't use millions/billions of works - that's the part which puts viability in question.
> these are large scale businesses.
I'd like training models to also remain accessible to open-source developers, academic researchers, and smaller businesses. Large-scale pretraining is common even for models that are not cutting-edge LLMs.
> The only solace I take is that these conglomerates are paying a lot to take down the rules they made 30 years ago when they weren't the ones profiting from stealing
As far as I'm aware, most of the lobbying in favor of stricter copyright has been done by Disney, Universal, Time Warner, RIAA, etc.
Not to say that tech companies have a consistent moral stance beyond whatever's currently in their financial self-interest, but I think that self-interest has put them in a position of supporting fair use and copyright safe harbors, opposing link tax, etc. more often than the other way around - with cases like Authors Guild v. Google being a significant win for fair use.