| > While this likely has no legal weight I wouldn't be quite so sure about that. The AI industry has entirely relied on 'move fast and break things' and 'old fart judges who don't understand the tech' as their legal strategy. The idea that AI training is fair use isn't so obvious, and quite frankly is entirely ridiculous in a world where AI companies pay for the data. If it's not fair use to take reddit's data, it's not fair use to take mine either. On a technological level the difference to prior ML is straightforward: A classical classifier system is simply incapable of emitting any copyrighted work it was trained on. The very architecture of the system guarantees it to produce new information derived from the training data rather than the training data itself. LLMs and similar generative AI do not have that safeguard. To be practically useful they have to be capable of emitting facts from training data, but have no architectural mechanism to separate facts from expressions. For them to be capable of emitting facts they must also be capable of emitting expressions, and thus, copyright violation. Add in how GenAI tends to directly compete with the market of the works used as training data in ways that prior "fair use" systems did not and things become sketchy quickly. Every major AI company knows this, as they have rushed to implement copyright filtering systems once people started pointing out instances of copyrighted expressions being reproduced by AI systems. (There are technical reasons why this isn't a very good solution to curtail copyright infringement by AI) Observe how all the major copyright victories amount to judges dismissing cases on grounds of "Well you don't have an example specific to your work" rather than addressing whether such uses are acceptable as a collective whole. |
If this intended to refer to Judge Alsup, it is extremely wrong.