|
"The difference, when it comes to AI, is one of scale. ChatGPT can “read” more published words in a few seconds than I could in several lifetimes and, unlike me, that data isn’t immediately replaced in my human-limited short-term memory by whatever I’m thinking of next." I think this misses the point. The issue of scale isn't on the ingest side, it's on the output side. Once you train an LLM on a book (however long that takes), then the LLM can be the interface to that book for an unlimited number of users. That scales very differently to, say, a person reading a book and writing something influenced by it. In the case of the LLM, it's a complete interface to the contents of the book. It lets you "talk to the book". If that exists, why would anyone buy the book? If I could ask ChatGPT to "summarize the new book by XYZ", then spend an hour or two asking the questions _I_ have about the book from it, then buying the book would be a net negative. If we don't solve attribution (like BMI solved for music), then the financial upside of publishing might be majority-captured by whoever trains LLMs on the copyrighted material. |
Or more precisely, they should be made illegal if and only if they reach a "scale" of, say, a couple million viewers or more.
The fundamental premise of copyright is flawed. Taking medieval concepts involving censorship of the printing press and extending them to the 21st century is bound to produce awkward results. I'm not hopeful that copyright will be reconsidered from the ground up during this AI shock, but at least we shouldn't pretend that arguments about copyright will be reasonable or make sense. I honestly believe a "realpolitik" approach is more helpful: at least we know that those who have more political influence and spend more effort lobbying will probably "win" in the end...