by m4x 47 days ago
Your examples, as you say, are all public domain. Are all the works we train LLMs on public domain too? Was the original book in my analogy in the public domain? What do you think about training on material that isn't yet in the public domain?

You're framing this as an ethical question, but copyright term lengths are essentially arbitrary. They're set by the government, as are the boundaries of fair use. At that point you're making a circular argument: that it's bad if it's illegal, and that it should be illegal because it's bad. So what happens if someone argues the opposite: that it's not unethical if it's fair use, and that it should be fair use because it's not unethical?
I'm not making a circular argument, nor one based on legality. You explicitly changed your example to use "public domain" content and, legal specifics aside, it's clear that's a separate category of content. Most people have no ethical issue with remixing or using content that has already done the rounds and delivered most of its immediate value to the creator. This is very different to your earlier examples with books, which you framed as two contemporary pieces of media competing with each other.

Letting companies train LLMs on the "classics" is very different to training on contemporary media where the creator still depends on it.