by zelphirkalt
403 days ago
Well, you don't get to pick and choose in which situations an LLM is considered similar to a human being and in which it is not. If you argue that it is lossy in the same way a human is, well, let's go ahead and get most of its output checked by organizations and courts for violations of the law and of licenses, just like human work is. Oh wait, I forgot, LLMs are run by companies with too much cash for anyone to sue them successfully. I guess we just have to live with it then, what a pity.
Another way would be to train an internal model directly on published works, use that model to generate a corpus of sanitized, rewritten/reformatted data about the works still under copyright, then use the sanitized corpus to train a final model. For example, the sanitized corpus might describe the Harry Potter books in minute detail but not contain a single sentence taken from the originals. Models trained that way wouldn't be able to reproduce excerpts from Harry Potter books, even if the models were distributed as open weights, because the final model never sees the original text.
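To make the "not a single sentence taken from the originals" part concrete, here is a rough sketch of how the sanitized corpus could be checked before the final training run. Everything in it is my own assumption rather than an established pipeline: the function names, the word-level n-gram comparison, and the window of 8 words are all hypothetical, and a real filter would probably also need to catch near-verbatim paraphrase.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def filter_sanitized(originals, candidates, n=8):
    """Keep only candidates that share no run of n consecutive words
    with any of the original texts. n=8 is an arbitrary assumed threshold."""
    protected = set()
    for text in originals:
        protected |= word_ngrams(text, n)
    return [c for c in candidates if word_ngrams(c, n).isdisjoint(protected)]


if __name__ == "__main__":
    originals = ["the quick brown fox jumps over the lazy dog near the river bank"]
    candidates = [
        "A fox leaps over a dog by a river.",
        "He said the quick brown fox jumps over the lazy dog near the house",
    ]
    # Only candidates without an 8-word verbatim overlap are kept.
    for kept in filter_sanitized(originals, candidates):
        print(kept)
```

Generated text that fails the check would be dropped or regenerated, so only descriptions without long verbatim runs reach the final model's training set.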