| "training data is totally a violation of copyright" This really isn't clear because cognition is treated as a special exception to copyright. Every thought we have is derivative of everything we've seen before to some degree; reading a book makes our brains a derivative work. But we recognize that cognition is special. With machines we tend to apply a strict test: Did copyrighted material go in? If so, the output is almost certainly derivative. With human brains, with cognition, it isn't enough to prove that a person has consumed a copyrighted work prior to having a thought -- instead we judge every thought individually as to its originality. If we are in a position to apply similar cognitive rules to an LLM then the weights won't be derivative works and we will judge each output as to its originality rather than simply assume it is derivative. |
Actually, no. It's considered a transformative use. If you memorize a copyrighted play or piece of music and then perform it in public, that's a copyright violation. It's the literalness of the copy that matters.