by visarga
1155 days ago
How about rewording a code snippet so it doesn't exactly replicate the source, but is functionally identical? That could be applied before training. Could we then say the LLM learned only the ideas, not the expression? Copyright should protect expression and not restrict reusing ideas.
|
So you literally can't make it produce code that is functionally identical but not verbatim identical. It doesn't understand that the two are equivalent.
Also, such a "functionally identical but not copyright-violating" transformation isn't feasible, given both the complexity of the problem and the sheer volume of the data.
And training it on simplistically obfuscated code wouldn't help: all it would learn is to produce obfuscated code, which is no use for the intended purpose.
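To make "simplistically obfuscated" concrete, here is a minimal sketch of the kind of mechanical rewording pass being discussed, assuming Python 3.9+ and only the standard `ast` module. The helper name `reword` and the `v0`, `v1`, ... naming scheme are hypothetical, made up for illustration. It renames arguments and assigned variables and nothing else, so the result behaves the same but has exactly the same structure as the source with less readable names. Whether that would count as a different "expression" is a legal question the code doesn't answer.

```python
import ast

ORIGINAL = """
def mean(values):
    total = 0
    for item in values:
        total += item
    return total / len(values)
"""

def reword(source: str) -> str:
    """Rename arguments and assigned variables to v0, v1, ... (illustrative only).

    Assumptions: function names, attributes, and globals are left alone, and
    nothing calls the function with keyword arguments (renaming would break that).
    """
    tree = ast.parse(source)

    # Pass 1: collect every argument name and every name that is assigned to.
    names = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.setdefault(node.arg, f"v{len(names)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.setdefault(node.id, f"v{len(names)}")

    # Pass 2: rewrite every occurrence of the collected names.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            node.arg = names[node.arg]
        elif isinstance(node, ast.Name) and node.id in names:
            node.id = names[node.id]

    return ast.unparse(tree)

if __name__ == "__main__":
    print(reword(ORIGINAL))
```

Builtins such as `len` survive because they are never assigned in the snippet, so they never enter the rename table. A model trained on output like this would see `v0`-style names everywhere, which is the "all it would learn is to produce obfuscated code" problem.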