by jedbrown
1089 days ago
Even MIT-licensed code requires you to preserve the copyright and permission notice. If a human did what these language models are doing (outputting derivative works with the copyright and license stripped), it would be a license violation. When humans want to create a new implementation with clean IP, they have one team study the IP-encumbered code and write a spec, then a different team writes a new implementation according to that spec. LM developers could adopt similar practices, with separately trained components: one that produces an auditable intermediate representation, and another that independently writes new code from that representation. The tech isn't up to that task, and the LM authors think they're going to get away with laundering what would be plagiarism if a human did it.
... and then: execute the copyrighted code -> trace the resulting values -> turn those traces into tests for the new code.
AI could do a clean-room reimplementation of any code to beef up the training set. It could also make sure the new code differs from the old code at the n-gram level, so it shouldn't look the same even by chance.
Would that hold up in court? Is it copyright laundering?
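To make the "different at the n-gram level" idea concrete, here is a minimal sketch of one possible check. It assumes a crude regex tokenizer and a hypothetical helper `is_sufficiently_different` with an arbitrary 5-token window and 10% threshold; none of this comes from any real LM pipeline. It measures what fraction of the candidate's token n-grams also appear in the original.

```python
import re


def tokenize(source: str) -> list[str]:
    # Crude lexer (an assumption): identifiers, integer literals,
    # or any single non-whitespace character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All contiguous windows of n tokens.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(original: str, candidate: str, n: int = 5) -> float:
    """Fraction of the candidate's token n-grams that also occur in the original."""
    cand = ngrams(tokenize(candidate), n)
    if not cand:
        return 0.0
    orig = ngrams(tokenize(original), n)
    return len(cand & orig) / len(cand)


def is_sufficiently_different(original: str, candidate: str,
                              n: int = 5, threshold: float = 0.1) -> bool:
    # Hypothetical policy: reject candidates sharing more than 10% of n-grams.
    return ngram_overlap(original, candidate, n) <= threshold


original = "def add(a, b):\n    return a + b\n"
rewrite = "def plus(x, y):\n    total = x + y\n    return total\n"
print(ngram_overlap(original, original))  # 1.0
print(ngram_overlap(original, rewrite))   # 0.0
```

Note that a check like this only measures surface similarity: renaming the variables alone drives the overlap to zero, so on its own it says nothing about whether the result is a derivative work.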