Hacker News
by visarga 1089 days ago
Why can't AI do the same: copyrighted code -> spec -> generated code.

... and then execute copyrighted code -> trace resulting values -> tests for new code.
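
A rough sketch of that second step, assuming the original code can be called as a plain function: run it on some inputs, record the results, and replay the recorded pairs against the rewrite. The names `record_trace`, `check_against_trace`, `original_clamp` and `rewritten_clamp` are made up for illustration.

```python
from typing import Any, Callable, Iterable

Trace = list[tuple[tuple, Any]]


def record_trace(reference: Callable[..., Any], inputs: Iterable[tuple]) -> Trace:
    """Run the reference implementation and record (args, result) pairs."""
    return [(args, reference(*args)) for args in inputs]


def check_against_trace(candidate: Callable[..., Any], trace: Trace) -> list[tuple]:
    """Return (args, expected, actual) for every recorded case the candidate gets wrong."""
    mismatches = []
    for args, expected in trace:
        actual = candidate(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


# Hypothetical stand-in for the original (copyrighted) code.
def original_clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


# Hypothetical stand-in for the regenerated code.
def rewritten_clamp(x, lo, hi):
    return max(lo, min(x, hi))


trace = record_trace(original_clamp, [(5, 0, 10), (-3, 0, 10), (42, 0, 10)])
mismatches = check_against_trace(rewritten_clamp, trace)
```

An empty `mismatches` list only says the two versions agree on the inputs that were traced, not that they behave the same everywhere.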

AI could do clean-room reimplementation of any code to beef up the training set. It could also check that the new code differs from the old code at the n-gram level, so that it would not look the same even by chance.
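
For what the n-gram check might look like, here is a minimal sketch. The tokenizer is a crude regex rather than a real lexer, and `ngrams` and `shared_ngram_fraction` are hypothetical names, not part of any existing tool.

```python
import re


def ngrams(source: str, n: int) -> set[tuple[str, ...]]:
    """Split source into crude tokens (words and single punctuation marks) and collect n-grams."""
    tokens = re.findall(r"\w+|[^\w\s]", source)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngram_fraction(old: str, new: str, n: int = 4) -> float:
    """Fraction of the new code's n-grams that also appear in the old code."""
    old_grams, new_grams = ngrams(old, n), ngrams(new, n)
    if not new_grams:
        return 0.0
    return len(new_grams & old_grams) / len(new_grams)


old_code = "return max(lo, min(x, hi))"
same = shared_ngram_fraction(old_code, old_code)              # identical text
different = shared_ngram_fraction(old_code, "if x < lo: y = lo")  # no 4-gram in common
```

A generator could reject any output whose fraction is above some threshold. A low score only measures surface similarity of the text; it says nothing by itself about how a court would see it.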

Would that hold up in court? Is it copyright laundering?

2 comments

Language models don't understand anything, they just manipulate tokens. Writing a spec (one that humans and courts can review, if needed, to determine that it is not infringing) and then implementing that spec with a separately trained tool is a much harder task. The tech just isn't ready, and it's not clear that language models will ever get there.

What language models could do easily is to obfuscate better so the license violation is harder to prove. That's behavior laundering -- no amount of human obfuscation (e.g., synonym substitution, renaming variables, swapping out control structures) can turn a plagiarized work into one that isn't. If we (via regulators and courts) let the Altmans of the world pull their stunt, they're going to end up with a government-protected monopoly on plagiarism-laundering.

Isn’t the language model itself the spec?

Potentially for all of the inputs at once.