	by visarga 1155 days ago
	How about rewording a code snippet so it doesn't exactly replicate the source, but is functionally identical? This could be applied before training. Could we then say the LLM learned only the ideas, not the expression? Copyright should protect expression, not restrict the reuse of ideas.


janoc 1155 days ago

Except that's not how an LLM works. An LLM has no notion of "ideas", only probabilities of how certain words string together.

So you literally can't make it produce functionally identical but not verbatim identical code. It doesn't understand that the two are equivalent.

Also, such a "functionally identical but not copyright-violating" transformation is not feasible, given both the complexity of the problem and the sheer volume of the data.

And training it on simplistically obfuscated code wouldn't help: all it would learn is to produce obfuscated code, which isn't useful for the intended purpose.


chii 1154 days ago

> It doesn't understand that the two are equivalent.

It doesn't need to understand it the way a human would.

The pattern that the LLM managed to extract could include the structure rather than just the literal text. And when reproducing that structure, the LLM can replace the variable names while keeping the structure intact.

I'm not sure whether Copilot can do this, but ChatGPT was somewhat able to (if imperfectly, for now).
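For illustration, here is a minimal sketch of the "replace the variable names, keep the structure" transformation, assuming Python source and Python 3.9+ (for `ast.unparse`). The helper names `RenameLocals` and `rename_identifiers` are hypothetical, and this is a toy: it does not handle `global`/`nonlocal`, nested scopes, attributes, or callers that pass arguments by keyword, and it says nothing about what Copilot or ChatGPT do internally.

```python
import ast


class RenameLocals(ast.NodeTransformer):
    """Hypothetical helper: rename parameters and assigned names to v0, v1, ..."""

    def __init__(self) -> None:
        self.mapping = {}

    def _fresh(self, old: str) -> str:
        # Reuse the same replacement for every occurrence of a name.
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"
        return self.mapping[old]

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if isinstance(node.ctx, ast.Store):
            node.id = self._fresh(node.id)
        elif node.id in self.mapping:
            # Loads of already-renamed names follow the mapping; globals and
            # builtins such as len() are left untouched.
            node.id = self.mapping[node.id]
        return node


def rename_identifiers(source: str) -> str:
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


original = """
def mean(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)
"""

renamed = rename_identifiers(original)
print(renamed)
```

In the printed result, `values`, `total` and `value` become `v0`, `v1` and `v2`, while the function name `mean` and the builtin `len` are unchanged. The two versions compute the same result even though the text differs.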


belorn 1154 days ago

Copying a piece of code and changing the variable names is still a copy, just as copying a piece of music and changing the pitch, volume, or any other attribute would still be a copy of the original.

What the LLM needs to do is convince a judge or jury that it has not created a copy, and that it operates differently from a transformation.
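One way to see why renaming alone does not hide a copy: compare the syntax trees with every identifier blanked out. The sketch below assumes Python source; `structure_fingerprint` and `same_structure` are hypothetical names. It is a crude heuristic, not how courts assess copying and not a description of any particular plagiarism tool. Blanking all names also equates calls to different functions (e.g. `len` vs `abs`).

```python
import ast


def structure_fingerprint(source: str) -> str:
    """Dump the AST of `source` with every identifier blanked out.

    Formatting and comments never reach the AST, and blanking names
    removes variable, parameter and function naming as well.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            node.name = "_"
    # ast.dump omits line/column attributes by default.
    return ast.dump(tree)


def same_structure(a: str, b: str) -> bool:
    return structure_fingerprint(a) == structure_fingerprint(b)


first = "def scale(x):\n    y = x * 2\n    return y + 1\n"
copy = "def double_plus_one(n):  # renamed copy\n    m = n * 2\n    return m + 1\n"
different = "def scale(x):\n    return x * 2\n"

print(same_structure(first, copy))       # True
print(same_structure(first, different))  # False
```

The renamed `copy` differs textually from `first` but produces the same fingerprint. `different` keeps the original names yet changes the logic, so its fingerprint differs.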


nextaccountic 1154 days ago

> So you literally can't make it produce functionally identical but not verbatim identical code. It doesn't understand that the two are equivalent.

But it does: similar but non-identical code ends up closer together in the embedding space.
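To make the geometric intuition concrete without a real model, the sketch below uses a bag-of-tokens count vector and cosine similarity. This is a deliberately crude stand-in chosen for illustration, not a learned LLM embedding. The function names are hypothetical and the three snippets are made up. Even here, a version with every variable renamed scores closer to the original than unrelated code does, because keywords, operators and layout of the logic survive the renaming.

```python
import io
import math
import tokenize
from collections import Counter

# Token types that carry layout rather than content.
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
           tokenize.COMMENT, tokenize.ENDMARKER}


def token_vector(source: str) -> Counter:
    """Bag-of-tokens vector: how often each lexical token occurs."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _LAYOUT)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


original = ("def mean(values):\n    total = 0\n    for value in values:\n"
            "        total += value\n    return total / len(values)\n")
renamed = ("def mean(v0):\n    v1 = 0\n    for v2 in v0:\n"
           "        v1 += v2\n    return v1 / len(v0)\n")
unrelated = ("class Point:\n    def __init__(self, x, y):\n"
             "        self.x = x\n        self.y = y\n")

base = token_vector(original)
print(round(cosine(base, token_vector(renamed)), 3))
print(round(cosine(base, token_vector(unrelated)), 3))
```

A learned embedding would be expected to capture far more than shared tokens. The point is only that "similar" can be measured as distance in a vector space.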


NoZebra120vClip 1155 days ago

> Copyright should protect expression and not restrict reusing ideas.

That's what patents are for.
