by jongjong 21 days ago
But in this case, a human has awareness of what software they are copying or modifying and that's how the original software author receives credit. The contract requires some degree of human awareness to be valid. This is the critical difference.

Sorry, that's nonsense. There's human awareness when ingesting MIT-licensed code into an LLM too. In both cases it's a human who runs $ execute-global-replace or $ ingest-into-llm.

Both operations require some degree of human awareness. What you appear to be saying is that a human may only use a simple algorithm to access this source code, not a sophisticated one. Where do you draw that line? Who gets to say what is too sophisticated?

Error: your algorithm is too sophisticated to proceed. Please provide more human awareness; it's a critical difference.

If your LLM were to hack into Microsoft, steal the source code of an important project, and inject it into your project without your knowledge, wouldn't you be liable if you then published it?

Unfortunately, there is no way to agree to the license of software you're using if you haven't read the license, or if you aren't even aware that the license applies. That is what's happening at the training stage.

If you say that awareness doesn't matter, then you cannot stop AI from stealing any IP, open source or not.

I think the main issue with LLMs is that there is no mechanism to stop them from stealing. Thus they are guaranteed to infringe on copyright to some extent.

Also, beyond copying and copyright, there is another problem: LLMs are also ingesting the logic and expertise built into a project. This is a completely novel mechanism and needs to be treated separately under the law; otherwise it will be the end of all IP.

> I think the main issue with LLMs is that there is no mechanism to stop them from stealing.

Well, sure there is—for the people running them.

If you're building training data for an LLM, you only use data that a) is firmly in the public domain, or b) you have a clear and documented legal right to use.
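The rule above amounts to a default-deny filter over the candidate corpus: nothing goes in unless it meets (a) or (b). A minimal sketch in Python, assuming each document carries two hypothetical metadata fields, `public_domain` and `rights_record`. Deciding what actually qualifies as public domain or as a documented right is the legal part, and no code settles that.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    source: str
    # Hypothetical flag: the curator has established the text is firmly public domain.
    public_domain: bool = False
    # Hypothetical reference to a documented grant of rights (e.g. a signed agreement ID).
    rights_record: Optional[str] = None


def select_training_data(docs: List[Document]) -> List[Document]:
    """Keep only documents that satisfy (a) public domain or (b) a documented right.

    Default-deny: a missing or empty rights_record counts as no right at all,
    so unknown provenance is excluded rather than assumed to be permitted.
    """
    return [d for d in docs if d.public_domain or bool(d.rights_record)]
```

For example, given one public-domain text, one scraped file with no provenance, and one file covered by a recorded agreement, only the first and third survive. The design choice is that absence of evidence means exclusion, which is the opposite of scraping everything and filtering out complaints later.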