by jongjong 21 days ago
If your LLM were to hack into Microsoft, steal the source code of an important project, and inject it into your project without your being aware of it, wouldn't that make you liable if you then published it?

Unfortunately, there is no way to agree to the license of software you're using if you didn't read the license, or if you're not even aware that you're using licensed code. This is what's happening at the training stage.

If you say that awareness doesn't matter, then you cannot stop AI from stealing any IP, open source or not.

I think the main issue with LLMs is that there is no mechanism to stop them from stealing. Thus, they are guaranteed to infringe on copyright to some extent.

Also, beyond copying and copyright, there is another problem: LLMs are also appropriating the logic and expertise built into a project. This is a completely novel mechanism and needs to be treated separately under the law. Otherwise, it would be the end of all IP.


> I think the main issue with LLMs is that there is no mechanism to stop them from stealing.

Well, sure there is—for the people running them.

If you're building training data for an LLM, you only use data that a) is firmly in the public domain, or b) you have a clear and documented legal right to use.