


	by palata 455 days ago
	I agree with the fact that LLMs are big open-source laundering machines, and that is a problem. I mostly see it as a problem for copyleft licences. Permissive licences don't protect the users in the first place, so...


pbronez 455 days ago

So who’s gonna sue an AI company asserting that all code they produce is GPL due to being trained on GPL code?


growse 455 days ago

How is training a model on GPL code and then having it write code any different to having a human read GPL code and then write code?

Unless there's a specific copyright claim over a specific piece of code that was copied and published, it's hard to see how the GPL has any relevance.


bccdee 455 days ago

Because, unlike humans, LLMs reliably reproduce exact excerpts from their training data. It's very easy to get image generation models to spit out screenshots from movies.


growse 454 days ago

That doesn't mean that all of the output from an LLM trained on GPL code is a derivative work (and therefore GPL'd too).


bccdee 452 days ago

A model that provably engages in systematic, difficult-to-detect plagiarism must itself be considered plagiaristic.


palata 455 days ago

I see that argument over and over, and I don't understand how people can consider it makes sense.

"My clipboard learned the code, just like a human would. So it should be fine to copy-paste anything and call it my own".

"How is killing a human any different to killing a computer?"

"If humans can vote, why couldn't computers vote as well?"

Can we start at "humans are not computers", maybe?


growse 455 days ago

> Can we start at "humans are not computers", maybe?

Sure. So it stands to reason that "computers" are not bound by human laws. So an LLM that finds a piece of copyrighted data out there on the internet, downloads it, and republishes it has not broken any law? It certainly can't be prosecuted.

My original point was that copyright protections are about (amongst other things) protecting distribution and derivative works rights. I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.


palata 454 days ago

> So an LLM that finds a piece of copyrighted data out there on the internet, downloads it, and republishes it has not broken any law?

Are you even trying? A gun that kills a person has not broken any law? It certainly can't be prosecuted.

> I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.

So you don't see how having an automated blackbox that takes copyrighted material as an input and provides a competing alternative that can't be proven to come from the input goes against the idea of copyright protections?


growse 454 days ago

> So you don't see how having an automated blackbox that takes copyrighted material as an input and provides a competing alternative that can't be proven to come from the input goes against the idea of copyright protections?

Semantically, this is the same as a human reading all of Tom Clancy and then writing a fast-paced action/war/tension novel.

Is that in breach of copyright?


rank0 454 days ago

> A gun that kills a person has not broken any law? It certainly can't be prosecuted.

Yeah dude… it's an inanimate object.


palata 455 days ago

I feel like nobody cares. It sucks, I know. Like climate change, biodiversity loss, the energy crisis.

Feels like we're pretty much screwed. Doesn't mean it's not a problem.


motorest 455 days ago

> I agree with the fact that LLMs are big open-source laundering machines, and that is a problem.

Why do you believe this is a problem? I mean, to believe that you first need to believe that having access to the source code is somehow a problem.

> I mostly see it as a problem for copyleft licences.

Nonsense.

At most, the problem lies in people ignoring what rights a FLOSS license grants to end users, and then feigning surprise when end users use their software just as the FLOSS license intended.

Also a telltale sign is the fact that these blind criticisms single out very precise corporations. Apparently they have absolutely no issue if any other cloud provider sells managed services. They single out AWS but completely ignore the fact that the organization behind Valkey includes the likes of Google, Ericsson, and even Oracle of all things. Somehow only AWS is the problem.


palata 455 days ago

> I mean, to believe that you first need to believe that having access to the source code is somehow a problem.

How in the world did you get there from what I said? Open source code has a licence that says what the copyright owner allows or not. LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (even proprietary code: if you manage to train on the code of Windows without getting caught, you're good).

> At most, the problem lies in people ignoring what rights a FLOSS license grants to end users

Once it's been used to train an LLM, there is no right anymore. The licence, copyright, all that is worthless.

> Also a telltale sign is the fact that these blind criticisms [...]

No clue what you are talking about here.


motorest 455 days ago

> LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (...)

No. Having access to the code does that. You only need a single determined engineer to do that. I mean, do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?

> Once it's been used to train an LLM, there is no right anymore.

Nonsense. You do not lose your rights to your work just because someone used a glorified template engine to write something similar. In fact, your whole line of commentary conveys a complete lack of experience using LLMs in coding applications, because all major coding assistant services do enforce copyright filters, even when you are merely asking questions.


palata 454 days ago

> do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?

The scale makes all the difference! A single determined engineer, in their whole life, cannot remotely read all the code that goes into the training phase. How in the world can you believe it is the same thing?

> Nonsense. You do not lose your rights to your work just because [...]

It is only nonsense if you don't try to understand what I'm saying. What I am saying is that if it is impossible to prove that the LLM was trained with copyrighted material, then the copyright doesn't matter.

But maybe your single determined engineer can reverse engineer any trained LLM and extract the copyrighted code that was used in the training?
