Because, unlike humans, LLMs reliably reproduce exact excerpts from their training data. It's very easy to get image generation models to spit out screenshots from movies.
> Can we start at "humans are not computers", maybe?
Sure. So it stands to reason that "computers" are not bound by human laws. So an LLM that finds a piece of copyrighted data out there on the internet, downloads it, and republishes it has not broken any law? It certainly can't be prosecuted.
My original point was that copyright protections are about (amongst other things) protecting distribution and derivative works rights. I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.
> So an LLM that finds a piece of copyrighted data out there on the internet, downloads it, and republishes it has not broken any law?
Are you even trying? A gun that kills a person has not broken any law? It certainly can't be prosecuted.
> I'm not seeing a coherent argument that feeding a copyrighted work (that you obtained legally) into a machine is breaching anyone's copyright.
So you don't see how having an automated black box that takes copyrighted material as input and produces a competing alternative that can't be proven to come from that input goes against the idea of copyright protections?
> So you don't see how having an automated black box that takes copyrighted material as input and produces a competing alternative that can't be proven to come from that input goes against the idea of copyright protections?
Semantically, this is the same as a human reading all of Tom Clancy and then writing a fast-paced action/war/tension novel.
Is that in breach of copyright?
Copyright protects the expression of an idea. Not the idea.
> I agree with the fact that LLMs are big open-source laundering machines, and that is a problem.
Why do you believe this is a problem? I mean, to believe that you first need to believe that having access to the source code is somehow a problem.
> I mostly see it as a problem for copyleft licences.
Nonsense.
At most, the problem lies in people ignoring what rights a FLOSS license grants to end users, and then feigning surprise when end users use their software just as the FLOSS license intended.
Also a telltale sign is the fact that these blind criticisms single out very specific corporations. Apparently they have absolutely no issue if any other cloud provider sells managed services. They single out AWS but completely ignore the fact that the organization behind Valkey includes the likes of Google, Ericsson, and even Oracle of all things. Somehow only AWS is the problem.
> I mean, to believe that you first need to believe that having access to the source code is somehow a problem.
How in the world did you get there from what I said? Open source code has a licence that says what the copyright owner allows or not. LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (even proprietary code: if you manage to train on the code of Windows without getting caught, you're good).
> At most, the problem lies in people ignoring what rights a FLOSS license grants to end users
Once it's been used to train an LLM, there is no right anymore. The licence, copyright, all that is worthless.
> Also a telltale sign is the fact that these blind criticisms [...]
> LLMs are laundering machines in the sense that they allow anybody to just ignore licences and copyright in all code (...)
No. Having access to the code does that. You only need a single determined engineer to do that. I mean, do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?
> Once it's been used to train an LLM, there is no right anymore.
Nonsense. You do not lose your rights to your work just because someone used a glorified template engine to write something similar. In fact, the whole tenor of your comment suggests a complete lack of experience using LLMs for coding, because all major coding assistant services do enforce copyright filters, even when you are only asking questions.
> do you believe that until the inception of LLMs the world was completely unaware of the whole concept of reverse engineering stuff?
The scale makes all the difference! A single determined engineer, in their whole life, cannot remotely read all the code that goes into the training phase. How in the world can you believe it is the same thing?
> Nonsense. You do not lose your rights to your work just because [...]
It is only nonsense if you don't try to understand what I'm saying. What I am saying is that if it is impossible to prove that the LLM was trained with copyrighted material, then the copyright doesn't matter.
But maybe your single determined engineer can reverse engineer any trained LLM and extract the copyrighted code that was used in the training?