Hacker News
by immibis 1073 days ago
If anyone has a copyright claim to an LLM, the creators of the input data have more of a claim than the company that trained it. There's a good chance LLMs are not copyrightable at all. I'd bet there are a lot of people willing to take on that risk.

However, they might still fall under trade secret law.


Why would an LLM be any less copyrightable than any other piece of software?
The "software" part of an LLM is pretty trivial; the interesting part is the weights. Since the weights are mechanically generated by a computer, it can be argued that they are not copyrightable, just as a photograph taken by a monkey isn't copyrightable.
The software is the matrix multiplication and gradient descent. We are talking about the numbers in the matrices. They are the output of a training algorithm, so we can only talk about the copyright on the training algorithm, and on its input data.
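To make the "weights are just the output of a training algorithm" point concrete, here is a toy sketch. It is not how any real LLM is trained: `train_linear` and the data points are hypothetical, and the "model" is a single line y = w*x + b fitted by plain stochastic gradient descent. It shows that the resulting numbers are fully determined by the algorithm, its seed, and the input data: the same data gives identical weights, and different data gives different ones.

```python
import random

def train_linear(data, epochs=200, lr=0.05, seed=0):
    """Hypothetical toy model: fit y ~ w*x + b with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # seeded "random" initial weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on this sample
            w -= lr * 2 * err * x   # gradient of err**2 with respect to w
            b -= lr * 2 * err       # gradient of err**2 with respect to b
    return w, b

# Hypothetical "training data": points on the line y = 2x + 1
data = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (1.5, 4.0)]

run1 = train_linear(data)
run2 = train_linear(data)
print(run1 == run2)  # same data and seed, same weights

# Different input data (points on y = 3x) yields different weights
other = train_linear([(x, 3 * x) for x, _ in data])
print(other != run1)
```

Nobody chose the final values of `w` and `b`; they fall out of the data and the update rule, which is the crux of the "mechanically generated" argument.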
The model weights could be seen as a derived work, for which the trainers didn't get permission from the original copyright holders. Alternatively, it can be argued that LLMs are no different from a fanfic writer trying to imitate the style of their favorite author.

It's not obvious which way it will go, but I can see the point of those arguing that LLM data are ill-gotten gains.

For the same reason that phone books cannot be copyrighted: a mechanical compilation of facts lacks the creative originality that copyright requires.