Hacker News
by vbarrielle 104 days ago
Not quite, in my opinion. The output of an LLM from a simple prompt falls into the public domain, but if you also give a copyrighted work as input, the mechanistic transformation performed will not alter the original license (just as encoding a video does not change its license).

Are training data counted as input?

It would be interesting to see a court ruling that the output of LLMs trained on copyleft code is licensed under the GPL ... and all other viral licenses simultaneously.

> Are training data counted as input?

It is quantum legality: using copyrighted input is legal or illegal depending on the observer.

Schrodinger's Chat
Unless your LLM works by quoting large parts of copyrighted works, its reinterpretations of them aren't copyrighted, because they're not copies.
What if the output regurgitates some other legal entity’s boilerplate licence agreement? Is the output automatically licensed to that entity?
No, the copyright is the colour of the bits, and red bits with a comment saying "these bits are blue" are not blue bits, but you may be prosecuted for fraud.
It's wild to me that there haven't been more court cases to answer questions like those being asked in this thread.

No one knows.

It's new, fast-moving technology, and the courts are slow and expensive.

It would take two stubborn businesses with a lot of money deciding that it is better to battle it out than focus on their business. Something like IBM v SCO or Oracle v Google.

But we also know from other research that LLMs don't actually do mechanistic translations. Even when they are asked to and say that they did, they're basically rewriting the code from their training data.